<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1741-7007-4-38</ui>
   <ji>1741-7007</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Multiple, non-allelic, intein-coding sequences in eukaryotic RNA polymerase genes</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Goodwin</snm>
               <mi>JD</mi>
               <fnm>Timothy</fnm>
               <insr iid="I1"/>
               <email>timg@sanger.otago.ac.nz</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Butler</snm>
               <mi>I</mi>
               <fnm>Margaret</fnm>
               <insr iid="I1"/>
               <email>margi.butler@stonebow.otago.ac.nz</email>
            </au>
            <au id="A3">
               <snm>Poulter</snm>
               <mi>TM</mi>
               <fnm>Russell</fnm>
               <insr iid="I1"/>
               <email>russell.poulter@stonebow.otago.ac.nz</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Biochemistry, University of Otago, P.O. Box 56, Dunedin, New Zealand</p>
            </ins>
         </insg>
         <source>BMC Biology</source>
         <issn>1741-7007</issn>
         <pubdate>2006</pubdate>
         <volume>4</volume>
         <issue>1</issue>
         <fpage>38</fpage>
         <url>http://www.biomedcentral.com/1741-7007/4/38</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17069655</pubid>
               <pubid idtype="doi">10.1186/1741-7007-4-38</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>06</day>
               <month>7</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>27</day>
               <month>10</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>27</day>
               <month>10</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Goodwin et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Inteins are self-splicing protein elements. They are translated as inserts within host proteins that excise themselves and ligate the flanking portions of the host protein (exteins) with a peptide bond. They are encoded as in-frame insertions within the genes for the host proteins. Inteins are found in all three domains of life and in viruses, but have a very sporadic distribution. Only a small number of intein coding sequences have been identified in eukaryotic nuclear genes, and all of these are from ascomycete or basidiomycete fungi.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We identified seven intein coding sequences within nuclear genes coding for the second largest subunits of RNA polymerase. These sequences were found in diverse eukaryotes: one is in the second largest subunit of RNA polymerase I (<it>RPA2</it>) from the ascomycete fungus <it>Phaeosphaeria nodorum</it>, one is in the RNA polymerase III (<it>RPC2</it>) of the slime mould <it>Dictyostelium discoideum </it>and four intein coding sequences are in RNA polymerase II genes (<it>RPB2</it>), one each from the green alga <it>Chlamydomonas reinhardtii</it>, the zygomycete fungus <it>Spiromyces aspiralis </it>and the chytrid fungi <it>Batrachochytrium dendrobatidis </it>and <it>Coelomomyces stegomyiae</it>. The remaining intein coding sequence is in a viral relic embedded within the genome of the oomycete <it>Phytophthora ramorum</it>. The <it>Chlamydomonas </it>and <it>Dictyostelium </it>inteins are the first nuclear-encoded inteins found outside of the fungi.</p>
               <p>These new inteins represent a unique dataset: they are found in homologous proteins that form a paralogous group. Although these paralogues diverged early in eukaryotic evolution, their sequences can be aligned over most of their length. The inteins are inserted at multiple distinct sites, each of which corresponds to a highly conserved region of RNA polymerase. This dataset supports earlier work suggesting that inteins preferentially occur in highly conserved regions of their host proteins.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The identification of these new inteins increases the known host range of intein sequences in eukaryotes, and provides fresh insights into their origins and evolution. We conclude that inteins are ancient eukaryote elements once found widely among microbial eukaryotes. They persist as rarities in the genomes of a sporadic array of microorganisms, occupying highly conserved sites in diverse proteins.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>An intein (internal protein) is a protein sequence that is translated as an insertion within a host protein. The intein is then post-translationally excised, simultaneously with the ligation of the two flanking segments of the host protein <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. The result of intein excision is two proteins derived from a single initial translation product: (i) the free intein sequence, and (ii) the mature form of the host protein, with the two halves (the N-terminal and C-terminal external proteins, or exteins) ligated by a peptide bond. The reactions in which the intein is excised from the precursor protein and the flanking exteins are joined are mediated primarily by the intein itself, although the first residue of the C-extein also has an important role. The term intein strictly refers to a protein molecule, but the gene segment encoding the intein is also often referred to as an intein.</p>
         <p>In addition to containing sequences necessary for their excision and the splicing of their flanking exteins, many inteins have a homing endonuclease domain. Inteins carrying such domains are often referred to as full-length inteins. Some inteins lack a homing endonuclease domain, containing only those sequences necessary for their excision and extein splicing. These are known as mini-inteins. Most of the homing endonuclease domains found in full-length inteins belong to the LAGLIDADG family <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Homing endonucleases are believed to promote the spread of an intein through the gene pool of the host species via a recombination process (homing). In a diploid cell heterozygous for the intein, cleavage of the empty allele by the homing endonuclease will be followed by DNA repair performed by the host repair machinery, using the occupied allele as a template <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. This will result in the cell becoming homozygous for the intein. In this way, the intein gene is duplicated and can spread throughout a population. Most inteins have no known function, and thus are considered to be selfish or parasitic elements <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. However, inteins are efficiently removed from the host protein <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>, so their effect on the host phenotype is minimal.</p>
         <p>The homing pathway depends on recognition of the target site by the homing endonuclease and on the allelic homology of the surrounding sequences. If an intein homing endonuclease were to cut an ectopic site, this would not precipitate homologous recombination (gene conversion) of the intein sequence because of the lack of flanking homology. For this reason, it is apparently very difficult for inteins to move to (or colonise) a new site, and such ectopic movement is likely to be a very rare event. This view is supported by the finding that allelic inteins (i.e. inteins inserted at corresponding sites in homologous genes), even in distantly related species, are usually more closely related to each other than they are to non-allelic inteins, including those from the same species <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B9">9</abbr><abbr bid="B15">15</abbr></abbrgrp>.</p>
         <p>Inteins are rarities, and have a puzzling distribution among genes and species: the majority of species do not carry any known inteins, while some species have many; for example, the archaeon <it>Methanococcus jannaschii </it>has 19 distinct inteins. The species that carry inteins do not cluster together on evolutionary trees, but are phylogenetically dispersed, and closely related species do not necessarily have similar sets of inteins. Inteins have only been found in microorganisms. The vast majority of genes have no known inteins, but some genes contain multiple inteins. For instance, replication factor C of <it>M. jannaschii </it>contains three distinct inteins <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> and a ribonucleotide reductase of <it>Trichodesmium erythraeum </it>contains four <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Of the more than 80 distinct (non-allelic) inteins described, most (>75%) are found in genes involved in replication or transcription, such as DNA polymerases and helicases, or in related processes such as the metabolism of nucleotides (together these genes could be said to have information-processing functions).</p>
         <p>The reasons behind the unusual distribution of inteins are currently unknown. One possible explanation for their phylogenetic distribution is that inteins were formerly much more widespread than they are now, but over time they have been randomly lost on many independent occasions in different lineages, resulting in their current sporadic appearances <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. It is also possible that their distribution is partly a result of horizontal transfer (that is, movement between species that might be only distantly related). The predominance of inteins in information-processing genes may reflect the horizontal transfer of inteins via virus infection <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. The genomes of phage and viruses consist predominantly of genes involved in information processing. It is possible that the pattern of multiple coincident insertions is also a reflection of the inteins occurring predominantly in the subset of genes that are common to cellular organisms and their infecting viruses. Three of the allelic intein groups have members that are genomic and viral. For example, RIR1-l allelic inteins are found in eubacteria, eubacterial phages and the eukaryote iridescent viruses, DnaB-b allelic inteins are present in eubacteria and their phages, while Pol-c allelic inteins are found in archaea and in eukaryote viruses (mimivirus and the <it>Heterosigma akashiwo </it>virus (HaV)).</p>
         <p>In total, five distinct inteins have been found in eukaryotic nuclear genes. These appear in the <it>VMA1 </it>gene that encodes a subunit of a vacuolar membrane adenosine triphosphatase <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B18">18</abbr></abbrgrp>; <it>PRP8</it>, encoding an essential component of the spliceosome <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>; <it>GLT1</it>, glutamate synthase <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>; <it>CHS2</it>, chitin synthase 2 <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>; and <it>ThrRS</it>, threonyl tRNA synthetase (submitted by S. Pietrokovski to InBase <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>). All of these nuclear-encoded inteins have been found exclusively in fungi. VMA inteins have been found in a variety of hemiascomycete yeasts, including <it>Saccharomyces cerevisiae</it>, <it>Kluyveromyces lactis </it>and <it>Candida tropicalis</it>. The PRP8 intein was first found in the basidiomycete fungus <it>Cryptococcus neoformans </it><abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Since then, PRP8 inteins have been found in some additional <it>Cryptococcus </it>species (<it>C. gattii </it>and <it>C. laurentii</it>) <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and in a variety of ascomycete fungi, including <it>Aspergillus fumigatus</it>, <it>Histoplasma capsulatum </it>and <it>Botrytis cinerea </it><abbrgrp><abbr bid="B14">14</abbr><abbr bid="B22">22</abbr></abbrgrp> and in three species of <it>Penicillium </it><abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. GLT1 inteins have been identified in a small number of ascomycetes (<it>Debaryomyces hansenii</it>, <it>Pichia guilliermondii</it>, <it>Podospora anserina </it>and <it>Phaeosphaeria nodorum</it>). The CHS2 intein has been found in only one species, <it>P. anserina</it>, despite a large number of fungal CHS2 gene sequences being available in GenBank. 
Finally, the fifth eukaryotic nuclear full-length intein gene, ThrRS, was very recently identified in the ascomycete yeast <it>C. tropicalis </it>(Pietrokovski, InBase <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>). An allelic mini-intein is also found in the closely related yeast <it>Candida parapsilosis</it>. In addition to these nuclear intein genes, three intein genes have been found in chloroplast genomes: there are allelic inteins in the DnaB helicase genes of the chloroplasts of the cryptophyte alga <it>Guillardia theta </it><abbrgrp><abbr bid="B24">24</abbr></abbrgrp> and the red alga <it>Porphyra purpurea </it><abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, and a distinct intein in the ClpP protease gene of the chloroplasts of the green alga <it>Chlamydomonas eugametos </it><abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. Furthermore, inteins have been identified in viruses of eukaryotes: allelic inteins have been found in the DNA polymerase B genes of <it>Acanthamoeba polyphaga </it>mimivirus <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and HaV01 <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. A distinct full-length intein appears in the <it>RIR1 </it>gene of <it>Chilo </it>iridescent virus <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>, with two other insect iridoviruses (<it>Costelytra zealandica </it>iridescent virus and <it>Wiseana </it>iridescent virus) containing allelic mini-inteins (Pietrokovski, InBase <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>). We have also detected an intein in a helicase of <it>Paramecium bursaria Chlorella </it>virus (PBCV) NY2A that is absent from the homologous sites of other PBCV strains (authors' unpublished data and InBase <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>).</p>
         <p>DNA-dependent RNA polymerases are complex proteins consisting of several polypeptides including two large and several smaller subunits <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Eukaryote nuclei generally encode three RNA polymerases: RNA polymerase I synthesizes a pre-rRNA, 45S, which matures into 28S, 18S and 5.8S rRNAs that will form the major RNA sections of the ribosome. RNA polymerase II synthesizes precursors of messenger RNAs and most small nuclear RNAs. RNA polymerase III synthesizes transfer RNAs, 5S ribosomal RNAs and other small RNAs found in the nucleus and cytoplasm. Some of the various subunits of the different RNA polymerases (including the two largest subunits) are encoded by genes that are homologous (paralogous) throughout cellular life. Some viruses also contain homologous genes encoding their own RNA polymerase.</p>
         <p>Here we report the identification and characterisation of seven previously undetected intein-coding sequences from eukaryotic nuclear genomes. These were all identified in genes encoding the second largest subunits of RNA polymerase. They are inserted at six distinct (non-allelic) sites. Four were found in fungi (an ascomycete, a zygomycete and two chytrids), one was found in the slime mould <it>Dictyostelium discoideum</it>, and one in the green alga <it>Chlamydomonas reinhardtii</it>. The last was an intein identified in a viral remnant embedded in the nuclear genome of the oomycete <it>Phytophthora ramorum</it>. Partial sequences of inteins allelic to this latter intein were also identified in the RNA polymerase of a strain of the <it>Emiliania huxleyi </it>virus and in a sequence generated by the Sargasso Sea Metagenomics Project. Analysis of these intein sequences leads to insights into the origins and evolution of inteins in eukaryotes.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Identification of new nuclear intein genes</p>
            </st>
            <p>To identify new eukaryotic intein genes, we used the sequences of a wide variety of previously identified inteins (of eukaryotic, prokaryotic and viral origins) to perform BLAST searches of the publicly available eukaryotic sequence databases (including GenBank, and genome sequencing centre databases containing data not yet released to GenBank; see Methods section). High-quality matches identified in the BLAST searches, putatively representing new inteins, were then examined in detail for features characteristic of inteins <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. For example, most inteins begin with a Cys or a Ser residue, end with the dipeptide His-Asn, and are followed by a Cys, Ser or Thr residue as the first amino acid of the C-extein. Inteins contain a number of conserved motifs associated with splicing, and most inteins contain a LAGLIDADG homing endonuclease domain. Inteins also appear as specific inserts within other proteins, and often appear at highly conserved sites within highly conserved proteins. Another important indicator of the presence of an intein-encoding sequence is that the sequence is present in some homologues of the host gene, but is absent in most. Table <tblr tid="T1">1</tblr> summarises the novel inteins described in this report.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Newly described inteins from the second largest subunit of RNA polymerases.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Intein</p>
                     </c>
                     <c ca="left">
                        <p>Organism</p>
                     </c>
                     <c ca="left">
                        <p>Taxonomic group</p>
                     </c>
                     <c ca="left">
                        <p>Allele</p>
                     </c>
                     <c ca="left">
                        <p>Size</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Pno RPA2</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Phaeosphaeria nodorum</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Ascomycota</p>
                     </c>
                     <c ca="left">
                        <p>RPA2-a</p>
                     </c>
                     <c ca="left">
                        <p>456</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cre RPB2</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Chlamydomonas reinhardtii</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Green alga</p>
                     </c>
                     <c ca="left">
                        <p>RPB2-a</p>
                     </c>
                     <c ca="left">
                        <p>431</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cst RPB2</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Coelomomyces stegomyiae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Chytrid</p>
                     </c>
                     <c ca="left">
                        <p>RPB2-b</p>
                     </c>
                     <c ca="left">
                        <p>362</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Sas RPB2</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Spiromyces aspiralis</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Zygomycota</p>
                     </c>
                     <c ca="left">
                        <p>RPB2-b</p>
                     </c>
                     <c ca="left">
                        <p>354</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Bde RPB2</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Batrachochytrium dendrobatidis</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Chytrid</p>
                     </c>
                     <c ca="left">
                        <p>RPB2-c</p>
                     </c>
                     <c ca="left">
                        <p>488</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ddi RPC2</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Dictyostelium discoideum</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Amoebozoa</p>
                     </c>
                     <c ca="left">
                        <p>RPC2-a</p>
                     </c>
                     <c ca="left">
                        <p>464</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PrV RPO</p>
                     </c>
                     <c ca="left">
                        <p><it>Phytophthora ramorum </it>virus</p>
                     </c>
                     <c ca="left">
                        <p>Stramenopile NCLDV?</p>
                     </c>
                     <c ca="left">
                        <p>RPO-a</p>
                     </c>
                     <c ca="left">
                        <p>incomplete</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Unnamed</p>
                     </c>
                     <c ca="left">
                        <p>Unclassified Sargasso Sea</p>
                     </c>
                     <c ca="left">
                        <p>unknown</p>
                     </c>
                     <c ca="left">
                        <p>RPO-a</p>
                     </c>
                     <c ca="left">
                        <p>incomplete</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>EhV RPO</p>
                     </c>
                     <c ca="left">
                        <p><it>Emiliania huxleyi </it>virus 163 *</p>
                     </c>
                     <c ca="left">
                        <p>Haptophyte NCLDV</p>
                     </c>
                     <c ca="left">
                        <p>RPO-a</p>
                     </c>
                     <c ca="left">
                        <p>incomplete</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*No intein is present at the allelic site in another <it>Emiliania huxleyi </it>virus isolate, <it>Emiliania huxleyi </it>virus 86.</p>
                  <p/>
                  <p>Intein size is expressed as amino-acid residue number. NCLDV indicates a member (or putative member) of the nucleocytoplasmic large DNA virus group.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>An RNA polymerase I intein in <it>Phaeosphaeria nodorum</it></p>
            </st>
            <p><it>P</it>. (<it>Stagonospora</it>) <it>nodorum </it>is a filamentous ascomycete belonging to the class Dothideomycetes. It is a major pathogen of wheat crops. Strain SN15 has been sequenced by the whole genome shotgun method to >10-fold coverage at the Broad Institute <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. The assembled sequence has been made publicly available in GenBank. Using the sequence of the <it>C. eugametos </it>chloroplast Ceu ClpP intein as a query in a TBLASTN search of the whole genome shotgun (WGS) sequence division of the GenBank database, we detected a high-quality match in <it>P. nodorum </it>sequence AAGI01000064 (E = 3 &#215; 10<sup>-6</sup>; bases 49105&#8211;50471). This sequence can also be found on supercontig 1.4 (from base pair 1094221 to 1095587) at the Broad Institute website <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. The matching region appears as an insert within a gene encoding the second largest subunit of RNA polymerase (Figure <figr fid="F1">1</figr>). Sequence comparisons (Figure <figr fid="F1">1</figr>) and phylogenetic analyses (Figure <figr fid="F2">2</figr>, <supplr sid="S1">additional file 1</supplr>) indicate that the gene encodes a subunit of RNA polymerase I. The insert has numerous features indicating that it encodes an intein. It appears as an insertion within the RNA polymerase gene and consists of an uninterrupted open reading frame (ORF) encoding 455 amino acids, in phase with the RNA polymerase ORF. The insert begins with a Cys residue and is followed by a Cys residue. The N- and C-terminal regions of the insert contain sequences corresponding to the conserved splicing domains of inteins (alignments of conserved intein domains can be viewed at the InBase website <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>). The central region contains the characteristic motifs of an LAGLIDADG homing endonuclease domain, although this appears to be degenerate and is unlikely to be still active. 
An unusual feature of this new intein is that it ends with the dipeptide Gly-Asn rather than the more common His-Asn. Gly-Asn termini have, however, been identified previously in inteins, for instance, in Ceu ClpP and in the ThrRS inteins of <it>C. tropicalis </it>and <it>C. parapsilosis</it>. This new intein has been named Pno RPA2, following intein naming conventions. It is the first intein to be identified in a gene encoding the second largest subunit of an RNA polymerase. The only previously identified RNA polymerase inteins (the archaeal inteins Mja rPol A' and Mja rPol A" from <it>M. jannaschii </it>and Nph rPol A" from <it>Natronomonas pharaonis</it>) all appear in archaeal homologues of the gene encoding the largest subunit of eukaryotic RNA polymerase.</p>
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p><b>An alignment of RNA polymerase sequences</b>. Taken from accession data as described in the Methods section.</p>
               </text>
               <file name="1741-7007-4-38-S1.addi">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Intein insertions into eukaryotic and viral RNA polymerases</p>
               </caption>
               <text>
                  <p><b>Intein insertions into eukaryotic and viral RNA polymerases</b>. Alignments of intein/extein borders for the eight inteins in the six RNA polymerase intein insertion sites. RNA polymerase sequences are taken from accession data as described in Methods. The unclassified Sargasso Sea sequence is from GenBank accession <ext-link ext-link-type="gen" ext-link-id="AACY01369547">AACY01369547</ext-link>, and the <it>E. huxleyi </it>virus 163 sequence is from GenBank accession <ext-link ext-link-type="gen" ext-link-id="DQ127798">DQ127798</ext-link>. The dashes represent missing data.</p>
               </text>
               <graphic file="1741-7007-4-38-1"/>
            </fig>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Phylogenetic distance tree of RNA polymerases</p>
               </caption>
               <text>
                  <p><b>Phylogenetic distance tree of RNA polymerases</b>. RNA polymerase sequences are taken from accession data as described in the Methods section. The unrooted tree was constructed by the neighbour-joining method using PAUP*4b10 [52] and the default settings. Numbers on the branches indicate the percentages of bootstrap support indicated by a heuristic search with 100 random addition replicates and the tree-bisection-reconnection branch-swapping algorithm. All bootstrap values > 50 have been reported except where they occur within the three well-supported RNA polymerase I (rpo1), RNA polymerase II (rpo2) and RNA polymerase III (rpo3) groups. RNA polymerases that contain an intein are indicated by asterisks (*); strains of the <it>E. huxleyi </it>virus are polymorphic for the presence of an intein in RNA polymerase (*/-). The alignment used is available as supplementary data (<supplr sid="S1">additional file 1</supplr>).</p>
               </text>
               <graphic file="1741-7007-4-38-2"/>
            </fig>
            <p>Note that the sequence AAGI01000064 has a frameshift in the region corresponding to the intein. Comparisons (not shown) with the <it>P. nodorum </it>sequences in the GenBank trace archives, however, suggest that this results from two sequencing errors: the insertion of a G residue at position 50225 and of a C residue at position 50260. These two residues were removed to generate the full sequence of the RNA polymerase gene with an uninterrupted ORF.</p>
         </sec>
         <sec>
            <st>
               <p>An RNA polymerase II intein in <it>Chlamydomonas reinhardtii</it></p>
            </st>
            <p><it>C. reinhardtii </it>is a unicellular green alga. An intein in this species was first detected in several <it>C. reinhardtii </it>expressed sequence tag (EST) sequences using a TBLASTN search of the GenBank EST databases with the Ctr ThrRS intein sequence as a query. A full-length sequence of the intein was then retrieved from version 2 of the <it>C. reinhardtii </it>genome sequence assembly, available from the Joint Genome Institute <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. The intein lies on scaffold 5, contig 26 (bases 289701&#8211;290993 on the minus strand). The intein, Cre RPB2, appears as an uninterrupted ORF encoding 431 amino acids inserted within the coding region of the <it>C. reinhardtii </it>gene for the second largest subunit of RNA polymerase II (Figures <figr fid="F1">1</figr>, <figr fid="F2">2</figr>). Like many other inteins, it begins with a Cys residue, is followed by a Cys residue in the C-extein, and contains the conserved splicing domains and an LAGLIDADG homing endonuclease domain (see InBase for alignments <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>). Like the Pno RPA intein, Cre RPB2 ends with a Gly-Asn dipeptide rather than the more common His-Asn. This is the first intein encoded in a nuclear genome to be found outside of the fungi.</p>
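            <p>As a consistency check on the coordinates quoted above, the length of an inclusive base range for an uninterrupted ORF should equal three times the number of encoded residues. The following is a minimal Python sketch of that arithmetic; the function name is hypothetical and is not part of the original analysis.</p>

```python
def residues_in_range(start, end):
    """Return the number of codons in an inclusive, 1-based base range.

    Raises ValueError if the range is not a whole number of codons.
    """
    length = end - start + 1  # both endpoints are counted
    if length % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    return length // 3


# Cre RPB2 intein: scaffold 5, contig 26, bases 289701-290993 (minus strand)
print(residues_in_range(289701, 290993))  # prints 431
```

            <p>Because both endpoints are inclusive, the strand does not affect the count. The 1293-base range therefore corresponds to the 431 residues reported for Cre RPB2.</p>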
         </sec>
         <sec>
            <st>
               <p>Further RNA polymerase II inteins</p>
            </st>
            <p>Three further inteins have been found in genes encoding the second-largest subunits of RNA polymerase II. The sequences of these genes were generated as part of the Assembling the Fungal Tree of Life (AFTOL) project <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, which is using, among other things, RNA polymerase II sequences to assist in determining the relationships among a wide variety of fungi. Inteins appear in <it>RPB2 </it>sequences from <it>Spiromyces aspiralis </it>(<ext-link ext-link-type="gen" ext-link-id="DQ302790">DQ302790</ext-link>), a zygomycete fungus, and <it>Coelomomyces stegomyiae </it>(<ext-link ext-link-type="gen" ext-link-id="DQ302766">DQ302766</ext-link>) and <it>Batrachochytrium dendrobatidis </it>(<ext-link ext-link-type="gen" ext-link-id="DQ302769">DQ302769</ext-link>), both members of the Chytridiomycota. These inteins again have the conserved splicing and endonuclease domains characteristic of inteins (see InBase <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>). They also appear as inserts within the RNA polymerase sequences. The <it>C. stegomyiae </it>intein, Cst RPB2, and the <it>S. aspiralis </it>intein, Sas RPB2, are inserted at homologous sites and are therefore allelic inteins. The <it>B. dendrobatidis </it>intein is inserted at a different site. Both of these sites are distinct from the insertion site of the <it>C. reinhardtii </it>RNA polymerase II intein, Cre RPB2. To distinguish the three intein insertion sites in RNA polymerase II genes they have been denoted "a", "b" and "c", according to the order in which they were identified: Cre RPB2 is in the "a" site, Cst RPB2 and Sas RPB2 are in the "b" site, and Bde RPB2 is in the "c" site. The allelic RPB2-b inteins are present in two very distantly related species, a zygomycete and a chytrid.</p>
         </sec>
         <sec>
            <st>
               <p>An RNA polymerase III intein in <it>Dictyostelium discoideum</it></p>
            </st>
            <p><it>D. discoideum </it>is a slime mould classified within the Mycetozoa. The whole genome sequence has been determined and described <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. An intein in <it>D. discoideum </it>was detected as an insertion of 464 amino acid residues within the second largest subunit of RNA polymerase III (Figures <figr fid="F1">1</figr>, <figr fid="F2">2</figr>; GenBank protein ID no. EAL63250). The intein, Ddi RPC2, appears as a specific insert within the RNA polymerase subunit relative to homologues from other species, and it is inserted at a different site from the <it>P. nodorum </it>RNA polymerase I intein and from any of the RNA polymerase II inteins. The <it>Dictyostelium </it>intein begins with a Cys residue, ends with a standard His-Asn dipeptide and is followed by a Cys residue. The N- and C-terminal parts contain the conserved splicing domains characteristic of inteins <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, while the central region contains a possibly degenerate LAGLIDADG homing endonuclease. Ddi RPC2 contains several low-complexity regions or short runs of the same amino acid. For instance, it contains a region of 13 amino acid residues, of which 11 are Asn residues. It also contains a region with seven consecutive Asn residues and two regions with seven consecutive Gln residues. Such low-complexity regions appear to be common features in <it>D. discoideum </it>proteins <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. In the <it>D. discoideum </it>RNA polymerase III protein, they are restricted to the segment corresponding to the endonuclease domain of the intein, i.e. they are not found in the intein splicing domains or in the RNA polymerase sequence.</p>
         </sec>
         <sec>
            <st>
               <p>An RNA polymerase intein in a viral remnant within the <it>Phytophthora ramorum </it>genome</p>
            </st>
            <p><it>P. ramorum </it>is a member of the oomycetes, belonging to the kingdom Stramenopiles, which also includes diatoms, golden-brown algae and brown algae <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. The genome sequence has been determined by the Joint Genome Institute <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B37">37</abbr></abbrgrp>. Using the Ceu ClpP intein as a query in a TBLASTN search, we detected a high-quality match (E = 3.0 &#215; 10<sup>-15</sup>) in the <it>P. ramorum </it>genome (scaffold 19, bases 14734&#8211;15744 on the minus strand). This sequence has numerous features suggesting that it is an intein. For instance, it begins with a Cys residue and contains sequences similar to the splicing domains of other inteins (not shown). These are separated by a region containing an LAGLIDADG homing endonuclease domain similar to that of previously identified inteins. Immediately upstream of this putative intein is a long ORF encoding a putative protein homologous to the second largest subunit of RNA polymerase. The site at which the putative intein interrupts the RNA polymerase ORF is highly conserved, although it is distinct from the insertion sites of the previously identified RNA polymerase inteins (Figure <figr fid="F1">1</figr>).</p>
            <p>In addition to having these similarities to other inteins, this putative intein has unusual features. Firstly, instead of being an uninterrupted ORF, the region encoding the intein-like sequence contains two frameshift mutations, which result in the appearance of stop codons within the coding reading frame. Secondly, although it contains most of the conserved motifs associated with intein splicing, it lacks the conserved residues (usually a His-Asn dipeptide) corresponding to the extreme C-terminal ends of inteins; instead, the corresponding sequence consists of a stop codon and an Arg codon (see <supplr sid="S2">additional file 2</supplr>). These features suggest that the sequence no longer represents a functional intein (comparisons with sequences in the trace archives suggest that most of these are genuine mutations, although one of the frameshifts within the intein is likely to be a sequencing error; data not shown). Likewise, the RNA polymerase gene, in which the putative intein gene is inserted, has some unusual features. Firstly, it also appears to be non-functional; about 780 bp upstream of the intein insertion site, the RNA polymerase coding sequence contains a frameshift mutation and there is a nonsense mutation six codons upstream of the putative intein. Secondly, the section of the RNA polymerase gene expected to lie downstream of the intein gene (i.e. the coding sequence for the C-extein) is missing (<supplr sid="S2">additional file 2</supplr>). Comparisons with the trace archives suggest that these are all genuine mutations. Phylogenetic analyses indicate that this degenerate RNA polymerase gene is not closely related to eukaryotic RNA polymerase I, II or III genes (Figure <figr fid="F2">2</figr>). 
Instead, it is most closely related (100% bootstrap support) to an RNA polymerase from African swine fever virus (ASFV), a large double-stranded DNA virus that is a member of the nuclear-cytoplasmic large dsDNA virus (NCLDV) group <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. In addition, three intact genes encoding the second largest subunits of RNA polymerases I, II and III can be found in the <it>P. ramorum </it>genome (Figure <figr fid="F2">2</figr>). Close relatives (not shown) of these three genes also appear in the genome sequence of the related species <it>Phytophthora sojae </it>(also sequenced by the JGI), but no close relative of the degenerate ASFV-like RNA polymerase gene is present in the <it>P. sojae </it>genome.</p>
            <suppl id="S2">
               <title>
                  <p>Additional File 2</p>
               </title>
               <text>
                  <p><b>The nucleotide sequence and three-frame conceptual translation of the putative RNA polymerase from <it>P. ramorum</it></b>. The RNA polymerase protein sequence is shaded in red and the intein sequence in blue.</p>
               </text>
               <file name="1741-7007-4-38-S2.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Further analyses of the sequences surrounding this RNA polymerase gene reveal a likely explanation for its unusual features. When the predicted products of the ORFs in the regions close to the RNA polymerase gene are used in BLASTP searches against the protein sequences in GenBank, the strongest hits are often (as with the RNA polymerase itself) proteins encoded by ASFV (additional files <supplr sid="S3">3</supplr> and <supplr sid="S4">4</supplr>). Most of these proteins do not have close relatives in the <it>P. sojae </it>genome. The ORFs further away from the degenerate RNA polymerase gene, however, do have close matches in <it>P. sojae</it>, and are not closely related to genes found in ASFV. It is therefore likely that a previously unidentified virus related to ASFV has integrated into the <it>P. ramorum </it>genome. This integration would have occurred after the divergence of the lineages leading to <it>P. ramorum </it>and <it>P. sojae</it>, as no trace of the putative viral relic appears in <it>P. sojae</it>. Since its integration into the <it>P. ramorum </it>genome, the viral sequence has begun to degenerate.</p>
            <suppl id="S3">
               <title>
                  <p>Additional File 3</p>
               </title>
               <text>
                  <p><b>The genomic context of the intein-coding sequence in <it>Phytophthora ramorum</it></b>. The diagram depicts the structures of contigs 4, 5 and 6 of scaffold 19 of the assembled <it>P. ramorum </it>genome sequence. ORFs are represented by the shaded boxes. Blue-shaded boxes represent ORFs having a high-quality match (E &lt; 1 &#215; 10<sup>-30</sup>) in the assembled <it>Phytophthora sojae </it>genome sequence (see <supplr sid="S4">additional file 4</supplr>). Red-shaded boxes represent ORFs whose best matches among all the protein sequences in GenBank are proteins encoded by African swine fever virus (<supplr sid="S4">additional file 4</supplr>). The intein and associated RNA polymerase are represented by ORFs 6, 7 and 8 of contig 5. ORFs are as determined by the ORF finder program <url>http://www.ncbi.nlm.nih.gov/gorf/gorf.html</url>, except ORF1 of contig 4, which was extended back to the previous stop codon. Contig 6 extends further than the sequence depicted here.</p>
               </text>
               <file name="1741-7007-4-38-S3.eps">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S4">
               <title>
                  <p>Additional File 4</p>
               </title>
               <text>
                  <p>Matches to the ORFs in <it>P. ramorum </it>scaffold 19, contigs 4, 5 and 6.</p>
               </text>
               <file name="1741-7007-4-38-S4.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>An intein in an RNA polymerase sequence isolated from the Sargasso Sea</p>
            </st>
            <p>A putative intein was also identified in a sequence from an unclassified species (IBEA_CTG_SVAEH23TF) found in the environmental samples division of GenBank (accession no. <ext-link ext-link-type="gen" ext-link-id="AACY01369547">AACY01369547</ext-link>). The sequence was generated as part of the shotgun sequencing of samples from the Sargasso Sea <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. The complementary strand of this sequence encodes the N-terminal part of an intein, which includes the conserved splicing motifs and the first motif of a homing endonuclease domain, and is preceded by part of an RNA polymerase. The intein in this sequence is inserted at the same site as that in the putative viral relic in <it>P. ramorum</it>, i.e. they are allelic inteins. Similarity searches at InBase indicate that the annotated intein most similar to this Sargasso Sea sequence is Ceu ClpP, the intein from the chloroplast of <it>C. eugametos </it>(E = 2 &#215; 10<sup>-13</sup>). BLAST2 comparisons suggest a closer sequence similarity between the Ceu ClpP intein and the Sargasso Sea sequence (E = 1 &#215; 10<sup>-17</sup>) than between the Sargasso Sea sequence and the <it>P. ramorum </it>virus intein fragment (E = 4 &#215; 10<sup>-5</sup>).</p>
            <p>Phylogenetic analyses (not shown) indicate that the RNA polymerase from which this sequence is derived is most closely related to eukaryotic RNA polymerase II, although it is not highly similar to any sequence of known origin. E-values derived from TBLASTN searches at NCBI <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> indicate that the 59-residue fragment of this RNA polymerase is most similar (52&#8211;56% amino acid identity) to RNA polymerases (RPB2) from fungi (Table <tblr tid="T2">2</tblr>). It is less likely that the sequence is from a marine virus such as one of the large double-stranded DNA viruses from the NCLDV group that infect eukaryotes (40&#8211;46% amino acid identity). These large viruses, some of which infect marine organisms, encode RNA polymerase II-like proteins.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>The unclassified sequence from the Sargasso Sea is unlikely to represent a fragment of a viral genome. TBLASTN searches were conducted at NCBI using as a query the 59 residues from the Sargasso Sea sequence (Accession <ext-link ext-link-type="gen" ext-link-id="AACY01369547">AACY01369547</ext-link>) that form the putative N-extein. These 59 residues are encoded on the complementary strand, from base 556 to base 732. Each search was restricted to one of the six groups outlined below.</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>Accession</p>
                     </c>
                     <c ca="left">
                        <p>Sequences producing significant alignments</p>
                     </c>
                     <c ca="left">
                        <p>E value</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Fungi</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="DQ302778.1">DQ302778.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Entomophthora muscae </it>AFTOL-ID28, RPB2</p>
                     </c>
                     <c ca="left">
                        <p>4 &#215; 10<sup>-9</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="DQ521419.1">DQ521419.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Auricularia auricula-judae </it>AFTOL-ID1681</p>
                     </c>
                     <c ca="left">
                        <p>7 &#215; 10<sup>-9</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="DQ234553.1">DQ234553.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Hydnum albomagnum </it>AFTOL-ID 471, RPB2</p>
                     </c>
                     <c ca="left">
                        <p>1 &#215; 10<sup>-8</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AY485624.1">AY485624.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Hydnum repandum </it>RPB2</p>
                     </c>
                     <c ca="left">
                        <p>1 &#215; 10<sup>-8</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="DQ302787.1">DQ302787.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Umbelopsis ramanniana </it>AFTOL-ID 144, RPB2</p>
                     </c>
                     <c ca="left">
                        <p>2 &#215; 10<sup>-8</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Metazoa</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="XM_793194.1">XM_793194.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Strongylocentrotus purpuratus </it>LOC593725</p>
                     </c>
                     <c ca="left">
                        <p>5 &#215; 10<sup>-7</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>dbj <ext-link ext-link-type="gen" ext-link-id="AK114672.1">AK114672.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Ciona intestinalis </it>cDNA, clone:cieg010h22</p>
                     </c>
                     <c ca="left">
                        <p>1 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AC007441.9">AC007441.9</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Drosophila melanogaster </it>clone BACR10E03</p>
                     </c>
                     <c ca="left">
                        <p>2 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="BT028050.1">BT028050.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Gasterosteus aculeatus </it>clone CNB114-G10</p>
                     </c>
                     <c ca="left">
                        <p>2 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="U10333.1">U10333.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Caenorhabditis elegans </it>RNA polymerase II</p>
                     </c>
                     <c ca="left">
                        <p>2 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Plantae</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>emb <ext-link ext-link-type="gen" ext-link-id="AJ565937.1">AJ565937.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>MGU565937 <it>Mimulus guttatus </it>partial RPB2</p>
                     </c>
                     <c ca="left">
                        <p>9 &#215; 10<sup>-7</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="DQ029103.1">DQ029103.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Spirogyra sp</it>. UWCC FW670 RPB2</p>
                     </c>
                     <c ca="left">
                        <p>1 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AF020844.1">AF020844.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Marchantia polymorpha </it>RPB140 (RPB2)</p>
                     </c>
                     <c ca="left">
                        <p>1 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AY596718.1">AY596718.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Tetralocularia pennelii </it>(RPB2)</p>
                     </c>
                     <c ca="left">
                        <p>1 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AY563264.1">AY563264.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Selaginella densa </it>RNA polymerase II (RPB2)</p>
                     </c>
                     <c ca="left">
                        <p>1 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>emb <ext-link ext-link-type="gen" ext-link-id="AJ566358.1">AJ566358.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>GSP566358 <it>Gardenia sp</it>. Oxelman 2319 (RPB2)</p>
                     </c>
                     <c ca="left">
                        <p>1 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Archaea</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AE010299.1">AE010299.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Methanosarcina acetivorans </it>str. C2A (rpoB)</p>
                     </c>
                     <c ca="left">
                        <p>2 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AE008384.1">AE008384.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Methanosarcina mazei </it>strain Goe1 (rpoB)</p>
                     </c>
                     <c ca="left">
                        <p>2 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="CP000099.1">CP000099.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Methanosarcina barkeri </it>str. fusaro</p>
                     </c>
                     <c ca="left">
                        <p>3 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AE000782.1">AE000782.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Archaeoglobus fulgidus </it>DSM 4304 (rpoB1)</p>
                     </c>
                     <c ca="left">
                        <p>4 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>emb <ext-link ext-link-type="gen" ext-link-id="BX957222.1">BX957222.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Methanococcus maripaludis </it>S2</p>
                     </c>
                     <c ca="left">
                        <p>5 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>emb <ext-link ext-link-type="gen" ext-link-id="X14818.1">X14818.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Sulfolobus acidocaldarius </it>rpoB</p>
                     </c>
                     <c ca="left">
                        <p>9 &#215; 10<sup>-6</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Viruses</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AF389451.1">AF389451.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Tiger frog virus, complete genome</p>
                     </c>
                     <c ca="left">
                        <p>2 &#215; 10<sup>-4</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AY548484.1">AY548484.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Frog virus 3, complete genome</p>
                     </c>
                     <c ca="left">
                        <p>4 &#215; 10<sup>-4</sup></p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AY666015.1">AY666015.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Grouper iridovirus, complete genome</p>
                     </c>
                     <c ca="left">
                        <p>0.001</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AY150217.1">AY150217.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Ambystoma tigrinum stebbensi </it>virus</p>
                     </c>
                     <c ca="left">
                        <p>0.001</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AF397202.1">AF397202.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Regina ranavirus </it>clone PstI-3.8</p>
                     </c>
                     <c ca="left">
                        <p>0.001</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>gb <ext-link ext-link-type="gen" ext-link-id="AY653733.1">AY653733.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Acanthamoeba polyphaga </it>Mimivirus</p>
                     </c>
                     <c ca="left">
                        <p>0.001</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>emb <ext-link ext-link-type="gen" ext-link-id="AJ890364.1">AJ890364.1</ext-link></p>
                     </c>
                     <c ca="left">
                        <p><it>Emiliania huxleyi </it>virus 86 isolate Ehv86</p>
                     </c>
                     <c ca="left">
                        <p>0.003</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Eubacteria</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>No significant similarity found.</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>An intein in an RNA polymerase sequence found in an isolate of the <it>Emiliania huxleyi </it>virus</p>
            </st>
            <p>A partial intein sequence was identified in a short sequence cloned from <it>E. huxleyi </it>virus 163 (GenBank accession <ext-link ext-link-type="gen" ext-link-id="DQ127798">DQ127798</ext-link>). The allelic site in <it>E. huxleyi </it>virus 86 (accession CAI65861, containing sequence annotated as encoding an RPB2 homologue) does not contain an intein. <it>E. huxleyi </it>is a marine calcifying haptophyte alga, and the virus is a member of the NCLDV group <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. The intein-like sequence represents only ~50 residues of the C-terminal end of an intein similar to SasRPB2-b and CstRPB2-b (it ends in TGN). The sequence downstream (the C-extein) from the intein-like sequence in <it>E. huxleyi </it>virus 163 encodes residues almost identical to the corresponding region in <it>E. huxleyi </it>virus 86 (Figure <figr fid="F1">1</figr>). This region is immediately adjacent to the region corresponding to the insertion site of the <it>P. ramorum </it>virus partial intein and the partial intein from the Sargasso Sea isolate &#8211; that is, these three partial inteins are allelic inteins (Figure <figr fid="F1">1</figr>).</p>
         </sec>
         <sec>
            <st>
               <p>RNA polymerase inteins insert at highly conserved sites</p>
            </st>
            <p>Previous analyses have suggested that inteins usually appear at highly conserved sites within their host proteins <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. One possible reason for this preference is that inteins inserted at such sites are less likely to be removed. Highly conserved sites in proteins usually have important and sequence-specific functions, and, therefore, any deletion that removes the intein sequence would have to be very precise or it would result in a non-functional host gene. In contrast, inteins inserted at poorly conserved sites might be successfully removed by a wide range of imprecise deletions. A second possibility is that inteins inserted at highly conserved sites may be more likely than inteins inserted at poorly conserved sites to spread successfully throughout the gene pool of a species or to undergo a successful horizontal transmission to a new species, as the homing endonuclease recognition site is more likely to be conserved. With six distinct insertion sites within homologous genes, these new RNA polymerase inteins provide a good opportunity to examine this phenomenon in detail in a eukaryotic system. We therefore created an alignment of eukaryotic RNA polymerases, plotted the level of conservation in 10-amino-acid windows across the alignment, and mapped the intein insertion sites onto the plot (Figure <figr fid="F3">3</figr>). As can be seen, the intein insertion sites each correspond to one of the peaks in the sequence conservation plot, indicating that each is inserted into one of the most highly conserved sites within these genes. Not every highly conserved site can act as an intein refuge, however, because a refuge must also contain appropriate flanking residues (for example, the C-extein Cys or Ser).</p>
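            <p>The windowed conservation profile described above can be illustrated with a minimal sketch. It is not the PLOTSIMILARITY algorithm used for Figure 3: it assumes a much simpler per-column score (the fraction of non-gap residues matching the most common residue), and the function names and toy alignment are hypothetical.</p>
            <p>
```python
from collections import Counter


def column_conservation(column):
    """Fraction of non-gap residues that match the most common residue."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(residues)


def windowed_conservation(alignment, window=10):
    """Mean per-column conservation in each sliding window of the alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    per_column = [column_conservation(col) for col in zip(*alignment)]
    return [
        sum(per_column[start:start + window]) / window
        for start in range(length - window + 1)
    ]


# Hypothetical toy alignment (not data from this study).
toy_alignment = [
    "MKTAYIAKQRCDEF",
    "MKTAYIAKQRSDEF",
    "MRTGYLAKQRCNEW",
]
profile = windowed_conservation(toy_alignment, window=10)
```
            </p>
            <p>Under these assumptions, an insertion site would be judged highly conserved if the windows covering it score near the top of the profile.</p>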
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Profile of RNA polymerase alignment showing high level of conservation at intein insertion sites</p>
               </caption>
               <text>
                  <p><b>Profile of RNA polymerase alignment showing high level of conservation at intein insertion sites</b>. The plot was generated from an alignment of multiple eukaryotic RNA polymerase I, II and III sequences using the PLOTSIMILARITY program of the GCG package of sequence analysis programs [49]. Intein location positions are as follows: BdeRPB2-c 843; Ddi RPC2 853; PnoRPA2 1195; <it>P. ramorum </it>virus, Sargasso Sea isolate, <it>E. huxleyi </it>virus 163 1516; SasRPB2-b, CstRPB2-b 1664; CreRPB2-a 1696. Intron locations are indicated by a short vertical line at their insertion site and are as follows: <it>D. discoideum </it>1&#8211;111, 2&#8211;184; <it>P. nodorum </it>5&#8211;301, 17&#8211;1192; <it>B. dendrobatidis </it>13&#8211;826, 21&#8211;1357; <it>C. reinhardtii </it>3&#8211;222, 4&#8211;295, 6&#8211;335, 7&#8211;376, 8&#8211;439, 9&#8211;545, 10&#8211;657, 11&#8211;709, 12&#8211;797, 14&#8211;867, 15&#8211;998, 16&#8211;1093, 18&#8211;1217, 19&#8211;1287, 20&#8211;1346, 22&#8211;1401, 23&#8211;1511, 24&#8211;1587, 25&#8211;1688, 26&#8211;1817.</p>
               </text>
               <graphic file="1741-7007-4-38-3"/>
            </fig>
            <p>We also mapped onto the plot the positions of all the spliceosomal introns from the intein-containing RNA polymerase genes. There are two introns in the genes from <it>P. nodorum</it>, <it>B. dendrobatidis </it>and <it>D. discoideum</it>, and 20 in the <it>C. reinhardtii </it>gene. None was found in the <it>S. aspiralis </it>or <it>C. stegomyiae </it>genes or in the putative proviral gene from <it>P. ramorum</it>. As can be seen in Figure <figr fid="F3">3</figr>, some introns are inserted at highly conserved positions, but others are inserted at sites that are only moderately or are poorly conserved, showing that, in contrast to inteins, RNA polymerase introns do not preferentially appear at highly conserved sites.</p>
            <p>Having mapped the intein insertion sites onto the eukaryotic RNA polymerase sequence conservation plot (Figure <figr fid="F3">3</figr>) and determined that these were in regions of high sequence conservation, we wished to discover where these sites occurred in the three-dimensional protein, including which structural domains correspond to the intein insertion sites. It is of interest to plot these sites because inteins at some positions might be more easily processed during protein folding. Because the RNA polymerase subunits RPA2, RPB2 and RPC2 are similar in structure, we used as our common template the structure of RPB2 from <it>S. cerevisiae </it><abbrgrp><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>. It is possible to map onto the protein structure the position of the RPA2, RPB2, RPC2 and viral RPO II intein insertion sites, as if they were all inserted into the homologous RPB2 protein. Using data from the Protein Data Bank <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> (entry 1I3Q) and the MacPyMOL molecular visualisation system <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, we highlighted six residues immediately adjacent to each intein insertion site. Figure <figr fid="F4">4</figr> illustrates the assembly of RPB2 and RPB1 into the core heterodimer. Using the terminology of Cramer <it>et al </it><abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, both the Ddi RPC2 and Bde RPB2 insertion sites are in the 'fork' domain, the Pno RPA2, Cst RPB2, Sas RPB2 and the viral RPO intein found in <it>P. ramorum </it>are all inserted within the 'hybrid binding' domain, and the Cre RPB2 intein insertion site is in the anchor domain. All of these sites are close to the active site of RPB2. None of the insertion sites are found on the external surface of the protein; all are on the surface of the 'cleft' formed by the RPB1/RPB2 heterodimer or are on the interface between these subunits (Figure <figr fid="F4">4</figr>). 
It is highly probable, therefore, that inteins inserted into these sites in any of the homologues will need to be accurately spliced out before the protein subunits can assume their correct folds and the active RNA polymerase complex can be assembled. In contrast, if inteins were present on the surface of the heterodimer, they could undergo inactivation and progressive deletion without necessarily impairing the assembly and function of the RNA polymerase.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>The position of the six RNA polymerase intein insertion sites mapped onto the crystal structure of the RNA polymerase II of <it>Saccharomyces cerevisiae</it></p>
               </caption>
               <text>
                  <p><b>The position of the six RNA polymerase intein insertion sites mapped onto the crystal structure of the RNA polymerase II of <it>Saccharomyces cerevisiae</it></b>. Representation of <it>S. cerevisiae </it>RNA polymerase II (PDB: 1I3Q); the second largest subunit is coloured dark grey (other subunits, including the largest, are coloured light brown). Top: surface view showing the position of the cleft formed by the two largest subunits and the position of the four intein insertion sites (indicated by different colours) on the surface of the cleft, near the active site/"wall" region. Lower: three surface views from a different orientation; the middle image has all of the subunits other than the second largest subunit as a semi-transparent surface so that the position of the two intein insertion sites on the interface between RPB1 and RPB2 (red and blue regions) can be seen. The lowest image is of RPB2 only, but with the intein insertion sites labelled with the names of inteins inserted at these positions in some homologues.</p>
               </text>
               <graphic file="1741-7007-4-38-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Relationships among inteins</p>
            </st>
            <p>Previous work with eukaryotic inteins has shown that allelic inteins are usually each other's closest relatives. For instance, the wide variety of PRP8 inteins identified in ascomycete and basidiomycete fungi form a monophyletic group, relative to all other known inteins. Similarly, the yeast VMA1 inteins also appear as a monophyletic group. There is some evidence to suggest that many of the previously identified eukaryotic nuclear inteins (i.e., VMA1, PRP8, GLT1 and CHS2) may be more closely related to each other than they are to most inteins encoded by non-nuclear genes <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. There is no evidence suggesting that the nuclear-encoded inteins are closely related to eukaryotic inteins encoded by chloroplast genes, or inteins encoded by eukaryotic viruses. Indeed, some of these latter inteins are alleles of, and closely related to, inteins found in prokaryotes. For instance, the DNA polymerase B inteins of the <it>A. polyphaga </it>mimivirus and HaV01 are most closely related to allelic DNA polymerase inteins from various archaea <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
            <p>To study the relationships among the new RNA polymerase inteins and previously identified inteins, we constructed phylogenies based on alignments of conserved intein splicing domains. Homing endonuclease domains were not included in this analysis because they are absent from mini-inteins and because, even in full-length inteins, they are often degenerate and therefore might produce misleading results. The sequences to be aligned were edited so as to remove all of the intein residues between the end of the N-terminal splicing domain and the beginning of the C-terminal splicing domain. These domains were determined by comparison with other intein sequences available at InBase <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The resulting alignment has 102 positions and is available as supplementary data (<supplr sid="S5">additional data file 5</supplr>). An example of a tree containing all the new RNA polymerase inteins and a wide variety of previously identified inteins, including most of the known eukaryotic inteins and representatives of many of the allele groups of prokaryotic inteins, is shown in Figure <figr fid="F5">5</figr>. Most of the relationships observed on this tree are consistent with results obtained previously. For instance, the PRP8 inteins all group together, as do the VMA1 inteins.</p>
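            <p>The editing step described above, in which the residues between the N-terminal and C-terminal splicing domains are removed before alignment, might be sketched as follows. The domain boundaries are hypothetical inputs here; in this study they were determined by comparison with intein sequences at InBase, not computed. The function name and toy sequence are likewise illustrative assumptions.</p>
            <p>
```python
def splicing_domains_only(intein, n_domain_end, c_domain_start):
    """Join the N- and C-terminal splicing domains, dropping the region between.

    n_domain_end: number of residues in the N-terminal splicing domain.
    c_domain_start: 0-based index of the first residue of the C-terminal domain.
    """
    if (0 > n_domain_end
            or n_domain_end > c_domain_start
            or c_domain_start > len(intein)):
        raise ValueError("domain boundaries are out of order or out of range")
    return intein[:n_domain_end] + intein[c_domain_start:]


# Hypothetical intein: the run of 'x' stands in for an endonuclease-like region.
toy_intein = "CLAKGTRLLR" + "xxxxxxxxxxxx" + "VYDLEVEHN"
trimmed = splicing_domains_only(toy_intein, n_domain_end=10, c_domain_start=22)
```
            </p>
            <p>For a mini-intein with no intervening region, the two boundaries coincide and the sequence is returned unchanged.</p>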
            <suppl id="S5">
               <title>
                  <p>Additional File 5</p>
               </title>
               <text>
                  <p>An alignment of the intein splicing domains used to create Figure <figr fid="F5">5</figr>.</p>
               </text>
               <file name="1741-7007-4-38-S5.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Phylogenetic tree of intein splicing domains</p>
               </caption>
               <text>
                  <p><b>Phylogenetic tree of intein splicing domains</b>. This unrooted distance tree was constructed by the neighbour-joining method using PAUP*4b10 [52] with the default settings. Numbers on the branches indicate the percentages of bootstrap support derived from a heuristic search with 100 random addition replicates; this search included a tree-bisection-reconnection branch-swapping algorithm. All bootstrap values > 50 have been reported except in cases where allelic inteins fall within a well-supported (95&#8211;100% bootstrap) group, when some values >50 have been omitted for reasons of space. Inteins encoded by nuclear genes are highlighted with an asterisk. The alignment used is available as supplementary data (<supplr sid="S5">additional file 5</supplr>).</p>
               </text>
               <graphic file="1741-7007-4-38-5"/>
            </fig>
            <p>The new eukaryotic RNA polymerase inteins do not generally appear to be closely related to each other, despite being present in homologous (in some cases paralogous) genes. This is perhaps not unexpected, however, as most are inserted at different sites in these genes and therefore are not allelic inteins. The <it>C. reinhardtii </it>RNA polymerase II intein appears to be most closely related to the threonyl transfer RNA synthetase inteins from <it>C. tropicalis </it>and <it>C. parapsilosis</it>. This grouping receives a high level of bootstrap support (100%). This is unusual as these inteins are not alleles and are found in different kingdoms. The <it>P. nodorum </it>RNA polymerase I intein is not closely related to any other known intein, although it does fall within a moderately supported (68%) group that also includes the <it>C. eugametos </it>chloroplast Ceu ClpP intein, Sas RPB2, Cst RPB2, the putative viral intein embedded within the <it>P. ramorum </it>genome, and a variety of prokaryotic inteins. All these inteins (including Cre RPB2, Cpa ThrRS and Ctr ThrRS, Pno RPA2, Sas RPB2 and Cst RPB2, and Ceu ClpP), together with a set of prokaryotic inteins, form a well-supported (99%) cluster distinct from all other inteins. The <it>B. dendrobatidis </it>RNA polymerase II intein (Bde RPB2) and the <it>D. discoideum </it>RNA polymerase III intein (Ddi RPC2) lie outside of this cluster. Although they appear on this tree as each other's closest known relatives, this grouping does not receive a high level of support (60%) and the two inteins are not particularly similar in sequence (~20% identity), so the significance of the grouping is uncertain.</p>
            <p>The topology of the trees generated from the alignment data is generally very similar when different tree-building algorithms, such as quartet puzzling or parsimony analyses, are used. The bootstrap values generated also follow a similar pattern, with one exception: the node that joins the PRP8 allelic inteins with the VMA allelic inteins can attract values ranging from 56% to 96%. The bootstrap value of the node that groups many of the newly described RNA polymerase inteins into a cluster distinct from all other inteins ranges from 70% (fast heuristic search, no branch-swapping) to 99% (heuristic search with branch-swapping).</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>We have identified coding sequences for seven new inteins within nuclear genes. These are all present within homologous genes encoding the second largest subunits of RNA polymerase. One is present in an RNA polymerase I subunit, four (including two allelic inteins) in an RNA polymerase II subunit, one in RNA polymerase III, and the last is found in a viral RNA polymerase in a degenerate provirus. In addition, we identified a sequence from an unknown organism from the Sargasso Sea that contains a partial sequence of an intein allelic to that of the provirus, and a partial sequence of a further allelic intein from <it>E. huxleyi </it>virus 163. These new inteins raise the number of distinct (non-allelic) nuclear-encoded inteins identified to 11 (or 10 if the proviral intein is excluded).</p>
         <p>The new inteins from <it>C. reinhardtii</it>, a green alga, and from <it>D. discoideum</it>, a cellular slime mould (Amoebozoa), are the first nuclear-encoded inteins to be found outside of the fungi. These findings indicate that there is no particular barrier to the functioning of inteins in non-fungal eukaryote nuclei. They also have implications for our understanding of the origins and evolution of nuclear inteins. For instance, they suggest either that inteins have a much longer history in nuclear genomes than was previously evident, or perhaps that they have invaded nuclear genomes on multiple occasions or are capable of widespread horizontal transmission. They also suggest that inteins will be identified in further diverse eukaryotes as more genome sequences are determined.</p>
         <p>Inteins have now been found in many kingdoms of eukaryotes. They are present in Opisthokonts (in many fungal species and in the viruses of insects), in Amoebozoa (<it>Dictyostelium </it>RPC2 and the mimivirus intein, APMV PolB, in <it>Acanthamoeba</it>), in green plants (<it>C. reinhardtii </it>RPB2 and the <it>C. eugametos </it>plastid ClpP protease), in a red alga (the plastid DnaB helicase of <it>P. purpurea</it>) and in a cryptophyte (the plastid DnaB helicase of <it>G. theta</it>). Inteins are also found in the viruses of haptophyte algae (<it>E. huxleyi </it>virus intein, EhV163_RPO) and in viruses of Stramenopiles, both the photosynthetic golden-brown algae (<it>Heterosigma </it>virus intein, HaV01 PolB) and a non-photosynthetic oomycete (<it>P. ramorum</it>, PrV_RPO).</p>
         <p>The intein from the viral relic embedded in <it>P. ramorum </it>is the first example of an intein in a eukaryotic provirus. This intein is of particular interest in the context of the possibility of horizontal transmission of inteins, as it has been suggested that viruses might mediate the movement of inteins between species <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. For instance, an intein present in a particular gene in a cellular genome might be able to home to a homologous gene in an infecting virus. If this virus were then to infect a second species, the intein could potentially undergo a second homing reaction and become inserted into the homologous gene in the new species. This idea is supported by the presence of allelic inteins in bacteriophage and bacterial genomes <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. For example, allelic DnaB-b inteins are found in ~17 species of eubacteria and in a giant phage found in <it>Pseudomonas aeruginosa</it>. Allelic inteins in the RIR1-i insertion site of ribonucleoside-diphosphate reductase are present in prophages from two strains of <it>Bacillus subtilis</it>, in three eukaryote viruses and in a cyanobacterium <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Although no nuclear inteins that are alleles of the <it>P. ramorum </it>proviral intein have yet been identified, the finding supports the possibility that such a horizontal transmission might take place in eukaryotes. The Sargasso Sea intein fragment may represent such a nuclear-encoded homologue of the <it>P. ramorum </it>proviral intein; alternatively it may be derived from a eukaryotic nucleocytoplasmic large DNA virus (NCLDV). An intein fragment is present in the allelic site of one isolate of the <it>E. huxleyi </it>virus, a member of the NCLDV group. However, the <it>P. 
ramorum </it>proviral intein is intriguing, because the chances of a successful intein transmission from virus to host would be increased by the integration of the viral DNA into the host genome, as then the viral DNA would be a stable part of the host genome and would be available to act as a template for DNA repair (an essential part of the homing process) for much longer than in a transient infection.</p>
         <p>The six sites where the inteins are inserted are among the most highly conserved regions of the second largest subunit of RNA polymerase. This is consistent with previous findings that inteins are usually found at highly conserved sites. It is not clear why RNA polymerase has so many inteins, however, when no other nuclear gene has more than one known intein. It is possible that it is related to the presence of RNA polymerase genes in a variety of viruses. This may increase the likelihood of an intein being horizontally transferred, which, according to the proposed lifecycle of inteins, may increase the likelihood of it surviving for long periods of time. Multiple alleles were detected at two of the new intein sites; the other four sites were represented by single inteins. This emphasises the extremely sporadic distribution of inteins. Many examples of RNA polymerase genes have been sequenced, because of their usefulness in phylogenetic studies, but inteins have been found in few.</p>
         <p>The non-allelic RNA polymerase inteins are not highly similar to each other, or to any previously identified inteins. Five of the inteins, Pno RPA2, Cre RPB2, Cst RPB2, Sas RPB2 and the intein from the provirus in <it>P. ramorum</it>, however, form part of a well-supported but diverse group of inteins that also includes the <it>Candida </it>ThrRS inteins, Ceu ClpP and several prokaryotic inteins. Within this group, the Cre RPB2 intein appears to be most closely related to the ThrRS inteins (100% bootstrap support), which is unusual as these are not allelic inteins. Similarly, Cst RPB2 and Sas RPB2 form a well-supported group with the non-allelic Ceu ClpP intein. These findings raise the possibility that, in each of these cases, one of the inteins is derived from the other via the ectopic movement of an ancestral intein. There is, however, no obvious similarity among the nucleotide sequences that flank these non-allelic inteins. Such similarity might have suggested that a homing endonuclease had cleaved a degenerate site and promoted an ectopic conversion, but it is unlikely to be detected; even the allelic inteins CstRPB2 and SasRPB2 show &lt;80% sequence identity in this region (all but two of the changes are third-codon-position substitutions).</p>
         <p>The finding that clades representing nuclear-encoded inteins are dispersed throughout the intein phylogeny, intermingled with clades representing eubacterial, archaeal and viral inteins (Figure <figr fid="F5">5</figr>), suggests that inteins have a very long history in eukaryotes, dating back to eukaryotic origins, and/or that horizontal intein transmission between eukaryotes and prokaryotes has occurred at multiple points. Given the lack of compelling evidence for the occurrence of horizontal transmission of eukaryotic inteins (i.e. there are no examples of highly similar inteins in distantly related host species), together with the general high degree of diversity in the intein sequences, we favour the former possibility that inteins were present in the very earliest eukaryotes. Their present-day sporadic distribution is likely to be primarily the result of multiple, independent losses in different lineages.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Seven complete new nuclear-encoded inteins were identified and characterised. These inteins were all found in genes encoding the second-largest subunits of RNA polymerase. The inteins were found at six distinct (non-allelic) sites, i.e., only two of them are allelic. Four of the inteins are from fungi (one from an ascomycete, one from a zygomycete and two from chytrids). One intein was found in the green alga <it>C. reinhardtii </it>and one in the slime mould <it>D. discoideum</it>. These are the first nuclear-encoded inteins from outside of the fungi. The seventh new intein is from a provirus embedded within the genome of an oomycete (the kingdom Stramenopiles). These new inteins substantially increase the number of described nuclear-encoded inteins and also widen the diversity of species known to harbour such inteins. The data suggest that inteins have a long history in eukaryotes, probably dating back to their earliest origins.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Sequence databases</p>
            </st>
            <p>The sequence databases used were:</p>
            <p>&#8226; The Joint Genome Institute <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>.</p>
            <p>&#8226; The Wellcome Trust Sanger Institute <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>.</p>
            <p>&#8226; The Broad Institute <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
            <p>&#8226; Washington University Genome Sequencing Center <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>.</p>
            <p>&#8226; National Center for Biotechnology Information <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Bioinformatics analyses</p>
            </st>
            <p>General sequence analyses were carried out using the programs of the GCG package <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. Sequence similarity searches were carried out using the BLAST servers at GenBank <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, InBase <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> or the various genome-sequencing centres mentioned above. Multiple sequence alignments were constructed using CLUSTAL_X <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> and refined using SEAVIEW <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. Phylogenetic analyses were performed using PAUP* <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> using the default settings unless otherwise noted.</p>
         </sec>
         <sec>
            <st>
               <p>Sequences</p>
            </st>
            <p>Intein protein sequences were retrieved from InBase <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> under the standard intein names. Protein sequences for the second largest subunits of RNA polymerase sequences were retrieved from GenBank <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> using the following protein ID numbers.</p>
         </sec>
         <sec>
            <st>
               <p>Eukaryotes</p>
            </st>
            <p>&#8226; <it>Schizosaccharomyces pombe </it>Pol. I, CAB66435; Pol. II, Q02061; Pol. III, CAA93558.</p>
            <p>&#8226; <it>Aspergillus fumigatus </it>Pol. I, EAL88681; Pol. II, EAL84702; Pol. III, EAL87958.</p>
            <p>&#8226; <it>Saccharomyces cerevisiae </it>Pol. I, AAA34993; Pol. II, CAA99357; Pol. III, CAA99422.</p>
            <p>&#8226; <it>Dictyostelium discoideum </it>Pol. I, EAL60592; Pol. II, EAL63310; Pol. III, EAL63250.</p>
            <p>&#8226; <it>Homo sapiens </it>Pol. I, AAX81999; Pol. II, AAH23503; Pol. III, AAH46238.</p>
            <p>&#8226; <it>Drosophila melanogaster </it>Pol. I, AAF51503; Pol. II, AAF55024; Pol. III, AAF58590.</p>
            <p>&#8226; <it>Arabidopsis thaliana </it>Pol. I, AAG52049; Pol. II, CAB36815; Pol. III, BAB11387.</p>
            <p>&#8226; <it>Cryptosporidium parvum </it>Pol. I, EAK88354; Pol. II, EAK90367; Pol. III, EAK87469.</p>
            <p>&#8226; <it>Encephalitozoon cuniculi </it>Pol. I, CAD26190; Pol. II, CAD25744; Pol. III, CAD25947.</p>
            <p>&#8226; The <it>P. nodorum </it>RNA polymerase sequences were predicted from the genes on the following sequences: RNA Pol. I, AAGI01000064; RNA Pol. II, AAGI01000234; RNA Pol. III, AAGI01000034.</p>
            <p>&#8226; The <it>P. ramorum </it>RNA polymerase sequences were predicted from the genes on the following sequences: RNA Pol. I, scaffold 163; RNA Pol. II, scaffold 60; RNA Pol. III, scaffold 33.</p>
            <p>&#8226; The <it>Chlamydomonas reinhardtii </it>RNA polymerase II gene sequence was assembled from sequences from version 2 of the genome assembly (Scaffold 5, contigs 26, 27 and 28) combined with sequences from the trace archive (589516860, 651002588, 591226556, 650233847, 587272333). Introns were identified by comparison with other RNA polymerases.</p>
         </sec>
         <sec>
            <st>
               <p>Eukaryotic viruses</p>
            </st>
            <p>&#8226; African swine fever virus AAA65283</p>
            <p>&#8226; <it>Emiliania huxleyi </it>virus 86 CAI65861</p>
            <p>&#8226; <it>Acanthamoeba polyphaga </it>mimivirus AAQ09583</p>
            <p>&#8226; Chilo iridescent virus AAK82288</p>
            <p>&#8226; Grouper iridovirus AAV91067</p>
            <p>&#8226; Frog virus 3 AAT09722</p>
            <p>&#8226; Lymphocystis disease virus AAU10873</p>
            <p>&#8226; Rock bream virus AAT71848</p>
            <p>&#8226; Swinepox virus AAL69852</p>
            <p>&#8226; Orf virus AAR98326</p>
            <p>&#8226; Melanoplus sanguinipes entomopoxvirus T28316</p>
            <p>&#8226; Amsacta moorei entomopoxvirus AAG02772</p>
            <p>&#8226; Vaccinia virus AAB96526</p>
         </sec>
         <sec>
            <st>
               <p>Archaea</p>
            </st>
            <p>&#8226; <it>Pyrococcus furiosus </it>AAL81688</p>
            <p>&#8226; <it>Ferroplasma acidiarmanus </it>EAM93828</p>
         </sec>
         <sec>
            <st>
               <p>Bacteria</p>
            </st>
            <p>&#8226; <it>Staphylococcus aureus </it>AAW37698</p>
            <p>&#8226; <it>Crocosphaera watsonii </it>EAM50876</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>TG participated in intein discovery and the initial data analyses. MB participated in the phylogenetic analyses and examination of aspects of RNA polymerase structure. RP contributed to the design of the study and to the analysis of the results. All of the authors participated in the manuscript preparation and have read and approved the final version.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We are grateful to the Broad Institute of MIT and Harvard for access to the sequence data of the many fungal genomes sequenced there. We also used sequence data provided by the US Department of Energy's Joint Genome Institute and data generated by the Assembling the Fungal Tree of Life (AFTOL) project. AFTOL involves many members of the international fungal systematics community and is supported by the National Science Foundation under Grant No. DEB-0228725. We are also indebted to Dr Francine Perler and others who maintain the intein database at New England Biolabs. The manuscript was improved after comments from anonymous reviewers. We are also grateful to Dr Sue Cutfield and Bronwyn Carlisle for advice and help in the production of Figure <figr fid="F4">4</figr>. TJDG was supported by a New Zealand Science and Technology Post-Doctoral Fellowship (contract no. UOOX0222). MIB was supported by the New Zealand Lottery Grants Board.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Protein splicing elements: inteins and exteins &#8211; a definition of terms and recommended nomenclature</p>
            </title>
            <aug>
               <au>
                  <snm>Perler</snm>
                  <fnm>FB</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>EO</fnm>
               </au>
               <au>
                  <snm>Dean</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Gimble</snm>
                  <fnm>FS</fnm>
               </au>
               <au>
                  <snm>Jack</snm>
                  <fnm>WE</fnm>
               </au>
               <au>
                  <snm>Neff</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Noren</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Thorner</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Belfort</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>1125</fpage>
            <lpage>1127</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">523631</pubid>
                  <pubid idtype="pmpid">8165123</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Protein splicing of inteins and hedgehog autoproteolysis: structure, function, and evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Perler</snm>
                  <fnm>FB</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1998</pubdate>
            <volume>92</volume>
            <fpage>1</fpage>
            <lpage>4</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(00)80892-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">9489693</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Protein splicing and related forms of protein autoprocessing</p>
            </title>
            <aug>
               <au>
                  <snm>Paulus</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Annu Rev Biochem</source>
            <pubdate>2000</pubdate>
            <volume>69</volume>
            <fpage>447</fpage>
            <lpage>496</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.biochem.69.1.447</pubid>
                  <pubid idtype="pmpid" link="fulltext">10966466</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Protein-splicing intein: Genetic mobility, origin, and evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>XQ</fnm>
               </au>
            </aug>
            <source>Annu Rev Genet</source>
            <pubdate>2000</pubdate>
            <volume>34</volume>
            <fpage>61</fpage>
            <lpage>76</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genet.34.1.61</pubid>
                  <pubid idtype="pmpid" link="fulltext">11092822</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Inteins: structure, function, and evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Senejani</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Zhaxybayeva</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Olendzenski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hilario</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Annu Rev Microbiol</source>
            <pubdate>2002</pubdate>
            <volume>56</volume>
            <fpage>263</fpage>
            <lpage>287</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.micro.56.012302.160741</pubid>
                  <pubid idtype="pmpid" link="fulltext">12142479</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>InBase, the Intein Database and Registry</p>
            </title>
            <url>http://www.neb.com/neb/inteins.html</url>
         </bibl>
         <bibl id="B7">
            <title>
               <p>InBase: the Intein Database</p>
            </title>
            <aug>
               <au>
                  <snm>Perler</snm>
                  <fnm>FB</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>383</fpage>
            <lpage>384</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99080</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752343</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.383</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Homing of a DNA endonuclease gene by meiotic gene conversion in <it>Saccharomyces cerevisiae</it></p>
            </title>
            <aug>
               <au>
                  <snm>Gimble</snm>
                  <fnm>FS</fnm>
               </au>
               <au>
                  <snm>Thorner</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1992</pubdate>
            <volume>357</volume>
            <fpage>301</fpage>
            <lpage>306</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/357301a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">1534148</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Intein spread and extinction in evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Pietrokovski</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>465</fpage>
            <lpage>472</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(01)02365-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">11485819</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Protein splicing converts the yeast <it>TFP1 </it>gene product to the 69-kD subunit of the vacuolar H(+)-adenosine triphosphatase</p>
            </title>
            <aug>
               <au>
                  <snm>Kane</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Yamashiro</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Wolczyk</snm>
                  <fnm>DF</fnm>
               </au>
               <au>
                  <snm>Neff</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Goebl</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stevens</snm>
                  <fnm>TH</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1990</pubdate>
            <volume>250</volume>
            <fpage>651</fpage>
            <lpage>657</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.2146742</pubid>
                  <pubid idtype="pmpid" link="fulltext">2146742</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>The <it>Mycobacterium xenopi </it>GyrA protein splicing element: characterization of a minimal intein</p>
            </title>
            <aug>
               <au>
                  <snm>Telenti</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Southworth</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Alcaide</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Daugelat</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jacobs</snm>
                  <fnm>WR</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Perler</snm>
                  <fnm>FB</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1997</pubdate>
            <volume>179</volume>
            <fpage>6378</fpage>
            <lpage>6382</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">179553</pubid>
                  <pubid idtype="pmpid" link="fulltext">9335286</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>An alternative protein splicing mechanism for inteins lacking an N-terminal nucleophile</p>
            </title>
            <aug>
               <au>
                  <snm>Southworth</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Benner</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Perler</snm>
                  <fnm>FB</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>2000</pubdate>
            <volume>19</volume>
            <fpage>5019</fpage>
            <lpage>5026</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">314217</pubid>
                  <pubid idtype="pmpid" link="fulltext">10990465</pubid>
                  <pubid idtype="doi">10.1093/emboj/19.18.5019</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Four inteins and three group II introns encoded in a bacterial ribonucleotide reductase gene</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>X-Q</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Meng</snm>
                  <fnm>Q</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2003</pubdate>
            <volume>278</volume>
            <fpage>46826</fpage>
            <lpage>46831</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M309575200</pubid>
                  <pubid idtype="pmpid" link="fulltext">12975359</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Prp8 intein in fungal pathogens: target for potential antifungal drugs</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>XQ</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>2004</pubdate>
            <volume>572</volume>
            <fpage>46</fpage>
            <lpage>50</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.febslet.2004.07.016</pubid>
                  <pubid idtype="pmpid" link="fulltext">15304322</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Compilation and analysis of intein sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Perler</snm>
                  <fnm>FB</fnm>
               </au>
               <au>
                  <snm>Olsen</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Adam</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>1087</fpage>
            <lpage>1093</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146560</pubid>
                  <pubid idtype="pmpid" link="fulltext">9092614</pubid>
                  <pubid idtype="doi">10.1093/nar/25.6.1087</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Complete genome sequence of the methanogenic archaeon, <it>Methanococcus jannaschii</it></p>
            </title>
            <aug>
               <au>
                  <snm>Bult</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Olsen</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>FitzGerald</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Kerlavage</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Dougherty</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Tomb</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Reich</snm>
                  <fnm>CI</fnm>
               </au>
               <au>
                  <snm>Overbeek</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kirkness</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Weinstock</snm>
                  <fnm>KG</fnm>
               </au>
               <au>
                  <snm>Merrick</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Glodek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Geoghagen</snm>
                  <fnm>NS</fnm>
               </au>
               <au>
                  <snm>Weidman</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Fuhrmann</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Utterback</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Kelley</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Sadow</snm>
                  <fnm>PW</fnm>
               </au>
               <au>
                  <snm>Hanna</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Cotton</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Kaine</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Klenk</snm>
                  <fnm>H-P</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
               <au>
                  <snm>Woese</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1996</pubdate>
            <volume>273</volume>
            <fpage>1058</fpage>
            <lpage>1073</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.273.5278.1058</pubid>
                  <pubid idtype="pmpid" link="fulltext">8688087</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Algal viruses with distinct intraspecies host specificities include identical intein elements</p>
            </title>
            <aug>
               <au>
                  <snm>Nagasaki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shirai</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Tomaru</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Nishida</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Pietrokovski</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>71</volume>
            <fpage>3599</fpage>
            <lpage>3607</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1169056</pubid>
                  <pubid idtype="pmpid" link="fulltext">16000767</pubid>
                  <pubid idtype="doi">10.1128/AEM.71.7.3599-3607.2005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Molecular structure of a gene, <it>VMA1</it>, encoding the catalytic subunit of H(+)-translocating adenosine triphosphatase from vacuolar membranes of <it>Saccharomyces cerevisiae</it></p>
            </title>
            <aug>
               <au>
                  <snm>Hirata</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ohsumi</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Nakano</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kawasaki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Anraku</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1990</pubdate>
            <volume>265</volume>
            <fpage>6726</fpage>
            <lpage>6733</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2139027</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>A nuclear-encoded intein in the fungal pathogen <it>Cryptococcus neoformans</it></p>
            </title>
            <aug>
               <au>
                  <snm>Butler</snm>
                  <fnm>MI</fnm>
               </au>
               <au>
                  <snm>Goodwin</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Poulter</snm>
                  <fnm>RTM</fnm>
               </au>
            </aug>
            <source>Yeast</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>1365</fpage>
            <lpage>1370</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/yea.781</pubid>
                  <pubid idtype="pmpid" link="fulltext">11746598</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Two new fungal inteins</p>
            </title>
            <aug>
               <au>
                  <snm>Butler</snm>
                  <fnm>MI</fnm>
               </au>
               <au>
                  <snm>Goodwin</snm>
                  <fnm>TJD</fnm>
               </au>
               <au>
                  <snm>Poulter</snm>
                  <fnm>RTM</fnm>
               </au>
            </aug>
            <source>Yeast</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <fpage>493</fpage>
            <lpage>501</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/yea.1229</pubid>
                  <pubid idtype="pmpid" link="fulltext">15849795</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>The PRP8 inteins in <it>Cryptococcus </it>are a source of phylogenetic and epidemiological information</p>
            </title>
            <aug>
               <au>
                  <snm>Butler</snm>
                  <fnm>MI</fnm>
               </au>
               <au>
                  <snm>Poulter</snm>
                  <fnm>RTM</fnm>
               </au>
            </aug>
            <source>Fungal Genet Biol</source>
            <pubdate>2005</pubdate>
            <volume>42</volume>
            <fpage>452</fpage>
            <lpage>463</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.fgb.2005.01.011</pubid>
                  <pubid idtype="pmpid" link="fulltext">15809009</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>The distribution and evolutionary history of the PRP8 intein</p>
            </title>
            <aug>
               <au>
                  <snm>Butler</snm>
                  <fnm>MI</fnm>
               </au>
               <au>
                  <snm>Gray</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Goodwin</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Poulter</snm>
                  <fnm>RTM</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2006</pubdate>
            <volume>6</volume>
            <fpage>42</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1508164</pubid>
                  <pubid idtype="pmpid" link="fulltext">16737526</pubid>
                  <pubid idtype="doi">10.1186/1471-2148-6-42</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Protein splicing of PRP8 mini-inteins from species of the genus <it>Penicillium</it></p>
            </title>
            <aug>
               <au>
                  <snm>Elleuche</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nolting</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Poggeler</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Appl Microbiol Biotechnol</source>
            <note>2006, Mar 17; [Epub ahead of print]</note>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16544141</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>The plastid genome of the cryptophyte alga, <it>Guillardia theta</it>: complete sequence and conserved synteny groups confirm its common ancestry with red algae</p>
            </title>
            <aug>
               <au>
                  <snm>Douglas</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Penny</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1999</pubdate>
            <volume>48</volume>
            <fpage>236</fpage>
            <lpage>244</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/PL00006462</pubid>
                  <pubid idtype="pmpid" link="fulltext">9929392</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Complete nucleotide sequence of the <it>Porphyra purpurea </it>chloroplast genome</p>
            </title>
            <aug>
               <au>
                  <snm>Reith</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Munholland</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol Rep</source>
            <pubdate>1995</pubdate>
            <volume>13</volume>
            <fpage>333</fpage>
            <lpage>335</lpage>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The <it>Chlamydomonas </it>chloroplast <it>clpP </it>gene contains translated large insertion sequences and is essential for cell growth</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lemieux</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Otis</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Turmel</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>X-Q</fnm>
               </au>
            </aug>
            <source>Mol Gen Genet</source>
            <pubdate>1994</pubdate>
            <volume>244</volume>
            <fpage>151</fpage>
            <lpage>159</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00283516</pubid>
                  <pubid idtype="pmpid">8052234</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Identification of an unusual intein in chloroplast ClpP protease of <it>Chlamydomonas eugametos</it></p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>XQ</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1997</pubdate>
            <volume>272</volume>
            <fpage>11869</fpage>
            <lpage>11873</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.272.18.11869</pubid>
                  <pubid idtype="pmpid" link="fulltext">9115246</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>A new example of viral intein in Mimivirus</p>
            </title>
            <aug>
               <au>
                  <snm>Ogata</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Raoult</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Claverie</snm>
                  <fnm>J-M</fnm>
               </au>
            </aug>
            <source>Virol J</source>
            <pubdate>2005</pubdate>
            <volume>2</volume>
            <fpage>8</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">549080</pubid>
                  <pubid idtype="pmpid" link="fulltext">15707490</pubid>
                  <pubid idtype="doi">10.1186/1743-422X-2-8</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Identification of a virus intein and a possible variation in the protein-splicing reaction</p>
            </title>
            <aug>
               <au>
                  <snm>Pietrokovski</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>1998</pubdate>
            <volume>8</volume>
            <fpage>R634</fpage>
            <lpage>R635</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0960-9822(07)00409-5</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Protein splicing of inteins with atypical glutamine and aspartate C-terminal residues</p>
            </title>
            <aug>
               <au>
                  <snm>Amitai</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Dassa</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Pietrokovski</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2004</pubdate>
            <volume>279</volume>
            <fpage>3121</fpage>
            <lpage>3131</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M311343200</pubid>
                  <pubid idtype="pmpid" link="fulltext">14593103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Multisubunit RNA polymerases</p>
            </title>
            <aug>
               <au>
                  <snm>Cramer</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>89</fpage>
            <lpage>97</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0959-440X(02)00294-4</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Fungal Genome Initiative: Broad Institute</p>
            </title>
            <url>http://www.broad.mit.edu/annotation/fgi</url>
         </bibl>
         <bibl id="B33">
            <title>
               <p>DOE Joint Genome Institute: Genome Portal</p>
            </title>
            <url>http://genome.jgi-psf.org/</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Reconstructing the early evolution of Fungi using a six-gene phylogeny</p>
            </title>
            <aug>
               <au>
                  <snm>James</snm>
                  <fnm>TY</fnm>
               </au>
               <au>
                  <snm>Kauff</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Schoch</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Matheny</snm>
                  <fnm>PB</fnm>
               </au>
               <au>
                  <snm>Hofstetter</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Celio</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gueidan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fraker</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Miadlikowska</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lumbsch</snm>
                  <fnm>HT</fnm>
               </au>
               <au>
                  <snm>Rauhut</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Reeb</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Arnold</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Amtoft</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stajich</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Hosaka</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Sung</snm>
                  <fnm>GH</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>O'Rourke</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Crockett</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Binder</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Curtis</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Slot</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Wilson</snm>
                  <fnm>AW</fnm>
               </au>
               <au>
                  <snm>Schussler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Longcore</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>O'Donnell</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mozley-Standridge</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Porter</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Letcher</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Powell</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Griffith</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Davies</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Humber</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Morton</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Sugiyama</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rossman</snm>
                  <fnm>AY</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Pfister</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Hewitt</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hansen</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hambleton</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Kohlmeyer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Volkmann-Kohlmeyer</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Spotts</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Serdani</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Crous</snm>
                  <fnm>PW</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>KW</fnm>
               </au>
               <au>
                  <snm>Matsuura</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Langer</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Langer</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Untereiner</snm>
                  <fnm>WA</fnm>
               </au>
               <au>
                  <snm>Lucking</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Budel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Geiser</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Aptroot</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Diederich</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Schmitt</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yahr</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hibbett</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Lutzoni</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>McLaughlin</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Spatafora</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Vilgalys</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>443</volume>
            <fpage>818</fpage>
            <lpage>822</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature05110</pubid>
                  <pubid idtype="pmpid" link="fulltext">17051209</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>The genome of the social amoeba <it>Dictyostelium discoideum</it></p>
            </title>
            <aug>
               <au>
                  <snm>Eichinger</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Pachebat</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Glockner</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Rajandream</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Sucgang</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Berriman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Song</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Olsen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Szafranski</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Tunggal</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kummerfeld</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Madera</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Konfortov</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Rivero</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bankier</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Lehmann</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hamlin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Davies</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gaudet</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Fey</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Pilcher</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Saunders</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Sodergren</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kerhornou</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nie</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Hall</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Anjard</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hemphill</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Bason</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Farbrother</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Desany</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Just</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Morio</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rost</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Churcher</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Haydock</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>van Driessche</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Cronin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Goodhead</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Muzny</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Mourier</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Pain</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Harper</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lindsay</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hauser</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>James</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Quiles</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Madan Babu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Saito</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Buchrieser</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wardroper</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Felder</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thangavelu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Knights</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Loulseged</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Oliver</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Price</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Quail</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Urushihara</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hernandez</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rabbinowitsch</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Steffen</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Sanders</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kohara</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Sharp</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Simmonds</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Spiegler</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tivey</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sugano</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Woodward</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Winckler</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tanaka</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Shaulsky</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Schleicher</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weinstock</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Rosenthal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Chisholm</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Gibbs</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Loomis</snm>
                  <fnm>WF</fnm>
               </au>
               <au>
                  <snm>Platzer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kay</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dear</snm>
                  <fnm>PH</fnm>
               </au>
               <au>
                  <snm>Noegel</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kuspa</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>435</volume>
            <fpage>43</fpage>
            <lpage>57</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1352341</pubid>
                  <pubid idtype="pmpid" link="fulltext">15875012</pubid>
                  <pubid idtype="doi">10.1038/nature03481</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>A kingdom-level phylogeny of eukaryotes based on combined protein data</p>
            </title>
            <aug>
               <au>
                  <snm>Baldauf</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Roger</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Wenk-Siefert</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>290</volume>
            <fpage>972</fpage>
            <lpage>977</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.290.5493.972</pubid>
                  <pubid idtype="pmpid" link="fulltext">11062127</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p><it>Phytophthora</it> genome sequences uncover evolutionary origins and mechanisms of pathogenesis</p>
            </title>
            <aug>
               <au>
                  <snm>Tyler</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Tripathy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Dehal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Aerts</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Arredondo</snm>
                  <fnm>FD</fnm>
               </au>
               <au>
                  <snm>Baxter</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Bensasson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Beynon</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Chapman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Damasceno</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Dorrance</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Dou</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dickerman</snm>
                  <fnm>AW</fnm>
               </au>
               <au>
                  <snm>Dubchak</snm>
                  <fnm>IL</fnm>
               </au>
               <au>
                  <snm>Garbelotto</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gijzen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gordon</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>Govers</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Grunwald</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Ivors</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Kamoun</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Krampis</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lamour</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>McDonald</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Medina</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Meijer</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Nordberg</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>Maclean</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Ospina-Giraldo</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>PF</fnm>
               </au>
               <au>
                  <snm>Phuntumart</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Putnam</snm>
                  <fnm>NH</fnm>
               </au>
               <au>
                  <snm>Rash</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Sakihama</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Salamov</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Savidor</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Scheuring</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Sobral</snm>
                  <fnm>BW</fnm>
               </au>
               <au>
                  <snm>Terry</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Torto-Alalibo</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Win</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Grigoriev</snm>
                  <fnm>IV</fnm>
               </au>
               <au>
                  <snm>Rokhsar</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Boore</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2006</pubdate>
            <volume>313</volume>
            <fpage>1261</fpage>
            <lpage>1266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1128796</pubid>
                  <pubid idtype="pmpid" link="fulltext">16946064</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Common origin of four diverse families of large eukaryotic DNA viruses</p>
            </title>
            <aug>
               <au>
                  <snm>Iyer</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>J Virol</source>
            <pubdate>2001</pubdate>
            <volume>75</volume>
            <fpage>11720</fpage>
            <lpage>11734</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">114758</pubid>
                  <pubid idtype="pmpid" link="fulltext">11689653</pubid>
                  <pubid idtype="doi">10.1128/JVI.75.23.11720-11734.2001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Environmental genome shotgun sequencing of the Sargasso Sea</p>
            </title>
            <aug>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Remington</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Fouts</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Knap</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Lomas</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Nealson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hoffman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Parsons</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Baden-Tillson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Pfannkoch</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>YH</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>304</volume>
            <fpage>66</fpage>
            <lpage>74</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1093857</pubid>
                  <pubid idtype="pmpid" link="fulltext">15001713</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Basic Local Alignment Search Tool (BLAST)</p>
            </title>
            <aug>
               <au>
                  <cnm>National Center for Biotechnology Information (NCBI)</cnm>
               </au>
            </aug>
            <url>http://www.ncbi.nlm.nih.gov/BLAST/</url>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Complete genome sequence and lytic phase transcription profile of a Coccolithovirus</p>
            </title>
            <aug>
               <au>
                  <snm>Wilson</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Schroeder</snm>
                  <fnm>DC</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Holden</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Churcher</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hamlin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Norbertczak</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Quail</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Price</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rabbinowitsch</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Craigon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Roy</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ghazal</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>309</volume>
            <fpage>1090</fpage>
            <lpage>1092</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1113109</pubid>
                  <pubid idtype="pmpid" link="fulltext">16099989</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Structural basis of transcription: RNA polymerase II at 2.8 &#197;ngstrom resolution</p>
            </title>
            <aug>
               <au>
                  <snm>Cramer</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bushnell</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Kornberg</snm>
                  <fnm>RD</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>292</volume>
            <fpage>1863</fpage>
            <lpage>1876</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1059493</pubid>
                  <pubid idtype="pmpid" link="fulltext">11313498</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Structural basis of eukaryotic gene transcription</p>
            </title>
            <aug>
               <au>
                  <snm>Boeger</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bushnell</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Griesenbeck</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lorch</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Strattan</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Westover</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Kornberg</snm>
                  <fnm>RD</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>2005</pubdate>
            <volume>579</volume>
            <fpage>899</fpage>
            <lpage>903</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.febslet.2004.11.027</pubid>
                  <pubid idtype="pmpid" link="fulltext">15680971</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>The Protein Data Bank</p>
            </title>
            <aug>
               <au>
                  <snm>Berman</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Westbrook</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Feng</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Gilliland</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Bhat</snm>
                  <fnm>TN</fnm>
               </au>
               <au>
                  <snm>Weissig</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Shindyalov</snm>
                  <fnm>IN</fnm>
               </au>
               <au>
                  <snm>Bourne</snm>
                  <fnm>PE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>235</fpage>
            <lpage>242</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102472</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592235</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.235</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>MacPyMOL</p>
            </title>
            <url>http://pymol.sourceforge.net/</url>
         </bibl>
         <bibl id="B46">
            <title>
               <p>The Wellcome Trust Sanger Institute</p>
            </title>
            <url>http://www.sanger.ac.uk/</url>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Washington University Genome Sequencing Center</p>
            </title>
            <url>http://genome.wustl.edu/</url>
         </bibl>
         <bibl id="B48">
            <title>
               <p>National Center for Biotechnology Information (NCBI)</p>
            </title>
            <url>http://ncbi.nlm.nih.gov</url>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Genetics Computer Group</p>
            </title>
            <source>Program Manual for the Wisconsin Package, Version 8. Madison, Wisconsin</source>
            <pubdate>1994</pubdate>
         </bibl>
         <bibl id="B50">
            <title>
               <p>The CLUSTAL_X Windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Plewniak</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Jeanmougin</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>4876</fpage>
            <lpage>4882</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147148</pubid>
                  <pubid idtype="pmpid" link="fulltext">9396791</pubid>
                  <pubid idtype="doi">10.1093/nar/25.24.4876</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny</p>
            </title>
            <aug>
               <au>
                  <snm>Galtier</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gouy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gautier</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1996</pubdate>
            <volume>12</volume>
            <fpage>543</fpage>
            <lpage>548</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9021275</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4</p>
            </title>
            <aug>
               <au>
                  <snm>Swofford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <publisher>Sunderland, Massachusetts: Sinauer Associates</publisher>
            <pubdate>2002</pubdate>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12504223</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
