<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1743-422X-2-8</ui>
   <ji>1743-422X</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>A new example of viral intein in Mimivirus</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Ogata</snm>
               <fnm>Hiroyuki</fnm>
               <insr iid="I1"/>
               <email>Hiroyuki.Ogata@igs.cnrs-mrs.fr</email>
            </au>
            <au id="A2">
               <snm>Raoult</snm>
               <fnm>Didier</fnm>
               <insr iid="I2"/>
               <email>Didier.Raoult@medecine.univ-mrs.fr</email>
            </au>
            <au id="A3">
               <snm>Claverie</snm>
               <fnm>Jean-Michel</fnm>
               <insr iid="I1"/>
               <email>Jean-Michel.Claverie@igs.cnrs-mrs.fr</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Information G&#233;nomique et Structurale, UPR2589 CNRS, IBSM, IFR88, 31 chemin Joseph Aiguier, 13402 Marseille Cedex 20, France</p>
            </ins>
            <ins id="I2">
               <p>Unit&#233; des Rickettsies, CNRS UPRESA 6020, Facult&#233; de M&#233;decine, 27 Boulevard Jean Moulin, 13385 Marseille Cedex 05, France</p>
            </ins>
         </insg>
         <source>Virology Journal</source>
         <issn>1743-422X</issn>
         <pubdate>2005</pubdate>
         <volume>2</volume>
         <issue>1</issue>
         <fpage>8</fpage>
         <url>http://www.virologyj.com/content/2/1/8</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">15707490</pubid>
               <pubid idtype="doi">10.1186/1743-422X-2-8</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>10</day>
               <month>1</month>
               <year>2005</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>11</day>
               <month>2</month>
               <year>2005</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>11</day>
               <month>2</month>
               <year>2005</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2005</year>
         <collab>Ogata et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Inteins are "protein introns" that remove themselves from their host proteins through autocatalytic protein splicing. Since their discovery, inteins have rapidly been identified in all three domains of life, but only once to date in the genome of a eukaryote-infecting virus.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Here we report the identification and bioinformatic characterization of an intein in the DNA polymerase PolB gene of the amoeba-infecting Mimivirus, the largest known double-stranded DNA virus, whose origin has been proposed to predate the emergence of eukaryotes. The Mimivirus intein exhibits canonical sequence motifs and clearly belongs to a subclass of archaeal inteins always found at the same location in PolB genes. In contrast, the Mimivirus PolB itself is most similar to eukaryotic Pol&#948; sequences.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>The intriguing association of an extremophilic archaeal-type intein with a mesophilic eukaryotic-like PolB in Mimivirus is consistent with the hypothesis that DNA viruses might have been the central reservoir of inteins throughout the course of evolution.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Mimivirus, the largest known virus in both particle size (>0.4 &#956;m in diameter) and genome length, was recently discovered in amoebae during the inspection of a hospital cooling tower prompted by a pneumonia outbreak <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Its entire 1.2-Mbp genome sequence has since been determined <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Extensive phylogenetic studies and gene content analyses defined Mimivirus as a new family of nucleocytoplasmic large DNA viruses (NCLDV) alongside <it>Poxviridae</it>, <it>Iridoviridae</it>, <it>Phycodnaviridae </it>and <it>Asfarviridae</it>, and suggested an early origin, probably before the individualization of the three domains of life <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
         <p>While analyzing the Mimivirus genome sequence, we noticed the unusual length of its putative DNA polymerase. A detailed analysis identified an intein in this gene. After the discovery of an intein in <it>Chilo </it>iridescent virus <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, an insect-infecting NCLDV of the <it>Iridoviridae</it>, this is the second report of an intein sequence in a eukaryote-infecting virus.</p>
         <p>Inteins are "protein introns" that catalyze self-splicing at the protein level. Splicing consists of the self-catalytic excision of an intervening sequence ("intein") from the precursor host protein in which it is located, and the concomitant ligation of the flanking amino- and carboxy-terminal fragments ("exteins") of the precursor. Inteins often possess a homing endonuclease domain and are considered mobile elements. Since their first discovery in 1990 <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>, inteins have been identified in a wide variety of organisms, including bacteria, archaea, and unicellular eukaryotes, albeit with a sporadic distribution (see <url>http://bioinformatics.weizmann.ac.il/~pietro/inteins/</url> for a comprehensive list). For instance, they are relatively abundant in some hyperthermophilic archaeal species (such as <it>Methanococcus jannaschii</it>, which possesses nineteen inteins), but absent from closely related species such as <it>Methanococcus maripaludis </it><abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Similarly, they are observed in many unrelated bacterial clades, but often appear limited to a few species within each clade. It has been suggested that viruses are potential "vectors" of inteins across species and are responsible for this sporadic distribution <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Accordingly, inteins have been identified in many bacteriophages and prophages <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. To our knowledge, the sole published account of eukaryote-infecting viruses harboring an intein concerns iridoviruses <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Eukaryotic Pol&#948;-like Mimivirus PolB</p>
            </st>
            <p>The Mimivirus genome contains a putative ORF (R322, 1740 amino acids long) corresponding to a family B DNA polymerase (PolB). ORF R322 exhibits high-scoring sequence similarity (BLAST E-value&lt;10<sup>-24</sup>) to eukaryotic PolBs in the public database. However, the Mimivirus PolB is much larger than its eukaryotic and viral homologues (about 1000 aa), and its optimal alignment with the other PolB sequences reveals four unmatched extraneous segments (Fig. <figr fid="F1">1A</figr>, Fig. <supplr sid="S1"> S1</supplr>). Focusing on these extra segments, we identified a 351-aa intein (positions 1053 to 1403) in the Mimivirus PolB sequence.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p><b>(A) </b>Locations of inteins found in different DNA polymerases of the family B (PolB) (I, II, III; filled triangles) and other extra segments identified in the Mimivirus PolB (i1, i2, i3; open triangles)</p>
               </caption>
               <text>
                  <p><b>(A) </b>Locations of inteins found in different DNA polymerases of the family B (PolB) (I, II, III; filled triangles) and other extra segments identified in the Mimivirus PolB (i1, i2, i3; open triangles). <it>Nanoarchaeum equitans </it>PolI is encoded in two pieces of genes (NEQ068, NEQ528), the break point of which corresponds to the position III intein integration site. Full intein motifs are comprised of the C-terminal part of NEQ068 and N-terminal part of NEQ528. <b>(B) </b>A phylogenetic tree of the family B DNA polymerases (PolBs) from diverse organisms, including Mimivirus (R322; GenBank AY653733), Paramecium bursaria Chlorella virus 1 (PBCV), Ectocarpus siliculosus virus (ESV), Invertebrate iridescent virus 6 (IIV), Lymphocystis disease virus 1 (LDV), Amsacta moorei entomopoxvirus (AME), Variola virus, Asfarvirus, eukaryotic DNA polymerase &#945; and &#948; catalytic subunits, and archaeal DNA polymerase I. Intein containing genes are indicated by bold letters in the figure. Numbers in parentheses on the right of species name designate the numbering of paralogs. Sequences corresponding to inteins or Mimivirus extra segments (i1, i2, i3) were removed for the tree reconstruction. <it>N. equitans </it>PolI split genes were concatenated. <b>(C) </b>A phylogenetic tree based on the intein sequences found in PolBs. Numbers (I, II, and III) in parentheses on the right of species names indicate the intein integration sites. In (B) and (C), trees were built using a neighbor joining method, and rooted by the mid-point method. Bootstrap values larger than 70% are indicated along the branches.</p>
               </text>
               <graphic file="1743-422X-2-8-1"/>
            </fig>
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p><b>Supplementary figure S1 </b>Sequence alignment of Mimivirus PolB and eukaryotic Pol&#948;s. The Mimivirus intein sequence is removed, and its insertion site is highlighted by amino acid residues in red corresponding to the three residues to the left and the three residues to the right of the insertion site. The three Mimivirus-specific inserts (i1, i2, i3) are highlighted by blue letters. Conserved carboxylate residues in the exonuclease and polymerase active sites are highlighted by a green background. Eukaryotic sequences are from <it>Encephalitozoon cuniculi </it>(TrEMBL/SWISS-PROT: Q8SQP5), <it>Schizosaccharomyces pombe </it>(P30316) and <it>Glycine max </it>(soybean, O48901). The sequence alignment was obtained with T-Coffee.</p>
               </text>
               <file name="1743-422X-2-8-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>After removal of the four Mimivirus-specific insertions, the Mimivirus PolB sequence exhibited the highest BLAST score (E-value = 10<sup>-125</sup>, 32% identity) against a soybean DNA polymerase Pol&#948; (SWISS-PROT: O48901), with an alignment covering the entire length of both the Mimivirus and the target sequence. Nearly equivalent matches are observed with a variety of eukaryotic (from yeast to human) family B DNA polymerase sequences. The best viral homologues were found in phycodnaviruses (E-value = 10<sup>-116</sup>). Conserved carboxylate residues (aspartate and glutamate) at the exonuclease and polymerase active sites <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp> were all identified in the Mimivirus PolB (Fig. <supplr sid="S1"> S1</supplr>). No other ORF in the genome encodes a putative PolB. These observations suggest that R322 encodes a functional PolB. Consistent with the homology search result, a phylogenetic analysis places the Mimivirus PolB near the root of the eukaryotic Pol&#948;s (Fig. <figr fid="F1">1B</figr>). A similar branching position is obtained for the seven universally conserved Mimivirus genes <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Despite low bootstrap values for some of the deep branches in Fig. <figr fid="F1">1B</figr>, this tree clearly indicates the lack of any specific affinity between the Mimivirus PolB and the intein-containing archaeal PolB sequences (bold letters in Fig. <figr fid="F1">1B</figr>). It should also be noted that several other large DNA viruses are known to possess PolBs with a similar phylogenetic pattern <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Canonical/archaeal type Mimivirus intein</p>
            </st>
            <p>The Mimivirus intein sequence (351 aa) exhibits significant sequence similarity to several known inteins (E-value&lt;10<sup>-4</sup>), all of which are from thermophilic or halophilic archaea. The best matching intein (E-value = 3 &#215; 10<sup>-8</sup>) is the second intein of the <it>Thermococcus sp</it>. PolB (InBase: Tsp-GE8 Pol-2), with 24% amino acid sequence identity. The Mimivirus sequence exhibits all the features expected of an active intein (Fig. <figr fid="F2">2</figr>). The sequence motifs <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> characterizing the splicing domain (N1-4, C2, C1) and the dodecapeptide LAGLIDADG homing-endonuclease domain (EN1-4) were all identified in the Mimivirus sequence, except for the N4 motif, which is occasionally absent from previously characterized active inteins <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The amino acid residues providing nucleophilic groups in the self-splicing reactions are all present: the first serine and the last asparagine residues of the intein, and the first threonine residue of the downstream extein. Accordingly, the Mimivirus intein is a canonical "asparagine-type" intein, close homologues of which have previously been observed only in archaeal species. In contrast, the previously reported <it>Chilo </it>iridescent virus intein is a non-canonical "glutamine-type" intein exhibiting a glutamine residue at the C-terminus <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B15">15</abbr></abbrgrp>. The threonine and histidine residues in the N3 motif, which assist in the initial acyl rearrangement at the N-terminal splice junction, are also conserved. We therefore predict that the Mimivirus intein is an active intein capable of self-splicing. The presence of a homing endonuclease domain suggests that this intein has also retained its capacity to spread to other sites of the genome or to other organisms.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>The Mimivirus DNA polymerase PolB intein</p>
               </caption>
               <text>
                  <p>The Mimivirus DNA polymerase PolB intein. The 351 amino acid residues intein sequence is shown with, respectively, the last and the first three amino acid residues of the N-extein and the C-extein. Bold letters represent amino acid residues essential for protein splicing. Conserved intein sequence motifs are indicated by underlines (N1, N2, N3, EN1, EN2, EN3, EN4, C2 and C1). The sequence part matching to the Pfam LAGLIDADG endonuclease domain (PF00961, E-value = 0.16) is indicated by italic letters. The intein/extein boundaries are shown by '|'.</p>
               </text>
               <graphic file="1743-422X-2-8-2"/>
            </fig>
            <p>The other three inserts that we identified in the Mimivirus PolB are rather short. They are unique to Mimivirus and are not found in other PolB sequences. One of these extra segments, 197 aa long and located at position 'i3' (Fig. <figr fid="F1">1A</figr>), exhibits marginal sequence similarity to an intein within the replication factor C of <it>Methanococcus jannaschii </it>(E-value = 0.002, Fig. <supplr sid="S2">S2</supplr>). However, it exhibits a comparable level of sequence similarity to several unrelated database sequences that apparently contain low-complexity sequences. The i3 insert lacks the sequence features required for an active intein. The remaining two extra segments (88 and 121 aa at positions 'i1' and 'i2', respectively) did not exhibit any significant similarity to known protein sequences. The biological properties of these three Mimivirus-specific inserts remain to be characterized.</p>
            <suppl id="S2">
               <title>
                  <p>Additional File 2</p>
               </title>
               <text>
                  <p><b>Supplementary figure S2 </b>Sequence alignment of Mimivirus insert i3 and known intein sequences. Intein sequences are from <it>Methanococcus jannaschii </it>replication factor C (Mja RFC-3) and <it>Pyrococcus abyssi </it>replication factor C (Pab RFC-2).</p>
               </text>
               <file name="1743-422X-2-8-S2.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Mimivirus intein belongs to a specific allele type</p>
            </st>
            <p>Inteins have been identified in different types of DNA polymerases <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. The DNA polymerase catalytic subunits known to contain inteins are archaeal PolI, archaeal DNA polymerase II (PolII), the bacterial DNA polymerase III &#945; subunit (DnaE) and bacteriophage DNA polymerase I. Among these, archaeal PolI belongs to the family B DNA polymerases. Archaeal PolI contains up to three intein alleles, the insertion of which always occurs at one of three strictly conserved positions (I, II and III in Fig. <figr fid="F1">1A</figr>). Interestingly, the location of the bipartite inteins that separate the two PolI gene pieces of <it>Nanoarchaeum equitans </it><abbrgrp><abbr bid="B17">17</abbr></abbrgrp> coincides with position III. Remarkably, the Mimivirus intein is located exactly at position III (Fig. <figr fid="F1">1A</figr>). The sequence around the insertion site is highly conserved among PolBs from evolutionarily distant organisms such as <it>Escherichia coli </it>and human (Fig. <figr fid="F3">3</figr>). The crystal structure of <it>Pyrococcus kodakaraensis </it>PolI <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> reveals that these three distinct sites are in close spatial proximity, in the middle of the DNA binding domain and active site.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Sequence alignment of Family B DNA polymerases from the Archaea, Bacteria and Eukarya domains</p>
               </caption>
               <text>
                  <p>Sequence alignment of Family B DNA polymerases from the Archaea, Bacteria and Eukarya domains. The Mimivirus PolB sequence was used without its intein sequence. Only the region of the alignment around Mimivirus intein insertion site ("YGD|TDS") is shown. The insertion site precisely coincides with the most conserved positions in the sequences, as indicated by bold letters. This is the sole region in the entire sequence exhibiting 6 consecutive identical residues among PolB of the Archaea, Bacteria and Eukarya domains. SWISS-PROT/TrEMBL IDs are DPOL_ARCFU (<it>Archaeoglobus fulgidus</it>), Q8TWJ5 (<it>Methanopyrus kandleri</it>), DPO2_ECOLI (<it>Escherichia coli</it>), Q87NC2 (<it>Vibrio parahaemolyticus</it>), Q8SQP5 (<it>Encephalitozoon cuniculi</it>), and DPOD_HUMAN (Human).</p>
               </text>
               <graphic file="1743-422X-2-8-3"/>
            </fig>
            <p>Perler <it>et al</it>. observed that inteins present at the same location within homologous genes ("intein alleles") tend to be more similar to each other than to inteins at different locations of the same gene or in different genes <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. This phenomenon appears to be not only the simple consequence of regular vertical transmission of inteins, but also the result of lateral acquisitions through "homing" <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> at the same site of highly similar genes (i.e. "alleles") by a mechanism involving gene conversion <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Remarkably, the Mimivirus PolB intein obeys this rule. The Mimivirus intein exhibits higher sequence homology scores to inteins at position III of archaeal PolI (designated as the "pol-c allele") than to inteins at the other PolI locations (I, II) or to inteins in other genes. A phylogenetic analysis of the Mimivirus intein and other PolI inteins also supports the classification of the Mimivirus intein in this specific "intein allele" type (Fig. <figr fid="F1">1C</figr>). This underlines the presence of intein subclasses ("intein alleles"), each exhibiting its own preferred harboring site, even in homologous genes as distantly related as Mimivirus PolB and archaeal PolI. It is implausible that the intein homing mechanism involving gene conversion has led to the direct transfer of an intein between such distantly related homologous genes. The nucleotide sequences (18 bp) around the pol-c allele insertion site do not exhibit an unexpectedly high level of sequence similarity between Mimivirus (TATGGAGAC/ACGGACTCA for the amino acid sequence YGD/TDS) and archaeal sequences. For instance, the sequences from <it>M. jannaschii </it>and <it>Pyrococcus horikoshii </it>exhibit 7 mismatches (TAT<ul>ATT</ul>GAC/AC<ul>T</ul>GA<ul>TGG</ul>A; MJ0885) and 5 mismatches (TAT<ul>AT</ul>AGAC/ACGGA<ul>TGG</ul>A; PH1947), respectively. To the best of our knowledge, no evidence has been reported for a homing endonuclease recognizing such different sequences, although homing endonucleases are known to be rather tolerant of single-base-pair changes in their lengthy DNA recognition sequences <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. A similar observation has been reported for the DnaB inteins of <it>Rhodothermus marinus </it>and <it>Synechocystis </it>sp. PCC6803 <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
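            <p>The mismatch counts quoted for the 18-bp window around the pol-c insertion site can be checked with a simple position-by-position comparison. The Python sketch below is illustrative only and is not part of the analysis pipeline described in the Methods; the function name count_mismatches and the constant names are hypothetical, and the strings are the sequences quoted in the text with the "/" marking the insertion site removed.</p>
            <p>
```python
# Illustrative sketch (not the original analysis): count the positions at
# which two equal-length DNA strings differ (a Hamming distance).

def count_mismatches(seq_a, seq_b):
    """Return the number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


# 18 nt around the insertion site, written as upstream half + downstream half.
MIMIVIRUS = "TATGGAGAC" + "ACGGACTCA"     # encodes YGD / TDS
M_JANNASCHII = "TATATTGAC" + "ACTGATGGA"  # MJ0885
P_HORIKOSHII = "TATATAGAC" + "ACGGATGGA"  # PH1947

if __name__ == "__main__":
    for name, seq in (("M. jannaschii", M_JANNASCHII),
                      ("P. horikoshii", P_HORIKOSHII)):
        print(name, count_mismatches(MIMIVIRUS, seq))
```
            </p>
            <p>Run against the three sequences above, this comparison gives 7 mismatches for <it>M. jannaschii </it>and 5 for <it>P. horikoshii</it>, in agreement with the counts reported in the text.</p>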
            <p>A shift in base composition between intein and extein coding sequences is considered to indicate a recent acquisition of the intein <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The Mimivirus PolB extein and intein DNA sequences do not differ significantly in composition: both exhibit similar G+C contents (29%) and codon usage. In contrast, the <it>Thermococcus fumicolans </it>PolI coding DNA (GenBank: Z69882) exhibits a G+C content of 57% for the extein regions, compared with G+C contents of 47% and 49% for its two inteins.</p>
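            <p>For readers who wish to repeat this kind of comparison on their own sequences, a G+C content calculation can be sketched as follows. This is a minimal illustration, not the procedure actually used for the figures quoted above; the function names gc_content and gc_shift are hypothetical, and characters other than A, C, G and T are simply ignored, which is an assumption about how ambiguous bases should be handled.</p>
            <p>
```python
# Illustrative sketch: G+C fraction of a DNA sequence and the absolute
# difference in G+C fraction between an extein and an intein coding region.

def gc_content(seq):
    """Return the G+C fraction of a DNA sequence, counting only A, C, G and T."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(base in "GC" for base in counted) / len(counted)


def gc_shift(extein_seq, intein_seq):
    """Return the absolute difference in G+C fraction between two sequences."""
    return abs(gc_content(extein_seq) - gc_content(intein_seq))


if __name__ == "__main__":
    # Toy strings for demonstration only; they are not real gene sequences.
    print(gc_content("ATGC"))        # half of the counted bases are G or C
    print(gc_shift("GGCC", "ATGC"))  # difference between the two fractions
```
            </p>
            <p>Applied to the full extein and intein coding regions of a gene, a small value of gc_shift would correspond to the situation described for Mimivirus, and a larger value to that described for <it>T. fumicolans</it>.</p>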
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Archaeal PolI inteins have been described only in extremophiles, growing at temperatures above 80&#176;C (hyperthermophiles) or at high salinity (10 times that of sea water; halophiles). Mimivirus is mesophilic, growing in amoebae at a temperature of 37&#176;C. The association of an archaeal-sequence-like intein with a eukaryotic-like PolB in Mimivirus thus suggests an indirect interaction between mesophilic eukaryotic viruses and extremophilic archaea. Mesophilic euryarchaeal species similar to the methanogens associated with the rumen <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>, or related species found in human beings <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, might have mediated the transition of inteins between extreme and moderate environments in the course of evolution. However, no data are yet available on the presence of inteins in the PolB genes of such mesophilic archaea.</p>
         <p>Lateral transfer (homing) might be responsible for the phylogenetic incongruence between inteins and exteins, and for the identical intein locations within homologues of distantly related organisms such as Mimivirus and archaea. However, given the specificity of homing endonucleases for long recognition sequences (12&#8211;40 bp) and the low level of DNA sequence similarity between viral and archaeal PolB homologues, a single recent homing event appears quite unlikely. The spread of inteins is better explained by a series of transfers, in which inteins progressively accommodated small changes in their homing recognition sequences while retaining their gene position specificity. Such a cascade of transfers could have been mediated by DNA viruses <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Consistent results are now starting to accumulate, including the recent identification of several inteins in different iridoviruses (S. Pietrokovski pers. comm.), and of an intein in HaV, a golden brown alga-infecting virus of the <it>Phycodnaviridae </it><abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Given the similar base compositions of the Mimivirus intein and extein, the low level of intein homology between Mimivirus and archaea, and the likely early origin of the Mimivirus/NCLDV lineage <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, it is tempting to speculate that these DNA viruses might have acquired inteins very early on, and acted as their central reservoir, disseminating inteins across different domains of life in the long course of evolution.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>We have characterized, by bioinformatics methods, a new viral intein found in the eukaryotic-type putative DNA polymerase PolB of Mimivirus. The conservation of the active site motifs for splicing, as well as its insertion at a catalytically important site of the PolB sequence, suggests that the intein is most likely functional. Our phylogenetic analyses revealed that the intein sequence is closest to extremophilic archaeal inteins. The intriguing association of an extremophilic archaeal-type intein with a mesophilic eukaryotic-like PolB in Mimivirus is consistent with the hypothesis that DNA viruses might have been the central reservoir of inteins throughout the course of evolution.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>Sequence homology searches were carried out with the BLAST programs <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> against the SWISS-PROT/TrEMBL database <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> and the New England Biolabs Intein Database (InBase, <url>http://www.neb.com/neb/inteins.html</url>; Perler, 2002). Pfam <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> searches were carried out through its web site <url>http://www.sanger.ac.uk/Software/Pfam/</url>. Multiple sequence alignments were generated with T-Coffee <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Intein sequence motifs were identified through the inspection of a multiple intein sequence alignment. Neighbor joining tree analyses were conducted with MEGA version 2.1 <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. All gap-containing columns in the multiple sequence alignments were removed before the phylogenetic tree analyses. The gamma distance was used to compute evolutionary distances. The gamma shape parameter (alpha) was estimated using the GZ-GAMMA program <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
         <p>The sequence and annotation data for the Mimivirus PolB and intein were deposited in GenBank (accession number: AY606804). The complete genome sequence of Mimivirus is also available from GenBank (accession number: NC_006450). For a comprehensive description of the complete Mimivirus genome sequence and preliminary characterizations of the viral particle, see <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The author(s) declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contribution</p>
         </st>
         <p>HO carried out most of the sequence analysis, contributed to the interpretation of the results, and drafted the manuscript. DR contributed to the interpretation of the results. JMC contributed to the construction of the sequence alignment, participated in the interpretation of the results and finalized the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors wish to thank Dr. Shmuel Pietrokovski for his valuable comments, Dr. Keizo Nagasaki for information about the recent finding of a HaV intein, and Dr. Deborah Burn and Dr. Guillaume Blanc for their critical reading of the manuscript.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>A giant virus in amoebae</p>
            </title>
            <aug>
               <au>
                  <snm>La Scola</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Audic</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Robert</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jungang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>de Lamballerie</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Drancourt</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Birtles</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Claverie</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Raoult</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>299</volume>
            <fpage>2033</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1081867</pubid>
                  <pubid idtype="pmpid" link="fulltext">12663918</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The 1.2-megabase genome sequence of Mimivirus</p>
            </title>
            <aug>
               <au>
                  <snm>Raoult</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Audic</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Robert</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Abergel</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Renesto</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ogata</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>La Scola</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Suzan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Claverie</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>306</volume>
            <fpage>1344</fpage>
            <lpage>1350</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1101485</pubid>
                  <pubid idtype="pmpid" link="fulltext">15486256</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Identification of a virus intein and a possible variation in the protein-splicing reaction</p>
            </title>
            <aug>
               <au>
                  <snm>Pietrokovski</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>1998</pubdate>
            <volume>8</volume>
            <fpage>R634</fpage>
            <lpage>R635</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(07)00409-5</pubid>
                  <pubid idtype="pmpid">9740808</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Molecular structure of a gene, VMA1, encoding the catalytic subunit of H(+)-translocating adenosine triphosphatase from vacuolar membranes of Saccharomyces cerevisiae</p>
            </title>
            <aug>
               <au>
                  <snm>Hirata</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ohsumi</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Nakano</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kawasaki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Anraku</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1990</pubdate>
            <volume>265</volume>
            <fpage>6726</fpage>
            <lpage>6733</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2139027</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Protein splicing converts the yeast TFP1 gene product to the 69-kD subunit of the vacuolar H(+)-adenosine triphosphatase</p>
            </title>
            <aug>
               <au>
                  <snm>Kane</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Yamashiro</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Wolczyk</snm>
                  <fnm>DF</fnm>
               </au>
               <au>
                  <snm>Neff</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Goebl</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stevens</snm>
                  <fnm>TH</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1990</pubdate>
            <volume>250</volume>
            <fpage>651</fpage>
            <lpage>657</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2146742</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Complete genome sequence of the genetically tractable hydrogenotrophic methanogen Methanococcus maripaludis</p>
            </title>
            <aug>
               <au>
                  <snm>Hendrickson</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Kaul</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Bovee</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Chapman</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Chung</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Conway de Macario</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dodsworth</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Gillett</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Graham</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Hackett</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Haydock</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Kang</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Land</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lie</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Major</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Porat</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Palmeiri</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rouse</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Saenphimmachak</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Soll</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Van Dien</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Whitman</snm>
                  <fnm>WB</fnm>
               </au>
               <au>
                  <snm>Xia</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Larimer</snm>
                  <fnm>FW</fnm>
               </au>
               <au>
                  <snm>Olson</snm>
                  <fnm>MV</fnm>
               </au>
               <au>
                  <snm>Leigh</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2004</pubdate>
            <volume>186</volume>
            <fpage>6956</fpage>
            <lpage>6969</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">522202</pubid>
                  <pubid idtype="pmpid" link="fulltext">15466049</pubid>
                  <pubid idtype="doi">10.1128/JB.186.20.6956-6969.2004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Isolation and characterization of APSE-1, a bacteriophage infecting the secondary endosymbiont of Acyrthosiphon pisum</p>
            </title>
            <aug>
               <au>
                  <snm>van der Wilk</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Dullemans</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Verbeek</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>van den Heuvel</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Virology</source>
            <pubdate>1999</pubdate>
            <volume>262</volume>
            <fpage>104</fpage>
            <lpage>113</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/viro.1999.9902</pubid>
                  <pubid idtype="pmpid" link="fulltext">10489345</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Ribonucleotide reductase genes of Bacillus prophages: a refuge to introns and intein coding sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Lazarevic</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>3212</fpage>
            <lpage>3218</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55833</pubid>
                  <pubid idtype="pmpid" link="fulltext">11470879</pubid>
                  <pubid idtype="doi">10.1093/nar/29.15.3212</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Origins of highly mosaic mycobacteriophage genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Pedulla</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Ford</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Houtz</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Karthikeyan</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wadsworth</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Jacobs-Sera</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Falbo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gross</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pannunzio</snm>
                  <fnm>NR</fnm>
               </au>
               <au>
                  <snm>Brucker</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Kumar</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Kandasamy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Keenan</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Bardarov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kriakov</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Jacobs</snm>
                  <fnm>WRJ</fnm>
               </au>
               <au>
                  <snm>Hendrix</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Hatfull</snm>
                  <fnm>GF</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2003</pubdate>
            <volume>113</volume>
            <fpage>171</fpage>
            <lpage>182</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(03)00233-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">12705866</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Genomic Insights into Methanotrophy: The Complete Genome Sequence of Methylococcus capsulatus (Bath)</p>
            </title>
            <aug>
               <au>
                  <snm>Ward</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Larsen</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Sakwa</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bruseth</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Khouri</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Durkin</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Dimitrov</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Scanlan</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kang</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Meth&#233;</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Fouts</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ravel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tettelin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ren</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Read</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Deboy</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Seshadri</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>HB</fnm>
               </au>
               <au>
                  <snm>Birkeland</snm>
                  <fnm>NK</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Dodson</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Grindhaug</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Holt</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Eidhammer</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Jonasen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Vanaken</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Utterback</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Feldblyum</snm>
                  <fnm>TV</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Lillehaug</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <fpage>e303</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">517821</pubid>
                  <pubid idtype="pmpid" link="fulltext">15383840</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Crystal structure of DNA polymerase from hyperthermophilic archaeon Pyrococcus kodakaraensis KOD1</p>
            </title>
            <aug>
               <au>
                  <snm>Hashimoto</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nishioka</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fujiwara</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Takagi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Imanaka</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Inoue</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kai</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>306</volume>
            <fpage>469</fpage>
            <lpage>477</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4403</pubid>
                  <pubid idtype="pmpid" link="fulltext">11178906</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Crystal structure of a bacteriophage T7 DNA replication complex at 2.2 &#197; resolution</p>
            </title>
            <aug>
               <au>
                  <snm>Doubli&#233;</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tabor</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Ellenberger</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1998</pubdate>
            <volume>391</volume>
            <fpage>251</fpage>
            <lpage>258</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/34593</pubid>
                  <pubid idtype="pmpid" link="fulltext">9440688</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>A hypothesis for DNA viruses as the origin of eukaryotic replication proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Villarreal</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>DeFilippis</snm>
                  <fnm>VR</fnm>
               </au>
            </aug>
            <source>J Virol</source>
            <pubdate>2000</pubdate>
            <volume>74</volume>
            <fpage>7079</fpage>
            <lpage>7084</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">112226</pubid>
                  <pubid idtype="pmpid" link="fulltext">10888648</pubid>
                  <pubid idtype="doi">10.1128/JVI.74.15.7079-7084.2000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Modular organization of inteins and C-terminal autocatalytic domains</p>
            </title>
            <aug>
               <au>
                  <snm>Pietrokovski</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1998</pubdate>
            <volume>7</volume>
            <fpage>64</fpage>
            <lpage>71</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9514260</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Protein splicing of inteins with atypical glutamine and aspartate C-terminal residues</p>
            </title>
            <aug>
               <au>
                  <snm>Amitai</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Dassa</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Pietrokovski</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2004</pubdate>
            <volume>279</volume>
            <fpage>3121</fpage>
            <lpage>3131</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M311343200</pubid>
                  <pubid idtype="pmpid" link="fulltext">14593103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>InBase: the Intein Database</p>
            </title>
            <aug>
               <au>
                  <snm>Perler</snm>
                  <fnm>FB</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>383</fpage>
            <lpage>384</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99080</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752343</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.383</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>The genome of Nanoarchaeum equitans: insights into early archaeal evolution and derived parasitism</p>
            </title>
            <aug>
               <au>
                  <snm>Waters</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hohn</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Ahel</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Graham</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Barnstead</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Beeson</snm>
                  <fnm>KY</fnm>
               </au>
               <au>
                  <snm>Bibbs</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Bolanos</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Keller</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kretz</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Mathur</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ni</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Podar</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Soll</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Stetter</snm>
                  <fnm>KO</fnm>
               </au>
               <au>
                  <snm>Short</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Noordewier</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>12984</fpage>
            <lpage>12988</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">240731</pubid>
                  <pubid idtype="pmpid" link="fulltext">14566062</pubid>
                  <pubid idtype="doi">10.1073/pnas.1735403100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Compilation and analysis of intein sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Perler</snm>
                  <fnm>FB</fnm>
               </au>
               <au>
                  <snm>Olsen</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Adam</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>1087</fpage>
            <lpage>1093</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146560</pubid>
                  <pubid idtype="pmpid" link="fulltext">9092614</pubid>
                  <pubid idtype="doi">10.1093/nar/25.6.1087</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Homing endonucleases: keeping the house in order</p>
            </title>
            <aug>
               <au>
                  <snm>Belfort</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>RJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3379</fpage>
            <lpage>3388</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146926</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254693</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3379</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>A DnaB intein in Rhodothermus marinus: indication of recent intein homing across remotely related organisms</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>XQ</fnm>
               </au>
               <au>
                  <snm>Hu</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1997</pubdate>
            <volume>94</volume>
            <fpage>7851</fpage>
            <lpage>7856</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">21518</pubid>
                  <pubid idtype="pmpid" link="fulltext">9223276</pubid>
                  <pubid idtype="doi">10.1073/pnas.94.15.7851</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Phylogenetic analysis of archaeal 16S rRNA libraries from the rumen suggests the existence of a novel group of archaea not associated with known methanogens</p>
            </title>
            <aug>
               <au>
                  <snm>Tajima</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nagamine</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Matsui</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Aminov</snm>
                  <fnm>RI</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Lett</source>
            <pubdate>2001</pubdate>
            <volume>200</volume>
            <fpage>67</fpage>
            <lpage>72</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1097(01)00201-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">11410351</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Phylogenetic analysis of methanogens from the bovine rumen</p>
            </title>
            <aug>
               <au>
                  <snm>Whitford</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Teather</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Forster</snm>
                  <fnm>RJ</fnm>
               </au>
            </aug>
            <source>BMC Microbiol</source>
            <pubdate>2001</pubdate>
            <volume>1</volume>
            <fpage>5</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1186/1471-2180-1-5</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Identification of archaeal rDNA from subgingival dental plaque by PCR amplification and sequence analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Kulik</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Sandmeier</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hinni</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Lett</source>
            <pubdate>2001</pubdate>
            <volume>196</volume>
            <fpage>129</fpage>
            <lpage>133</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1097(01)00051-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">11267768</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Algal viruses with distinct intraspecies host specificities include identical intein elements</p>
            </title>
            <aug>
               <au>
                  <snm>Nagasaki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shirai</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Tomaru</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Nishida</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Pietrokovski</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>(in press)</volume>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003</p>
            </title>
            <aug>
               <au>
                  <snm>Boeckmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Blatter</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Estreicher</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gasteiger</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Michoud</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>O'Donovan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Phan</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Pilbout</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>365</fpage>
            <lpage>370</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165542</pubid>
                  <pubid idtype="pmpid" link="fulltext">12520024</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg095</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>The Pfam protein families database</p>
            </title>
            <aug>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cerruti</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Etwiller</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Howe</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>276</fpage>
            <lpage>280</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99071</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752314</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.276</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>T-Coffee: A novel method for fast and accurate multiple sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Notredame</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Heringa</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>302</volume>
            <fpage>205</fpage>
            <lpage>217</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4042</pubid>
                  <pubid idtype="pmpid" link="fulltext">10964570</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>MEGA2: molecular evolutionary genetics analysis software</p>
            </title>
            <aug>
               <au>
                  <snm>Kumar</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tamura</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Jakobsen</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>1244</fpage>
            <lpage>1245</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.12.1244</pubid>
                  <pubid idtype="pmpid" link="fulltext">11751241</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>A simple method for estimating the parameter of substitution rate variation among sites</p>
            </title>
            <aug>
               <au>
                  <snm>Gu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1997</pubdate>
            <volume>14</volume>
            <fpage>1106</fpage>
            <lpage>1113</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9364768</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
