<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2008-9-7-r106</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Taxonomic distribution of large DNA viruses in the sea</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Monier</snm>
               <fnm>Adam</fnm>
               <insr iid="I1"/>
               <email>Adam.Monier@igs.cnrs-mrs.fr</email>
            </au>
            <au id="A2">
               <snm>Claverie</snm>
               <fnm>Jean-Michel</fnm>
               <insr iid="I1"/>
               <email>Jean-Michel.Claverie@igs.cnrs-mrs.fr</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Ogata</snm>
               <fnm>Hiroyuki</fnm>
               <insr iid="I1"/>
               <email>Hiroyuki.Ogata@igs.cnrs-mrs.fr</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Structural and Genomic Information Laboratory, CNRS-UPR 2589, IFR-88, Universit&#233; de la M&#233;diterran&#233;e Parc Scientifique de Luminy, avenue de Luminy, FR-13288 Marseille, France</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>7</issue>
         <fpage>R106</fpage>
         <url>http://genomebiology.com/2008/9/7/R106</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18598358</pubid>
               <pubid idtype="doi">10.1186/gb-2008-9-7-r106</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>15</day>
               <month>2</month>
               <year>2008</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>20</day>
               <month>5</month>
               <year>2008</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>3</day>
               <month>7</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>03</day>
               <month>07</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Monier et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Marine DNA viruses</p>
      </shorttitle>
      <shortabs>
         <p>Phylogenetic mapping of metagenomics data reveals the taxonomic distribution of large DNA viruses in the sea, including giant viruses of the Mimiviridae family.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Viruses are ubiquitous and the most abundant biological entities in marine environments. Metagenomics studies are increasingly revealing the huge genetic diversity of marine viruses. In this study, we used a new approach - 'phylogenetic mapping' - to obtain a comprehensive picture of the taxonomic distribution of large DNA viruses represented in the Sorcerer II Global Ocean Sampling Expedition metagenomic data set.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Using DNA polymerase genes as a taxonomic marker, we identified 811 homologous sequences of likely viral origin. As expected, most of these sequences corresponded to phages. Interestingly, the second largest viral group corresponded to that containing mimivirus and three related algal viruses. We also identified several DNA polymerase homologs closely related to Asfarviridae, a viral family poorly represented among isolated viruses and, until now, limited to terrestrial animal hosts. Finally, our approach allowed the identification of a new combination of genes in 'viral-like' sequences.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Although only recently discovered, giant viruses of the Mimiviridae family appear to constitute a diverse, quantitatively important and ubiquitous component of the population of large eukaryotic DNA viruses in the sea.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010007">Ecology</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010020">Virology</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Viruses are ubiquitous and the most abundant biological entities in marine environments. Previous analyses using electron microscopy, epifluorescence microscopy and flow cytometry revealed the existence of 10<sup>6</sup> to 10<sup>9</sup> virus-like particles per milliliter of sea water <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. Infecting marine organisms from oxygen-producing phytoplankton to whales, viruses regulate the populations of many sea organisms and are important effectors of global biogeochemical fluxes <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. It is also becoming clear that viruses harbor great genetic diversity: comparative genomics <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp> and virus-targeted metagenomics studies <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp> revealed a large number of viral sequences with no detectable homologs in the databases. As reservoirs of 'new' genes and vectors of 'old' genes, viruses may contribute significantly to the evolution of microorganisms in marine ecosystems.</p>
         <p>Despite this progress in characterizing the environmental significance of viruses, a quantitative description of the marine virosphere remains to be done. This includes determining the relative abundance of virus families and assessing their genetic diversity. In this context, large viruses, whose particles can exceed small bacteria in size <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, are of particular concern. Most of them, such as <it>Acanthamoeba polyphaga</it> mimivirus <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, may be retained on the 0.16-0.2 &#956;m pore-size filters specifically used in virus-targeted metagenomic studies, and thus may be missing from the fraction traditionally associated with viral sequences <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. A recently released marine microbial metagenomic sequence data set, produced by the first phase of the Sorcerer II Global Ocean Sampling (GOS) Expedition <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, provides an opportunity to quantitatively investigate viral diversity in marine environments. The GOS data comprise a large environmental shotgun sequence collection, with 7.7 million sequencing reads assembled into contigs totaling 4.9 billion bp. In the GOS expedition, microbial samples were collected mainly from surface sea waters, with a few from non-marine aquatic environments. Most DNA samples were extracted from the 0.1-0.8 &#956;m size fraction, which is dominated by bacteria. Williamson <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> recently reported that at least 3% of the predicted proteins contained within the GOS data are of viral origin. 
Notably, a number of sequences most similar to the genome of the giant mimivirus have been found in the Sargasso Sea metagenomic data set <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, produced by a pilot study of the GOS expedition <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, as well as in the new GOS metagenomic data set <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
         <p>Determining taxonomic distribution, referred to as 'binning', is the first step in analyzing microbial populations in metagenomic sequences <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. One simple binning approach uses database search programs such as BLAST to find the best-scoring sequences from known species. A majority rule can then be used to assign a taxonomic group to a metagenomic sequence <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B19">19</abbr></abbrgrp>. Similar to the best-hit criterion used to define orthologous genes in complete genomes <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>, two-way BLAST searches were used to detect 'mimivirus-like' sequences in metagenomic data <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B17">17</abbr></abbrgrp>. Such post-processing of homology search results can improve the accuracy of taxonomic assignment. However, the use of homology search programs has serious drawbacks <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. For instance, BLAST scores are highly sensitive to alignment length and to insertions/deletions. Further, it is difficult to infer evolutionary distances among high-scoring hits from BLAST scores alone.</p>
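         <p>The two homology-based binning rules mentioned above can be sketched as follows. This is a minimal illustration, not the procedure used in the cited studies. It assumes that BLAST hits have already been parsed into hypothetical (query, subject, bit score) tuples, and that a hypothetical taxon_of dictionary maps each reference sequence to a taxon. The majority rule here returns no assignment on a tie, which is only one of several possible conventions.</p>
```python
from collections import Counter


def best_hits(hits):
    """Return {query: subject}, keeping the highest bit score per query.

    `hits` is an iterable of (query, subject, bit_score) tuples, for example
    parsed from tabular BLAST output (hypothetical input format).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def majority_taxon(hits, taxon_of, top_n=10):
    """Assign the taxon most frequent among the top_n hits of one query.

    Returns None when there are no hits or when the two most frequent
    taxa are tied.
    """
    ranked = sorted(hits, key=lambda h: h[2], reverse=True)[:top_n]
    counts = Counter(taxon_of[subject] for _, subject, _ in ranked).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def reciprocal_best_hits(forward, reverse):
    """Pairs (a, b) where b is a's best hit and a is also b's best hit.

    `forward`: metagenomic queries searched against reference genes.
    `reverse`: reference genes searched against the metagenomic sequences.
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)
```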
         <p>Phylogenetic analysis remains the most powerful way to determine the taxonomic distribution of metagenomic sequences. Short and Suttle <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> used phylogenetic methods to classify PCR-amplified gene sequences and suggested the existence of previously unknown algal viruses in coastal waters. Similar phylogenetic studies were performed to assess the diversity of T4-type phages <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> or RNA viruses <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp> in marine environments. In these studies, different markers, such as major capsid protein genes or RNA-dependent RNA polymerase genes, were amplified by PCR or RT-PCR and analyzed by phylogenetic methods. To examine the taxonomic distribution of large DNA viruses in a metagenomic sequence collection, B-family DNA polymerase (PolB) is a useful marker <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. PolB sequences are conserved in all known members of the nucleocytoplasmic large DNA viruses (NCLDVs) <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, which include 'Mimiviridae' <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, Phycodnaviridae, Iridoviridae, Asfarviridae, and Poxviridae. PolB genes are also found in other eukaryotic viruses, such as herpesviruses, baculoviruses, ascoviruses and nimaviruses, in some bacteriophages (for example, T4-phage, cyanophage P-SSM2), and in some archaeal viruses (for example, Halovirus HF1). Eukaryotes have four PolB paralogs (catalytic subunits of &#945;, &#948;, &#949; and &#950; DNA polymerases). PolB genes are found in all of the main archaeal lineages (Nanoarchaeota, Crenarchaeota and Euryarchaeota). In bacteria, PolB homologs (the prototype being <it>Escherichia coli</it> DNA polymerase II) have a limited distribution; they are found in Proteobacteria, Acidobacteria, Firmicutes, Chlorobi and Bacteroidetes. 
PolB genes are suitable for the classification of large DNA viruses <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp> thanks to their strong sequence conservation and an apparently low frequency of recent horizontal transfer <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B33">33</abbr></abbrgrp>.</p>
         <p>When applying phylogenetic methods to environmental shotgun sequences, short sequences require special attention. These sequences vary widely in length and may correspond to different parts of a selected marker gene. Piling up multiple short sequences on representative markers from known organisms does not yield an appropriate alignment (whatever software is used) with enough signal for subsequent phylogenetic analysis. In this study we therefore developed a new phylogeny-based method. This method, called 'phylogenetic mapping', analyzes individual metagenomic sequences one by one and determines their phylogenetic positions using a reference multiple sequence alignment (MSA) and a reference tree. To investigate the presence, taxonomic richness and relative abundance of different large DNA viruses in marine environments, we analyzed the GOS data set using PolB sequences as our reference. Our study does not address the abundances of small DNA viruses or RNA viruses <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B34">34</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Phylogenetic mapping</p>
            </st>
            <p>We searched the GOS data set for PolB-like sequences using the Pfam hidden Markov profile (PF00136). This resulted in a set of 1,947 sequences (23 to 562 amino acid residues long), referred to as 'PolB fragments' in this study. We next built a reference MSA of PolB homologs from known organisms (Additional data file 1). The reference MSA (Additional data file 2) corresponds to the polymerase domains of PolB homologs and contains 101 sequences, which were selected to achieve the widest possible taxonomic and paralog coverage (with non-exhaustive sampling of closely related species) for the analysis of the GOS metagenomic data. The reference MSA was used to generate a maximum likelihood tree (the reference tree; Figure <figr fid="F1">1</figr>). Although the phylogenetic reconstruction did not provide statistical support for most of the basal branches, many peripheral groupings (supported by bootstrap values &#8805; 70%) were consistent with the current taxonomy of viruses and cellular organisms. In this tree, we identified eight viral groups: poxviruses; chloroviruses; phaeoviruses; mimivirus and related algal viruses (<it>Pyramimonas orientalis</it> virus PoV01, <it>Chrysochromulina ericina</it> virus CeV01 and <it>Phaeocystis pouchetii</it> virus PpV01); iridoviruses grouped with ascoviruses; herpesviruses; baculoviruses; and one phage group. The PolB homologs from African swine fever virus (ASFV, Asfarviridae), <it>Emiliania huxleyi</it> virus 86 (EhV-86, Phycodnaviridae), <it>Heterosigma akashiwo</it> virus 1 (HaV, Phycodnaviridae) and the phage RM378 did not cluster with other PolB sequences with strong support. We also identified eleven groups of cellular PolB homologs in the reference tree: seven archaeal groups, one bacterial group and three eukaryotic groups (&#945;, &#948; and &#950; subtypes). 
Each GOS PolB fragment was then examined for its phylogenetic position using the reference MSA and the reference tree. To reduce computation time and to streamline the summary of results, we reduced the size of the reference MSA: we selected 51 representatives from the 101 reference sequences and removed the rest. The reference tree was pruned accordingly, so that it contains only these 51 representatives while preserving the topology of the full reference tree shown in Figure <figr fid="F1">1</figr>. The reduced reference tree has 99 branches (including internal branches). Constraining the topology to this tree defines 99 possible branching positions for each GOS PolB fragment. We aligned each PolB fragment, one at a time, to the reduced reference MSA using the T-Coffee profile method. Based on the resulting profile MSA of 52 sequences, the likelihoods of all 99 possible branching positions (that is, 99 different topologies) were computed by ProtML <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. The statistical significance of the best of these 99 topologies was assessed by the RELL (resampling of estimated log likelihoods) bootstrap method <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>. We considered the branching position of a PolB fragment to be supported when the RELL bootstrap value for the best topology was &#8805; 75%.</p>
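            <p>Two details of this procedure can be checked with a short sketch. First, an unrooted binary tree with n leaves has 2n - 3 branches, so the 51 retained representatives give exactly the 99 candidate branching positions. Second, the RELL bootstrap resamples alignment sites using per-site log-likelihoods already computed for each candidate topology, without re-optimizing any parameters, and reports how often each topology has the highest resampled total. The sketch assumes the per-site log-likelihoods are available as a matrix with one row per topology; how they would be exported from ProtML is not shown. The number of replicates (1,000) and the random seed are illustrative assumptions. Only the 75% support threshold comes from the text.</p>
```python
import random


def n_branches_unrooted(n_leaves: int) -> int:
    """Number of branches (terminal + internal) of an unrooted binary tree."""
    return 2 * n_leaves - 3


def rell_support(site_lnl, n_boot=1000, seed=0):
    """RELL bootstrap support for each candidate topology.

    site_lnl[t][i] is the log-likelihood of alignment site i under topology t
    (one topology per candidate branching position of a fragment). Sites are
    resampled with replacement and per-topology totals are re-summed without
    re-estimating parameters. A topology's support is the fraction of
    replicates in which its resampled total is the highest.
    """
    rng = random.Random(seed)
    n_topo = len(site_lnl)
    n_sites = len(site_lnl[0])
    wins = [0] * n_topo
    for _ in range(n_boot):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(row[i] for i in idx) for row in site_lnl]
        # ties go to the lowest topology index
        wins[max(range(n_topo), key=totals.__getitem__)] += 1
    return [w / n_boot for w in wins]


def best_position(site_lnl, threshold=0.75, **kw):
    """Return (index of the ML topology, its RELL support, supported?)."""
    totals = [sum(row) for row in site_lnl]
    best = max(range(len(totals)), key=totals.__getitem__)
    support = rell_support(site_lnl, **kw)[best]
    return best, support, support >= threshold
```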
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Maximum likelihood tree of 101 PolB sequences in the complete reference set</p>
               </caption>
               <text>
                  <p>Maximum likelihood tree of 101 PolB sequences in the complete reference set. The phylogenetic tree was built using PhyML <abbrgrp><abbr bid="B73">73</abbr></abbrgrp> (Jones-Taylor-Thornton substitution model <abbrgrp><abbr bid="B76">76</abbr></abbrgrp>, 100 bootstrap replicates) based on a multiple sequence alignment generated using M-Coffee <abbrgrp><abbr bid="B72">72</abbr></abbrgrp>. This tree is unrooted <it>per se</it>. The phage group was arbitrarily chosen as an outgroup for presentation purposes. The lengths of branches do not represent sequence divergence. Bootstrap values lower than 70% are not shown. The selected 51 representatives for the phylogenetic mapping and the associated branches are highlighted in bold face and black lines, respectively. Different colors correspond to different taxa: viruses (blue), eukaryotes (orange), bacteria (green) and archaea (pink).</p>
               </text>
               <graphic file="gb-2008-9-7-r106-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Diversity of large DNA viruses in the GOS data set</p>
            </st>
            <p>Our phylogenetic mapping method assigned a best branching position to 1,423 PolB fragments, of which 1,224 (86%) were mapped on viral branches. The best branching position was statistically supported by the RELL method for 869 PolB fragments, of which 811 (93%) were mapped on viral branches. Figure <figr fid="F2">2</figr> and Additional data file 3 show the taxonomic distribution of the GOS PolB fragments. The largest fraction of the PolB fragments was mapped on the phage group: of 866 fragments mapped within this group, 633 were supported. This appears consistent with current estimates of the large number of phage-like particles and their genetic richness in marine environments <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. The second largest number of supported mappings fell into groups of large eukaryotic viruses commonly found in aquatic environments. Among them, the 'Mimiviridae group' (mimivirus, PoV01 and CeV01 <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>) represented the largest fraction, with 115 supported cases. The chlorovirus group gathered 51 supported mappings. The iridovirus/ascovirus group and the branch leading to HaV each showed five supported mappings. In contrast, no PolB fragment was mapped to the baculovirus or herpesvirus groups, whose members are commonly found in terrestrial animals. Interestingly, two PolB fragments were mapped with good support on the ASFV branch (JCVI SCAF 1101668126451, JCVI SCAF 1101668152950). When these two PolB fragments were compared with the NCBI non-redundant amino acid sequence database (NRDB) using BLASTP, their best match was the ASFV PolB sequence. ASFV is pathogenic to domestic pigs and is currently the sole representative of the Asfarviridae family <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Among cellular organisms, eukaryotic homologs gathered few mappings, as expected from the sample filtration threshold used in the GOS metagenomic study. 
Two archaeal groups, group III containing crenarchaeotes (for example, <it>Pyrobaculum aerophilum</it>, <it>Cenarchaeum symbiosum</it>) and group IV containing euryarchaeotes (for example, <it>Thermoplasma acidophilum</it> and the uncultured euryarchaeote Alv-FOS1), had 23 and 17 supported mappings, respectively. The bacterial group had ten supported mappings.</p>
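            <p>The X/Y tallies in Figure 2 and the percentages above are simple counts over per-fragment mapping results. The following minimal sketch assumes each mapped fragment is summarized as a hypothetical (group, is_viral, supported) record. The record format is illustrative and is not taken from the authors' pipeline.</p>
```python
from collections import defaultdict


def tally(records):
    """Count mapped fragments per group.

    records: iterable of (group, is_viral, supported) tuples, one per
    mapped fragment. Returns {group: (supported, total)}, i.e. the X/Y
    notation used in Figure 2.
    """
    counts = defaultdict(lambda: [0, 0])
    for group, _is_viral, supported in records:
        counts[group][1] += 1
        if supported:
            counts[group][0] += 1
    return {g: tuple(c) for g, c in counts.items()}


def viral_fraction(records, supported_only=False):
    """Fraction of mapped fragments (optionally only supported ones) on viral branches."""
    kept = [r for r in records if r[2] or not supported_only]
    if not kept:
        return 0.0
    return sum(1 for r in kept if r[1]) / len(kept)
```

The totals reported above fit these definitions: 1,224/1,423 rounds to 0.86 and 811/869 rounds to 0.93, matching the quoted 86% and 93%.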
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Phylogenetic mapping results of the GOS PolB fragments</p>
               </caption>
               <text>
                  <p>Phylogenetic mapping results of the GOS PolB fragments. Results of the phylogenetic mapping are summarized and displayed for each group in the reference tree. Numbers in parentheses (<it>X</it>/<it>Y</it>) are the total number of mapped PolB fragments (<it>Y</it>) and the number of supported cases (<it>X</it>). The tree topology is the same as the one shown in Figure 1. Branches with bootstrap values &#8805; 70% are marked with filled circles. The 99 branches examined by our phylogenetic mapping are shown with black lines; other peripheral branches are shown with gray lines. The length of the scale bar corresponds to 0.5 substitutions per site. Colors correspond to different taxa: viruses (blue), eukaryotes (orange), bacteria (green) and archaea (pink).</p>
               </text>
               <graphic file="gb-2008-9-7-r106-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Validation of the mapping results using long PolB fragments</p>
            </st>
            <p>We examined the phylogenetic mapping results and the sequence diversity of the PolB fragments classified in large eukaryotic virus groups (that is, NCLDVs). From the fragments mapped on NCLDV branches, we selected long PolB fragments whose profile MSA showed at least 150 non-gapped sites. We computed a single alignment of these long PolB fragments together with the reference PolB sequences from large eukaryotic virus groups. A maximum likelihood tree (Figure <figr fid="F3">3</figr>) based on this alignment was fully consistent with our one-by-one mapping result (Figure <figr fid="F2">2</figr>) in terms of taxonomic assignment. The Mimiviridae group contained 16 PolB fragments showing substantial sequence variation. Twelve of them were significantly closer (bootstrap 100%) to CeV01 or PpV01 (both viruses of haptophytes) than to mimivirus or PoV01 (a green algal virus). Three of the remaining four were grouped with either mimivirus (bootstrap 89%) or PoV01 (bootstrap 96%). The last one (JCVI SCAF 1096627348452) was placed at the basal position of the Mimiviridae group; although this basal position was not statistically supported, it was consistent with our one-by-one phylogenetic mapping result. The mimivirus PolB shared 47% amino acid identity with its closest homolog (JCVI SCAF 1101668170038). A large and diverse group of 27 PolB fragments (bootstrap 92%) was also found next to the chlorella virus group (<it>Paramecium bursaria</it> chlorella viruses 1, K2 and NY2A). The DNA polymerase gene from the recently released <it>Ostreococcus</it> virus OtV5 genome (GenBank: <ext-link ext-link-type="gen" ext-link-id="EU304328">EU304328</ext-link>) <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> grouped together with these PolB fragments. The grouping of a PolB fragment with ASFV PolB was also confirmed (bootstrap 100%).</p>
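            <p>The length filter and the identity value above can be illustrated as follows. The text does not define 'non-gapped sites' or its identity convention precisely, so this sketch makes two assumptions. A non-gapped site is taken to be an alignment column in which the fragment carries a residue, and identity is computed over columns where both sequences carry a residue. The gap characters '-' and '.' are also an assumption.</p>
```python
GAP_CHARS = set("-.")  # assumed gap symbols in the aligned sequences


def non_gapped_sites(aligned_fragment: str) -> int:
    """Count alignment columns in which the fragment has a residue."""
    return sum(1 for c in aligned_fragment if c not in GAP_CHARS)


def select_long(profile_rows: dict, min_sites: int = 150) -> list:
    """Keep fragment ids whose aligned row covers at least min_sites columns."""
    return sorted(fid for fid, row in profile_rows.items()
                  if non_gapped_sites(row) >= min_sites)


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b)
             if x not in GAP_CHARS and y not in GAP_CHARS]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x.upper() == y.upper())
    return 100.0 * same / len(pairs)
```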
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Maximum likelihood tree of PolB sequences belonging to NCLDVs</p>
               </caption>
               <text>
                  <p>Maximum likelihood tree of PolB sequences belonging to NCLDVs. The phylogenetic tree was built using PhyML <abbrgrp><abbr bid="B73">73</abbr></abbrgrp> (Jones-Taylor-Thornton substitution model <abbrgrp><abbr bid="B76">76</abbr></abbrgrp>, 100 bootstrap replicates) based on a multiple sequence alignment generated using MUSCLE <abbrgrp><abbr bid="B77">77</abbr></abbrgrp>. Bootstrap values lower than 50% are not shown. GOS sequences are marked with filled circles and displayed in purple. The tree was mid-point rooted. The DNA polymerase gene from the recently released <it>Ostreococcus </it>virus OtV5 (GenBank: <ext-link ext-link-type="gen" ext-link-id="EU304328">EU304328</ext-link>) was included in this tree. The OtV5 PolB was not included in our reference set as it was not available at the time of our phylogenetic mapping study. The length of the scale bar corresponds to 0.5 substitutions per site.</p>
               </text>
               <graphic file="gb-2008-9-7-r106-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Viral PolBs are more diverse than bacterial PolBs</p>
            </st>
            <p>We investigated the abundance of viral PolB genes relative to bacterial PolB genes in the GOS data set. Here, we used read coverage as a proxy to measure the abundance of the cognate DNA molecules in the samples. We computed the read coverage of each contig harboring a PolB fragment mapped on the reference tree with significant support, and then obtained the median of the read coverage values for each set of contigs mapped on the same branch (Additional data file 3). PolB sequences mapped on viral branches exhibited low median coverage values ranging from 1.31 for the ASFV branch to 2.00 for a phage branch. The median coverage value for the contigs mapped on the mimivirus branch (12 contigs) was 1.32. The viral contig with the largest read coverage (6.68) was the one mapped on the cyanophage P-SSM4 branch. In contrast, a higher median coverage value (8.40) was found for bacterial contigs mapped on the branch leading to <it>Shewanella frigidimarina</it>. One of the bacterial contigs exhibited a read coverage of 29.17. Viral branches were thus characterized by a large number of mapped contigs exhibiting a low coverage. This is consistent with numerous and very diverse viral populations <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. On the other hand, the bacterial branches exhibited a lower number of mapped contigs with a larger read coverage. This is consistent with numerous but less diverse populations of bacterial species, although our results concern only bacteria having PolB homologs.</p>
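            <p>The per-branch summary can be sketched as below. The text does not state how read coverage was computed. This sketch assumes it is the total length of reads assembled into a contig divided by the contig length, and that each supported mapping is given as a hypothetical (branch, coverage) pair. Using the median rather than the mean keeps a single highly covered contig from dominating a branch summary.</p>
```python
from collections import defaultdict
from statistics import median


def contig_coverage(read_lengths, contig_length):
    """Mean depth: total read bases in the contig divided by contig length.

    This is an assumed definition; the source does not give its formula.
    """
    return sum(read_lengths) / contig_length


def median_coverage_by_branch(mapped_contigs):
    """Summarize coverage of contigs grouped by reference-tree branch.

    mapped_contigs: iterable of (branch, coverage) pairs, one per contig
    whose PolB fragment was mapped with significant support.
    Returns {branch: (number of contigs, median coverage)}.
    """
    by_branch = defaultdict(list)
    for branch, cov in mapped_contigs:
        by_branch[branch].append(cov)
    return {b: (len(v), median(v)) for b, v in by_branch.items()}
```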
         </sec>
         <sec>
            <st>
               <p>Geographic distributions of viral PolBs</p>
            </st>
            <p>GOS metadata provide physicochemical and biological parameters associated with each sampling site, such as water temperature, salinity, chlorophyll <it>a</it> concentration and sampling depth. These data offer additional dimensions for analyzing the viral PolB fragments identified by our phylogenetic mapping. Here we compared the relative abundance of the predicted viral PolB fragments and the associated metadata across the GOS sampling sites (Figure <figr fid="F4">4a</figr>).</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Geographic localization</p>
               </caption>
               <text>
                  <p>Geographic localization. <b>(a) </b>The different sampling sites of the Sorcerer II Global Ocean Sampling expedition. Samples 00 and 01 are part of the Sargasso Sea pilot study <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. The inset shows samples 27 to 36, which were collected in the Galapagos Islands. The sampling sites displayed in light gray were analyzed neither in the original GOS study nor in this study. This panel was reproduced from Figure 1 of <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. <b>(b) </b>Relative abundance of PolB fragments for virus groups across GOS sampling sites. The left-most panel shows the relative abundance of viral PolBs in different GOS samples. The mimivirus group clearly appears as the most ubiquitous after phages. Four area plots (second to fifth panels from the left) show water temperature, chlorophyll <it>a</it> concentration (no information was available for the GS20, GS30, GS32, GS33, GS47 and GS51 sites), salinity (no information was available for the GS06, GS11, GS13, GS14, GS28, GS30, GS31, GS32, GS34 and GS37 sites) and sample depth, respectively. The two right-most histograms (sixth and seventh panels) show the proportion and the estimated number of reads associated with the viral PolB fragments among the total reads of each sample.</p>
               </text>
               <graphic file="gb-2008-9-7-r106-4"/>
            </fig>
            <p>Predicted viral PolB fragments were detected in all 44 GOS sampling sites (Figure <figr fid="F4">4b</figr>). The relative abundance of the different virus groups varied substantially across these samples, consistent with the diverse ecosystems covered by the GOS expedition.</p>
            <p>PolB fragments classified in the phage group were found in 42 (95%) of the 44 sampling sites; the two samples without phage PolB fragments were GS08 (Newport Harbor, Richmond, USA) and GS32 (mangrove). In most samples (32 sites), putative phage PolBs were more abundant than putative eukaryotic viral PolBs. In the remaining 12 sampling sites, eukaryotic viral PolBs were more abundant than phage PolBs. We found a significant positive correlation between the relative abundance of phage PolBs and water temperature (<it>p </it>= 0.001; Fisher's exact test with no correction for multiple testing): phage-type PolBs showed a higher relative abundance than eukaryotic viral PolBs in tropical waters (T &#8805; 20&#176;C), whereas the reverse tendency was observed in temperate waters (T &lt; 20&#176;C). Interestingly, among eukaryotic viral PolBs, putative Mimiviridae PolBs showed the most widespread distribution, being detected in 38 (86%) of the 44 sites. One of these sampling sites (a mangrove on Isabella, Ecuador) exhibited only viral PolBs classified in the Mimiviridae group; it is the only mangrove site among all GOS sampling locations. Mimiviridae PolBs were also relatively abundant in two of the three samples from a hydrostation in the Sargasso Sea. These three samples correspond to different size fractions: 3.0-20.0 &#956;m for GS01a; 0.8-3.0 &#956;m for GS01b; and 0.1-0.8 &#956;m for GS01c. Putative Mimiviridae PolBs were identified in the GS01a and GS01c samples. The GS01a sample, which targeted small eukaryotes, may have contained host species infected by putative viruses of the Mimiviridae group. PolB fragments grouped with chloroviruses were also widely distributed, being detected in 16 (36%) samples. 
The relative abundance of this putative eukaryotic virus group showed a significant positive correlation with chlorophyll <it>a </it>concentration, a measure of primary productivity in oceanic regions (<it>p </it>= 0.00002; Fisher's exact test with no correction for multiple testing).</p>
            <p>The sample exhibiting the broadest taxonomic richness of viral PolBs came from Chesapeake Bay (GS12, MD, USA), an estuary. The GOS metagenomic sequences from this site contained PolB fragments classified in phages, chloroviruses, Asfarviridae and Mimiviridae. Notably, this site is a highly eutrophic estuary with an extremely high chlorophyll <it>a</it> concentration. PolBs classified in Asfarviridae were also detected at another estuary site close to Chesapeake Bay (GS11, Delaware Bay, USA).</p>
         </sec>
         <sec>
            <st>
               <p>Prediction of putative 'new' viral genes</p>
            </st>
            <p>Contigs harboring putative viral PolB homologs were relatively short, ranging from 0.4 to 12.5 kb (average 1,874 bp) for contigs mapped on eukaryotic viral branches and from 0.5 to 8.8 kb (average 1,885 bp) for those mapped on phage branches. To look for additional open reading frames (ORFs) in these putative viral contigs, we searched them against NRDB using BLASTX. We detected several genes or gene fragments that are usually specific to viruses. For instance, several contigs containing PolB fragments assigned to the chlorovirus group (for example, JCVI SCAF 1096626858151 and JCVI SCAF 1096626920680) also harbor an ORF most similar to the putative major capsid protein gene of OtV5. Likewise, several putative phage-type contigs mapped on the cyanophage P-SSM4 branch (for example, JCVI SCAF 1096628232224 and JCVI SCAF 1096626847406) contained ORFs similar to <it>regA </it>(translation repressor of early genes) or <it>uvsX </it>(<it>recA</it>-like recombination and DNA repair protein gene). The presence of such 'virus-specific' genes next to the 'virus-like' PolB homologs supports the validity of our phylogenetic mapping approach.</p>
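            <p>The BLASTX annotation step above can be sketched as follows, assuming the searches were saved in BLAST+ tabular format (<it>-outfmt 6</it>) with its default twelve columns; the output format actually used is not stated, and the function name <it>best_hits</it> is hypothetical. For each contig ORF, the sketch keeps the hit with the highest bit score, which is the hit whose taxonomy would then be used for annotation (mapping subject IDs to taxonomy is not shown).</p>
            <p>
```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Map each query ID to (subject ID, bit score, E-value) of its top hit."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        query, score = rec["qseqid"], float(rec["bitscore"])
        # Keep the hit with the highest bit score for this query.
        if query not in best or score > best[query][1]:
            best[query] = (rec["sseqid"], score, float(rec["evalue"]))
    return best
```
            </p>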
            <p>During this search, we found an ORF similar to RimK, a protein involved in the post-translational modification of ribosomal protein S6, in a contig (JCVI SCAF 1096626956347) carrying a PolB fragment mapped on the cyanophage P-SSM4 branch. In this contig, the <it>rimK </it>homolog was flanked by a phage-specific <it>regA </it>homolog (Figure <figr fid="F5">5</figr>). <it>rimK </it>homologs are found in bacteria, archaea and eukaryotes <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, but to our knowledge none has been reported in a viral genome. Using this putative viral RimK homolog as a TBLASTN query, we screened the entire GOS data set. We identified more than a hundred contigs harboring RimK homologs that were more similar to the query (BLAST score from 137 up to 732; E-value &lt; 10<sup>-30</sup>) than any cellular homolog in NRDB (BLAST score &lt; 132; E-value > 10<sup>-29</sup>). The sequences of these putative phage RimK homologs aligned readily with <it>Escherichia coli </it>RimK along their entire length (not shown) and retained the amino acid residues that are highly conserved in the ATP-grasp domain of bacterial RimK <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. Several GOS RimK sequences had an additional domain of unknown function (DUF785, PF05618, E-value &lt; 0.001) on the carboxy-terminal side of the ATP-grasp domain. A DUF785 domain is also present in the RimK of some bacteria, such as <it>Synechococcus </it>sp. (Q7U6F4), where it lies on the amino-terminal side of the ATP-grasp domain, and of some euryarchaeotes, such as <it>Halobacteria </it>(for example, Q5V351), where it lies on the carboxy-terminal side. Furthermore, many of the GOS contigs encoding RimK homologs contained additional ORFs usually specific to phages, such as T4-like clamp loader subunit genes, contractile tail sheath protein genes or T4-like DNA packaging large subunit terminase genes (Figure <figr fid="F5">5</figr>). 
Our phylogenetic analysis indicates that these RimK homologs are closely related to each other and only distantly related to bacterial RimK (Figure <figr fid="F6">6</figr>). These results suggest the existence of phages carrying <it>rimK </it>homologs in marine environments.</p>
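            <p>The TBLASTN screen described above amounts to keeping the GOS hits that score above the best-scoring cellular homolog in NRDB (here, above a bit score of 132). A minimal Python sketch, assuming each hit has already been tagged with the database it came from; the function name and the hit records used to exercise it are hypothetical.</p>
            <p>
```python
def closer_than_cellular(hits):
    """Return sorted GOS subject IDs scoring above every cellular (NRDB) hit.

    hits: iterable of (subject_id, source, bit_score) with source "GOS" or "NRDB".
    """
    hits = list(hits)
    # Highest score reached by any cellular homolog (0 if there is none).
    best_cellular = max((s for _, src, s in hits if src == "NRDB"), default=0.0)
    return sorted(sid for sid, src, s in hits
                  if src == "GOS" and s > best_cellular)
```
            </p>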
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Gene organization of GOS contigs with putative phage RimK sequences</p>
               </caption>
               <text>
                  <p>Gene organization of GOS contigs with putative phage RimK sequences. Putative phage <it>rimK </it>genes are shown in red. Other predicted genes are color coded according to the taxonomy of their best BLAST hits in NRDB, as shown in the inset panel. MT-A70 denotes an adenine-specific methyltransferase of the MT-A70 family. gp17 is a homolog of the T4-like DNA packaging large subunit terminase. gp18 is a homolog of the contractile tail sheath protein. The crystal structure of a GOS homolog of the protein encoded by the hypothetical gene (gray) has been determined and is available in the Protein Data Bank (3BY7).</p>
               </text>
               <graphic file="gb-2008-9-7-r106-5"/>
            </fig>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Maximum likelihood tree of RimK sequences</p>
               </caption>
               <text>
                  <p>Maximum likelihood tree of RimK sequences. RimK sequences were retrieved from UniProt <abbrgrp><abbr bid="B78">78</abbr></abbrgrp> and from the GOS metagenomic data set using BLASTP. The phylogenetic reconstruction was performed using PhyML <abbrgrp><abbr bid="B73">73</abbr></abbrgrp> (Jones-Taylor-Thornton substitution model <abbrgrp><abbr bid="B76">76</abbr></abbrgrp>, 100 bootstrap replicates) based on a multiple sequence alignment generated with MUSCLE <abbrgrp><abbr bid="B77">77</abbr></abbrgrp>. Bootstrap values lower than 50% are not shown. The tree was mid-point rooted. GOS sequences are marked with filled circles and displayed in purple. The length of the scale bar corresponds to 0.4 substitutions per site.</p>
               </text>
               <graphic file="gb-2008-9-7-r106-6"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Until recently, the marine virosphere was <it>terra incognita</it>. The growing amount of environmental sequence data now provides unprecedented opportunities to explore the viral world. Previous studies characterized the abundance and genetic richness of marine viruses using environmental sequencing approaches <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B14">14</abbr><abbr bid="B19">19</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. However, the extent of species diversity within individual viral groups remains unclear, especially for large DNA viruses, which were often overlooked or were not the specific focus of marine metagenomic projects. In this study, we used a new phylogenetic mapping approach to identify viral PolB sequences in the GOS metagenomic data set and to assess their taxonomic distribution. Small viruses, including RNA viruses, were outside the scope of this study. Going beyond BLAST searches, our phylogenetic mapping approach provided a somewhat unexpected picture of the taxonomic distribution of viral sequences in the metagenomic data.</p>
         <p>In the GOS data we identified 811 PolB-like sequences closely related to known viral PolB sequences. This is consistent with the existence of a wide taxonomic spectrum of PolB-containing DNA viruses in marine environments <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. As previously noted <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, phages are the main contributors to this diversity; our method predicted that 78% (633/811) of the viral PolB fragments were of phage origin. This proportion is likely an underestimate of the actual taxonomic diversity of double-stranded DNA phages in the GOS sampling areas as only a subset of DNA phages carry PolB genes.</p>
         <p>Interestingly, the mimivirus group was the second largest in terms of the number of assigned PolB fragments (115 cases of mapping). Previous studies revealed the existence of mimivirus-like sequences in the GOS metagenomic data set <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B17">17</abbr></abbrgrp>. Our data now suggest that the species/strain richness of this viral group in the GOS metagenomic samples may be comparable to that of other groups of eukaryotic large DNA viruses, including most of the previously characterized phycodnaviruses. The amoeba-infecting mimivirus has the largest known viral genome (1.2 Mb), and its particle is approximately 0.7 &#956;m in diameter including its filamentous layer <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. In addition, the mimivirus group contains two haptophyte viruses (CeV01 (510 kb) and PpV01 (485 kb)) and a virus infecting a green algal species (PoV01 (560 kb)) <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B42">42</abbr></abbrgrp>. Their genomes are also larger than those of any other eukaryotic virus sequenced so far apart from mimivirus <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr></abbrgrp>. The particles of these three algal viruses are 0.16-0.22 &#956;m in size, which is compatible with the filter sizes used in the GOS sampling. Notably, their particle sizes are comparable to those of classic phycodnaviruses, which have a mean diameter of 0.16 &#177; 0.06 &#956;m <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>. By counting overlapping PolB fragments mapped on the mimivirus group, we estimated that at least 85 distinct species/strains of Mimiviridae are present in the GOS metagenomic samples. Within the mimivirus group, two haptophyte viruses (PpV01 and CeV01) clustered together with a high bootstrap value (Figure <figr fid="F3">3</figr>). Most (84%; 97/115) of the Mimiviridae-like PolB fragments were mapped within this subgroup. 
Haptophyte species may thus be the major hosts of the putative viruses in this subgroup. Overall, these data suggest that the large DNA viruses of the Mimiviridae group represent one of the main components of marine eukaryotic large DNA viruses.</p>
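         <p>This section does not detail how the overlapping fragments were counted. One plausible lower-bound argument, given here as an assumption rather than as the method actually used, is that fragments overlapping on the same alignment columns but differing in sequence must come from distinct genomes, so the largest number of fragments covering any single column of the reference alignment bounds the number of distinct species/strains from below. The following Python sketch computes that maximum depth with a sweep over fragment start and end positions; the function name is hypothetical.</p>
         <p>
```python
def max_overlap_depth(intervals):
    """Largest number of fragments covering any single alignment column.

    intervals: (start, end) alignment coordinates, inclusive on both ends.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))     # fragment starts covering `start`
        events.append((end + 1, -1))  # fragment no longer covers `end + 1`
    depth = best = 0
    # Sorting places -1 before +1 at equal positions, so a fragment ending
    # at column k is not counted together with one starting at k + 1.
    for _, delta in sorted(events):
        depth += delta
        best = max(best, depth)
    return best
```
         </p>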
         <p>A total of 51 GOS PolB fragments were mapped on the branch leading to the chloroviruses. These GOS sequences were closely related to the recently determined PolB sequence of OtV5, which infects <it>Ostreococcus tauri</it>, a small prasinophyte green alga (approximately 1 &#956;m in diameter) found in diverse geographic locations <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. Short and Suttle identified a group of viral sequences closely related to prasinoviruses (<it>Micromonas pusilla </it>viruses) by sequencing PCR products targeting algal virus PolBs <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. We found that some of the sequences from their study were also highly similar to the OtV5 PolB sequence. For instance, their sequence BSA99-5 (GenBank: <ext-link ext-link-type="gen" ext-link-id="AF405581">AF405581</ext-link>) exhibited 93% amino acid sequence identity to the OtV5 PolB sequence. This suggests that the major hosts of this putative viral group may be prasinophytes.</p>
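         <p>Percent identity depends on the convention used for the denominator, and this section does not state which one underlies the 93% figure. As an illustration only, the sketch below computes identity over the aligned columns in which neither sequence has a gap; the function name and the sequences used to exercise it are hypothetical.</p>
         <p>
```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only gap-free columns; they form the denominator.
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)
```
         </p>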
         <p>Surprisingly, we identified two PolB fragments most closely related to the ASFV PolB. ASFV is currently the sole isolated member of the Asfarviridae family. The known natural hosts of ASFV are terrestrial animals, including warthogs, bush pigs and soft ticks <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, in which ASFV causes persistent but asymptomatic infections. In domestic pigs, ASFV causes an acute hemorrhagic disease with mortality rates of up to 100%, depending on the viral isolate. Our data point to the existence of additional Asfarviridae in marine environments, although contamination of terrestrial origin cannot be excluded. In a recent metagenomic study, Marhaver <it>et al</it>. <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> analyzed the viral communities associated with healthy and bleaching corals and showed that alphaherpesvirus-like and gammaherpesvirus-like sequences accounted for 4-8% of the analyzed environmental sequences. Although the GOS sampling sites include a coral reef atoll (GS51), no herpesvirus-type PolB fragment was detected in our study.</p>
         <p>Our analysis of geographic distribution showed that putative viral PolB fragments were present in all 44 GOS samples, suggesting that PolB-encoding viruses are widespread in diverse marine environments. Interestingly, phage PolB sequences were more abundant than eukaryotic viral PolB sequences in samples from tropical areas; conversely, many samples from temperate areas were enriched in eukaryotic viral PolBs. Furthermore, most of the samples showing high taxonomic richness of viral PolB sequences came from temperate areas. This observation is consistent with the current understanding of the distribution of eukaryotic and prokaryotic phytoplankton in the oceans. Gibb <it>et al</it>. <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> surveyed the spatial distributions of phytoplankton pigments across the Atlantic Ocean over 100&#176; of latitude (from 50&#176;N to 50&#176;S). They found a major transition in pigment characteristics between temperate and tropical/sub-tropical waters: temperate waters were characterized by a larger phytoplankton biomass enriched in eukaryotic phytoplankton, whereas tropical/sub-tropical waters had a smaller phytoplankton biomass enriched in prokaryotic phytoplankton such as prochlorophytes <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>.</p>
         <p>The relatively high abundance of eukaryotic viral PolBs in samples from temperate areas (which showed high chlorophyll <it>a </it>concentrations) was mainly due to the abundance of GOS PolB sequences grouped with chlorovirus PolBs. This again suggests that the hosts of these putative viruses are green algae such as prasinophytes. In contrast, Mimiviridae-like PolB fragments showed a wider geographical distribution. They were identified in sequences from most of the GOS sampling sites, from the northeast Atlantic Ocean to the southwest Pacific Ocean. These sites correspond to a variety of habitat types, such as coastal seas, open oceans, fresh water sites (GS20, Lake Gatun, Panama; GS32, mangrove, Isabella, Ecuador) and even hypersaline waters (GS33, Punta Cormorant Lagoon, Floreana, Ecuador). The detection of Mimiviridae-like PolB fragments was not correlated with chlorophyll <it>a </it>concentration, suggesting that the hosts of these putative Mimiviridae are not restricted to temperate/eutrophic waters. Indeed, haptophyte species are found, and occasionally form blooms, in waters from sub-polar to (sub-)tropical latitudes, including oligotrophic areas <abbrgrp><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr></abbrgrp>. <it>Acanthamoeba</it>, the host of mimivirus, is also able to survive in diverse environments <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>.</p>
         <p>Finally, our study allowed the identification of putative phage <it>rimK </it>genes. In <it>E. coli</it>, RimK catalyzes the post-translational addition of glutamic acid residues to the carboxyl terminus of ribosomal protein S6 <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. Antibiotic resistance has been suggested for an <it>E. coli </it>mutant lacking S6-modification activity <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. Reeh and Pedersen <abbrgrp><abbr bid="B56">56</abbr></abbrgrp> showed that the relative level of S6 modification was not affected by the growth rate in culture. Beyond these observations, however, the functional consequences of S6 modification in <it>E. coli </it>remain largely unknown. Bacteriophage T7 modifies ribosomal proteins S6 and S1 and translation initiation factors by phosphorylation upon infection of <it>E. coli </it><abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. These modifications of host translational proteins are performed by a T7-encoded kinase and enhance phage reproduction under sub-optimal growth conditions; it has been suggested that the phosphorylation of these proteins stimulates translation of the phage late mRNAs. The RimK homologs found in phage-like contigs may be involved in a similar process. Unexpected homologs of cellular genes are continuously being identified in viral genome sequences <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B58">58</abbr><abbr bid="B59">59</abbr></abbrgrp>. We believe that our phylogenetic mapping approach will be useful for identifying further unexpected viral genes in environmental sequences.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The use of a phylogenetic approach provided a comprehensive picture of the taxonomic distribution of large DNA viruses represented in the GOS metagenomic data. As expected, the highest genetic richness corresponded to phages. Interestingly, our data suggest that the Mimiviridae represent a major and ubiquitous component of large eukaryotic DNA viruses in diverse marine environments.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Extraction of PolB fragments from the GOS metagenomic data set</p>
            </st>
            <p>We retrieved the combined assemblies of the GOS metagenomic data through the CAMERA website <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. The data set was composed of 3,081,849 scaffolds. We extracted all stop-to-stop ORFs (&#8805; 60 amino acid residues) from the assembled sequences using EMBOSS/GETORF <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>, obtaining a set of 21,406,171 ORFs, which were translated into amino acid sequences. To identify PolB-like fragments in this set, we used the Pfam profile PF00136 (both the 'ls' and the 'fs' (fragment search) versions) <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> with HMMER as the search engine <abbrgrp><abbr bid="B63">63</abbr></abbrgrp> and an E-value threshold of 0.001. We then removed redundant hits (arising from the use of both the 'ls' and 'fs' versions of the Pfam profile) and false positives (sequences whose best BLASTP <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> hit in NRDB, at an E-value threshold of 10<sup>-5</sup>, was a non-PolB sequence). We kept only the parts of the metagenomic amino acid sequences that were aligned to the Pfam profile representing the polymerase domain of PolB; additional domains (such as endonuclease domains) were therefore not included in our PolB sequence set. No contig was found to contain more than one PolB homolog. These steps yielded 1,947 distinct PolB-like sequences (23 to 562 amino acid residues), which are referred to as PolB fragments in this study. We searched the GOS PolB fragments for intein insertions using the TIGRFAM profiles TIGR01445 (intein amino terminus) and TIGR01443 (intein carboxyl terminus) <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>, but none of the fragments had a detectable intein domain. 
In this study, we did not include the protein-primed subfamily of the B family DNA polymerases <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, which is represented by the Pfam profile PF03175. Members of this subfamily are found in mitochondrial linear plasmids of eukaryotes, in phages and in adenoviruses.</p>
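            <p>GETORF was used to extract the stop-to-stop ORFs. The following Python sketch approximates that step for illustration only and is not the EMBOSS implementation: each strand is translated in its three reading frames with the standard genetic code, and the stop-free stretches of at least 60 residues are kept. Codons containing ambiguous bases are translated as X, and the function names are hypothetical.</p>
            <p>
```python
BASES = "TCAG"
# NCBI standard genetic code (table 1), codons enumerated in TCAG order.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AA[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def stop_to_stop_orfs(dna, min_aa=60):
    """Yield (strand, frame, peptide) for stop-free stretches of min_aa+ residues."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    for strand, seq in (("+", dna), ("-", rev)):
        for frame in range(3):
            # Splitting on '*' yields the regions between stop codons.
            for peptide in translate(seq[frame:]).split("*"):
                if len(peptide) >= min_aa:
                    yield strand, frame, peptide
```
            </p>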
         </sec>
         <sec>
            <st>
               <p>PolB homologs from the NRDB</p>
            </st>
            <p>We retrieved PolB homologs from the NRDB, RefSeq <abbrgrp><abbr bid="B66">66</abbr></abbrgrp> and KEGG <abbrgrp><abbr bid="B67">67</abbr></abbrgrp> databases using BLAST searches with multiple query sequences (E-value &lt; 10<sup>-5</sup>) and the PolB Pfam profile (E-value &lt; 0.001). We removed species-level redundancy using BLASTCLUST <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> while keeping the widest possible taxonomic/paralog coverage (although closely related species were not sampled exhaustively). This resulted in a set of 120 PolB homologs (Additional data file 1). We removed intein sequences from the PolBs of mimivirus <abbrgrp><abbr bid="B68">68</abbr></abbrgrp>, HaV <abbrgrp><abbr bid="B69">69</abbr></abbrgrp> and CeV01 (GenBank: <ext-link ext-link-type="gen" ext-link-id="ABU23716">ABU23716</ext-link>).</p>
         </sec>
         <sec>
            <st>
               <p>Construction of the reference alignment and the reference tree</p>
            </st>
            <p>We next constructed an alignment of PolB homologs from known organisms (the reference MSA) and a phylogenetic tree of these homologs (the reference tree). There is a tradeoff between the number of distant homologs included in the reference MSA (which widens the taxonomic/paralog coverage) and the quality of the resulting MSA and tree (which determines how reliably metagenomic sequences can be classified). Among the 120 PolB homologs, we identified 19 highly divergent sequences that decreased the quality of the resulting PolB alignment and tree but had no close homologs among the GOS PolB fragments. These sequences were identified through multiple trials of building alignments with T-Coffee <abbrgrp><abbr bid="B70">70</abbr></abbrgrp> and phylogenetic trees with PhyML. The 19 sequences correspond to six groups of PolB homologs: eukaryotic DNA polymerase &#949;, a <it>Trichomonas vaginalis </it>DNA polymerase &#945;-like paralog, PolBs of unclassified herpesviruses (Ostreid, Ictalurid and Ranid herpesviruses), <it>Heliothis zea </it>virus, a nimavirus (shrimp white spot syndrome virus), and PolBs of a group of bacteria related to <it>Prosthecochloris vibrioformis </it>and <it>Chlorobium tepidum</it>. No PolB-like fragment in the GOS data had a best BLAST hit against these groups of PolB homologs; therefore, removing these six groups from our reference data set does not affect the interpretation of the results described in this manuscript. After discarding the 19 sequences, the final PolB set comprised 101 sequences. We aligned the 101 PolB sequences with default options using M-Coffee, which is accessible from a public server <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>; M-Coffee is a meta-method for assembling multiple sequence alignments <abbrgrp><abbr bid="B72">72</abbr></abbrgrp>. 
We extracted only the polymerase domain sequences from the alignment (the reference MSA; Additional data file 2). The reference alignment showed four conserved regions (numbered I to IV) previously described as signatures of PolB polymerase domains <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. We then built a maximum likelihood tree from the reference MSA (the reference tree) using PhyML <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>, after removing gap-containing sites, with the JTT substitution model and a gamma distribution of rates across sites (four rate categories). Bootstrap values were obtained from 100 bootstrap replicates. We used the phylogeny.fr platform <abbrgrp><abbr bid="B74">74</abbr></abbrgrp> to generate scalable vector graphics from Newick-formatted trees.</p>
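            <p>Gap-containing sites were removed before the reference tree was built, and the same kind of filter determines whether a mapped GOS fragment retains at least 50 sites. A minimal Python sketch of that filter, assuming an alignment held as a dictionary of equal-length strings with '-' and '.' as gap symbols (the function name is hypothetical):</p>
            <p>
```python
def remove_gapped_columns(alignment, gap_chars="-."):
    """Drop every alignment column in which any sequence has a gap.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns that are gap-free in every sequence.
    keep = [i for i in range(length)
            if all(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}
```
            </p>
            <p>Under this sketch, a GOS fragment would be kept for mapping only if the filtered alignment still has at least 50 columns.</p>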
         </sec>
         <sec>
            <st>
               <p>Phylogenetic mapping</p>
            </st>
            <p>Each metagenomic PolB fragment was taxonomically assigned by aligning it against the reference MSA and examining its phylogenetic position in the reference tree. To reduce computation time and to avoid unnecessary complications in summarizing results on an overly dense tree, we reduced the size of the reference MSA and the reference tree: we selected 51 of the 101 PolBs in the initial set and discarded the rest. The 51 representatives were selected as follows. First, we selected all the PolBs (that is, ASFV, EhV86, HaV and phage RM378) that were not grouped with other PolBs with statistical support (&#8805; 70% bootstrap value) in the initial reference tree (Figure <figr fid="F1">1</figr>). Second, we selected two or three representatives from each statistically supported monophyletic group (&#8805; 70% bootstrap value). The choice of representatives within a monophyletic group was arbitrary; we simply selected two or three relatively distant members of the group. To obtain the reduced reference MSA, we extracted the rows of the 51 selected sequences from the initial reference MSA (retaining gaps). The initial reference tree (199 branches, including internal ones) was reduced accordingly by pruning the branches leading to the non-selected leaves using BAOBAB <abbrgrp><abbr bid="B75">75</abbr></abbrgrp>.</p>
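            <p>The branch counts quoted here are consistent with a general property of fully resolved unrooted trees: a tree with <it>n </it>leaves has 2<it>n </it>- 3 branches, giving 2 x 101 - 3 = 199 for the initial tree and 2 x 51 - 3 = 99 for the reduced one. The pruning itself was performed with BAOBAB; the Python sketch below is not BAOBAB but an independent illustration of the same operation, assuming a tree stored as a symmetric adjacency dictionary with branch lengths (all helper names are hypothetical). Non-selected leaves are removed, and any internal node left with only two neighbours is suppressed by merging its two branches.</p>
            <p>
```python
def from_edges(edges):
    """Build a symmetric adjacency dict {node: {neighbour: length}}."""
    tree = {}
    for u, v, length in edges:
        tree.setdefault(u, {})[v] = length
        tree.setdefault(v, {})[u] = length
    return tree


def branch_count(tree):
    # Each branch appears twice in the adjacency dict.
    return sum(len(nbrs) for nbrs in tree.values()) // 2


def prune(tree, keep):
    """Restrict an unrooted tree to the leaves in `keep`."""
    t = {u: dict(nbrs) for u, nbrs in tree.items()}
    changed = True
    while changed:
        changed = False
        for u in list(t):
            nbrs = t[u]
            if u in keep:
                continue
            if len(nbrs) == 1:
                # Unwanted leaf (or internal node that became a leaf): drop it.
                (v,) = nbrs
                del t[v][u], t[u]
                changed = True
            elif len(nbrs) == 2:
                # Degree-2 node: merge its two branches, summing lengths.
                (v, lv), (w, lw) = nbrs.items()
                del t[v][u], t[w][u], t[u]
                t[v][w] = t[w][v] = lv + lw
                changed = True
    return t
```
            </p>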
            <p>The reduced reference tree has 99 branches (including internal branches); keeping its topology fixed thus defines 99 possible branching positions for each PolB-like fragment extracted from the metagenomic data set. The reduced reference MSA and the reduced reference tree are the basis of the phylogenetic mapping in this study. Each PolB fragment from the GOS data set was aligned on the reduced reference MSA (retaining gaps) using T-Coffee <abbrgrp><abbr bid="B70">70</abbr></abbrgrp> with a profile alignment option; for the T-Coffee profile alignment, we used the option '-profile comparison = full10'. If the alignment of a GOS PolB fragment retained fewer than 50 sites after removal of gap-containing sites, the fragment was discarded from our analysis. Based on the resulting alignment (51 reference sequences and one GOS PolB fragment), the likelihoods of all 99 possible branching positions (thus 99 different topologies) for the PolB fragment were computed with ProtML <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. The statistical support for the best of the 99 topologies was assessed with the RELL method <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>. We considered the branching position of a PolB fragment to be supported when the RELL bootstrap value of the best topology was &#8805; 75%.</p>
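            <p>RELL (resampling of estimated log-likelihoods) avoids re-optimizing each topology on every bootstrap replicate: the per-site log-likelihoods computed once for each candidate topology are resampled with replacement and re-summed, and the support for a topology is the fraction of replicates in which it has the highest total. ProtML performs this calculation itself; the Python sketch below only illustrates the idea, assuming the site log-likelihoods of the candidate topologies are available as a list of lists (the function name and parameters are hypothetical). Ties are resolved in favour of the first topology.</p>
            <p>
```python
import random


def rell_support(site_lnl, replicates=1000, seed=0):
    """RELL bootstrap support for each candidate topology.

    site_lnl[t][s] is the log-likelihood of alignment site s under topology t.
    Returns, per topology, the fraction of replicates in which it scores best.
    """
    rng = random.Random(seed)
    n_sites = len(site_lnl[0])
    wins = [0] * len(site_lnl)
    for _ in range(replicates):
        # Resample site indices with replacement (no re-optimization).
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(row[i] for i in idx) for row in site_lnl]
        wins[max(range(len(totals)), key=totals.__getitem__)] += 1
    return [w / replicates for w in wins]
```
            </p>
            <p>With 99 candidate positions, a fragment would be considered placed when the largest value returned is at least 0.75, matching the threshold used here.</p>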
         </sec>
         <sec>
            <st>
               <p>Read coverage</p>
            </st>
            <p>Read coverage for a contig was defined as the cumulative length of the reads contributing to the contig divided by the length of the contig.</p>
         </sec>
         <sec>
            <st>
               <p>Relative abundance of PolBs</p>
            </st>
            <p>For the analysis of the relative abundance of PolB sequences, we followed the approach of Williamson <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Briefly, we first estimated the average number of reads overlapping the part of a contig encoding a PolB domain, taking into account the length of the PolB domain (as defined by the Pfam hit) and the length of the contig. The abundance of the PolB sequences of each viral group at a given sample site was then quantified as the total number of reads associated with the relevant set of PolB sequences (that is, the sum of the estimated read numbers). For a given site, the viral PolB proportion was computed by dividing the total number of viral PolB reads (for all viral groups) by the total number of reads obtained from the site.</p>
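            <p>The exact read estimator of Williamson <it>et al</it>. is not reproduced in this section, so the Python sketch below assumes the simplest form consistent with the description: the reads of a contig are scaled by the fraction of the contig occupied by the PolB domain (both lengths in nucleotides). The function names and record fields are hypothetical. The sketch also includes the read coverage defined in the preceding subsection.</p>
            <p>
```python
from collections import defaultdict


def read_coverage(read_lengths, contig_len):
    """Cumulative read length divided by contig length."""
    return sum(read_lengths) / contig_len


def estimated_domain_reads(n_reads, contig_len, domain_len):
    # ASSUMPTION: reads on the contig scaled by the fraction of the contig
    # occupied by the PolB domain (lengths in nucleotides).
    return n_reads * min(1.0, domain_len / contig_len)


def polb_abundance(contigs, total_reads_by_site):
    """Return ({(site, group): reads}, {site: viral PolB read proportion}).

    contigs: dicts with hypothetical keys site, group, n_reads,
    contig_len and domain_len.
    """
    by_group = defaultdict(float)
    viral_by_site = defaultdict(float)
    for c in contigs:
        reads = estimated_domain_reads(c["n_reads"], c["contig_len"],
                                       c["domain_len"])
        by_group[(c["site"], c["group"])] += reads
        viral_by_site[c["site"]] += reads
    # Proportion of all reads at each site attributed to viral PolBs.
    proportion = {site: viral_by_site[site] / total
                  for site, total in total_reads_by_site.items()}
    return dict(by_group), proportion
```
            </p>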
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>ASFV, African swine fever virus; CeV, <it>Chrysochromulina ericina </it>virus; EhV86, <it>Emiliania huxleyi </it>virus 86; GOS, Global Ocean Sampling; HaV, <it>Heterosigma akashiwo </it>virus 1; MSA, multiple sequence alignment; NCLDV, nucleocytoplasmic large DNA virus; NRDB, NCBI non-redundant amino-acid sequence database; ORF, open reading frame; PolB, B-family DNA polymerase; PoV, <it>Pyramimonas orientalis </it>virus; PpV, <it>Phaeocystis pouchetii </it>virus; RELL, resampling of estimated log likelihoods.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>AM performed the analyses. HO designed the experiments. All authors analyzed the data and contributed to the writing of the manuscript.</p>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> is a table listing the PolB sequences used in the study. Additional data file <supplr sid="S2">2</supplr> is a multiple sequence alignment of 101 PolB sequences retrieved from databases. Additional data file <supplr sid="S3">3</supplr> is a figure summarizing the results of the phylogenetic mapping of the GOS PolB fragments, which are displayed for each of the 99 branches tested.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>PolB sequences used in the study</p>
            </caption>
            <text>
               <p>The IDs and species names of the PolB sequences retrieved from databases are given. Sequences used in the reference multiple alignment are in bold.</p>
            </text>
            <file name="gb-2008-9-7-r106-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>Reference MSA of PolB sequences</p>
            </caption>
            <text>
                <p>Sequences used in the final reduced reference multiple alignment are marked with an asterisk.</p>
            </text>
            <file name="gb-2008-9-7-r106-S2.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Phylogenetic mapping results of the GOS PolB fragments</p>
            </caption>
            <text>
                <p>The GOS PolB fragments are displayed for each of the 99 branches tested. Numbers in parentheses (<it>V/W</it>) give the number of supported cases (<it>V</it>, displayed in red) and the total number of mapped PolB fragments (<it>W</it>). Read coverage values are presented as [<it>X</it>-<it>Y</it>]-(<it>Z</it>), where <it>X </it>and <it>Y </it>are the minimum and maximum read coverage values and <it>Z </it>is the median read coverage.</p>
            </text>
            <file name="gb-2008-9-7-r106-S3.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We are thankful to Colomban de Vargas for fruitful discussions and to anonymous referees for useful suggestions. We are also thankful to Alexis Dereeper for graphic support. AM is partially supported by the EuroPathoGenomics European network of excellence. This work was partially supported by Marseille-Nice Genopole and the French National Network (RNG).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>High abundance of viruses found in aquatic environments.</p>
            </title>
            <aug>
               <au>
                  <snm>Bergh</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Borsheim</snm>
                  <fnm>KY</fnm>
               </au>
               <au>
                  <snm>Bratbak</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Heldal</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1989</pubdate>
            <volume>340</volume>
            <fpage>467</fpage>
            <lpage>468</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/340467a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">2755508</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Ecology of prokaryotic viruses.</p>
            </title>
            <aug>
               <au>
                  <snm>Weinbauer</snm>
                  <fnm>MG</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Rev</source>
            <pubdate>2004</pubdate>
            <volume>28</volume>
            <fpage>127</fpage>
            <lpage>181</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.femsre.2003.08.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">15109783</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Viruses in the sea.</p>
            </title>
            <aug>
               <au>
                  <snm>Suttle</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>437</volume>
            <fpage>356</fpage>
            <lpage>361</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04160</pubid>
                  <pubid idtype="pmpid" link="fulltext">16163346</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Marine viruses and their biogeochemical and ecological effects.</p>
            </title>
            <aug>
               <au>
                  <snm>Fuhrman</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>399</volume>
            <fpage>541</fpage>
            <lpage>548</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/21119</pubid>
                  <pubid idtype="pmpid" link="fulltext">10376593</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Viruses and nutrient cycles in the sea.</p>
            </title>
            <aug>
               <au>
                  <snm>Wilhelm</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Suttle</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>BioScience</source>
            <pubdate>1999</pubdate>
            <volume>49</volume>
            <fpage>781</fpage>
            <lpage>788</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.2307/1313569</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Unique genes in giant viruses: regular substitution pattern and anomalously short size.</p>
            </title>
            <aug>
               <au>
                  <snm>Ogata</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Claverie</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <fpage>1353</fpage>
            <lpage>1361</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1950904</pubid>
                  <pubid idtype="pmpid" link="fulltext">17652424</pubid>
                  <pubid idtype="doi">10.1101/gr.6358607</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Identification and investigation of ORFans in the viral world.</p>
            </title>
            <aug>
               <au>
                  <snm>Yin</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Fischer</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>24</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2245933</pubid>
                  <pubid idtype="pmpid" link="fulltext">18205946</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-9-24</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Viral metagenomics.</p>
            </title>
            <aug>
               <au>
                  <snm>Edwards</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Rohwer</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Nat Rev Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>504</fpage>
            <lpage>510</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrmicro1163</pubid>
                  <pubid idtype="pmpid" link="fulltext">15886693</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The marine viromes of four oceanic regions.</p>
            </title>
            <aug>
               <au>
                  <snm>Angly</snm>
                  <fnm>FE</fnm>
               </au>
               <au>
                  <snm>Felts</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Breitbart</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Salamon</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Carlson</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Haynes</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kelley</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Mahaffy</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Mueller</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Nulton</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Olson</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Parsons</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rayhawk</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Suttle</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Rohwer</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2006</pubdate>
            <volume>4</volume>
            <fpage>e368</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1634881</pubid>
                  <pubid idtype="pmpid" link="fulltext">17090214</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0040368</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Genomic analysis of uncultured marine viral communities.</p>
            </title>
            <aug>
               <au>
                  <snm>Breitbart</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Salamon</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Andresen</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Mahaffy</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Segall</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Mead</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Azam</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Rohwer</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>14250</fpage>
            <lpage>14255</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">137870</pubid>
                  <pubid idtype="pmpid" link="fulltext">12384570</pubid>
                  <pubid idtype="doi">10.1073/pnas.202488399</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Mimivirus and the emerging concept of 'giant' virus.</p>
            </title>
            <aug>
               <au>
                  <snm>Claverie</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Ogata</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Audic</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Abergel</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Suhre</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Fournier</snm>
                  <fnm>PE</fnm>
               </au>
            </aug>
            <source>Virus Res</source>
            <pubdate>2006</pubdate>
            <volume>117</volume>
            <fpage>133</fpage>
            <lpage>144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.virusres.2006.01.008</pubid>
                  <pubid idtype="pmpid" link="fulltext">16469402</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>The 1.2-megabase genome sequence of Mimivirus.</p>
            </title>
            <aug>
               <au>
                  <snm>Raoult</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Audic</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Robert</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Abergel</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Renesto</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ogata</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>La Scola</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Suzan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Claverie</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>306</volume>
            <fpage>1344</fpage>
            <lpage>1350</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1101485</pubid>
                  <pubid idtype="pmpid" link="fulltext">15486256</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific.</p>
            </title>
            <aug>
               <au>
                  <snm>Rusch</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Williamson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yooseph</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Hoffman</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Remington</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Beeson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tran</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Baden-Tillson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Thorpe</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Freeman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Andrews-Pfannkoch</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kravitz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Utterback</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>YH</fnm>
               </au>
               <au>
                  <snm>Falcon</snm>
                  <fnm>LI</fnm>
               </au>
               <au>
                  <snm>Souza</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Bonilla-Rosso</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Eguiarte</snm>
                  <fnm>LE</fnm>
               </au>
               <au>
                  <snm>Karl</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Sathyendranath</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Platt</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bermingham</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Gallardo</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Tamayo-Castillo</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Ferrari</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Strausberg</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Nealson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Frazier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <fpage>e77</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1821060</pubid>
                  <pubid idtype="pmpid" link="fulltext">17355176</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0050077</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>The Sorcerer II Global Ocean Sampling Expedition: metagenomic characterization of viruses within aquatic microbial samples.</p>
            </title>
            <aug>
               <au>
                  <snm>Williamson</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Yooseph</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Glass</snm>
                  <fnm>JI</fnm>
               </au>
               <au>
                  <snm>Andrews-Pfannkoch</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fadrosh</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Frazier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>PLoS ONE</source>
            <pubdate>2008</pubdate>
            <volume>3</volume>
            <fpage>e1456</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2186209</pubid>
                  <pubid idtype="pmpid" link="fulltext">18213365</pubid>
                  <pubid idtype="doi">10.1371/journal.pone.0001456</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Mimivirus relatives in the Sargasso sea.</p>
            </title>
            <aug>
               <au>
                  <snm>Ghedin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Claverie</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Virol J</source>
            <pubdate>2005</pubdate>
            <volume>2</volume>
            <fpage>62</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1215527</pubid>
                  <pubid idtype="pmpid" link="fulltext">16105173</pubid>
                  <pubid idtype="doi">10.1186/1743-422X-2-62</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Environmental genome shotgun sequencing of the Sargasso Sea.</p>
            </title>
            <aug>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Remington</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Fouts</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Knap</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Lomas</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Nealson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hoffman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Parsons</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Baden-Tillson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Pfannkoch</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>YH</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>304</volume>
            <fpage>66</fpage>
            <lpage>74</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1093857</pubid>
                  <pubid idtype="pmpid" link="fulltext">15001713</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Marine mimivirus relatives are probably large algal viruses.</p>
            </title>
            <aug>
               <au>
                  <snm>Monier</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Larsen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sandaa</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Bratbak</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Claverie</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Ogata</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Virol J</source>
            <pubdate>2008</pubdate>
            <volume>5</volume>
            <fpage>12</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2245910</pubid>
                  <pubid idtype="pmpid" link="fulltext">18215256</pubid>
                  <pubid idtype="doi">10.1186/1743-422X-5-12</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Environmental shotgun sequencing: its potential and challenges for studying the hidden world of microbes.</p>
            </title>
            <aug>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <fpage>e82</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1821061</pubid>
                  <pubid idtype="pmpid" link="fulltext">17355177</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0050082</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>The Sorcerer II Global Ocean Sampling Expedition: expanding the universe of protein families.</p>
            </title>
            <aug>
               <au>
                  <snm>Yooseph</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Williamson</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Remington</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Manning</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Jaroszewski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cieplak</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Mashiyama</snm>
                  <fnm>ST</fnm>
               </au>
               <au>
                  <snm>Joachimiak</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>van Belle</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Chandonia</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Soergel</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Zhai</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Natarajan</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Raphael</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Bafna</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Godzik</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dixon</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Strausberg</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Frazier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <fpage>e16</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1821046</pubid>
                  <pubid idtype="pmpid" link="fulltext">17355171</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0050016</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>A minimal gene set for cellular life derived by comparison of complete bacterial genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Mushegian</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1996</pubdate>
            <volume>93</volume>
            <fpage>10268</fpage>
            <lpage>10273</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">38373</pubid>
                  <pubid idtype="pmpid" link="fulltext">8816789</pubid>
                  <pubid idtype="doi">10.1073/pnas.93.19.10268</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Genome plasticity as a paradigm of eubacteria evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Watanabe</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Mori</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Itoh</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gojobori</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1997</pubdate>
            <volume>44</volume>
            <issue>Suppl 1</issue>
            <fpage>S57</fpage>
            <lpage>S64</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">9395406</pubid>
                  <pubid idtype="doi">10.1007/PL00000052</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>The closest BLAST hit is often not the nearest neighbor.</p>
            </title>
            <aug>
               <au>
                  <snm>Koski</snm>
                  <fnm>LB</fnm>
               </au>
               <au>
                  <snm>Golding</snm>
                  <fnm>GB</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2001</pubdate>
            <volume>52</volume>
            <fpage>540</fpage>
            <lpage>542</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11443357</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Sequence analysis of marine virus communities reveals that groups of related algal viruses are widely distributed in nature.</p>
            </title>
            <aug>
               <au>
                  <snm>Short</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Suttle</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2002</pubdate>
            <volume>68</volume>
            <fpage>1290</fpage>
            <lpage>1296</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">123764</pubid>
                  <pubid idtype="pmpid" link="fulltext">11872479</pubid>
                  <pubid idtype="doi">10.1128/AEM.68.3.1290-1296.2002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Marine T4-type bacteriophages, a ubiquitous component of the dark matter of the biosphere.</p>
            </title>
            <aug>
               <au>
                  <snm>Fil&#233;e</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>T&#233;tart</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Suttle</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Krisch</snm>
                  <fnm>HM</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>12471</fpage>
            <lpage>12476</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1194919</pubid>
                  <pubid idtype="pmpid" link="fulltext">16116082</pubid>
                  <pubid idtype="doi">10.1073/pnas.0503404102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>High diversity of unknown picorna-like viruses in the sea.</p>
            </title>
            <aug>
               <au>
                  <snm>Culley</snm>
                  <fnm>AI</fnm>
               </au>
               <au>
                  <snm>Lang</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Suttle</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>424</volume>
            <fpage>1054</fpage>
            <lpage>1057</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01886</pubid>
                  <pubid idtype="pmpid" link="fulltext">12944967</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>New genera of RNA viruses in subtropical seawater, inferred from polymerase gene sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Culley</snm>
                  <fnm>AI</fnm>
               </au>
               <au>
                  <snm>Steward</snm>
                  <fnm>GF</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2007</pubdate>
            <volume>73</volume>
            <fpage>5937</fpage>
            <lpage>5944</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2074930</pubid>
                  <pubid idtype="pmpid" link="fulltext">17644642</pubid>
                  <pubid idtype="doi">10.1128/AEM.01065-07</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Compilation, alignment, and phylogenetic relationships of DNA polymerases.</p>
            </title>
            <aug>
               <au>
                  <snm>Braithwaite</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Ito</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1993</pubdate>
            <volume>21</volume>
            <fpage>787</fpage>
            <lpage>802</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">309208</pubid>
                  <pubid idtype="pmpid" link="fulltext">8451181</pubid>
                  <pubid idtype="doi">10.1093/nar/21.4.787</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Evolution of DNA polymerase families: evidences for multiple gene exchange between cellular and viral proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Fil&#233;e</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Forterre</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Sen-Lin</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Laurent</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2002</pubdate>
            <volume>54</volume>
            <fpage>763</fpage>
            <lpage>773</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00239-001-0078-x</pubid>
                  <pubid idtype="pmpid" link="fulltext">12029358</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Common origin of four diverse families of large eukaryotic DNA viruses.</p>
            </title>
            <aug>
               <au>
                  <snm>Iyer</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>J Virol</source>
            <pubdate>2001</pubdate>
            <volume>75</volume>
            <fpage>11720</fpage>
            <lpage>11734</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">114758</pubid>
                  <pubid idtype="pmpid" link="fulltext">11689653</pubid>
                  <pubid idtype="doi">10.1128/JVI.75.23.11720-11734.2001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Report from the 36th and the 37th meetings of the Executive Committee of the International Committee on Taxonomy of Viruses.</p>
            </title>
            <aug>
               <au>
                  <snm>Mayo</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Haenni</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>Arch Virol</source>
            <pubdate>2006</pubdate>
            <volume>151</volume>
            <fpage>1031</fpage>
            <lpage>1037</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00705-006-0728-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">16514500</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Genetic diversity in marine algal virus communities as revealed by sequence analysis of DNA polymerase genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Suttle</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Short</snm>
                  <fnm>SM</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>1996</pubdate>
            <volume>62</volume>
            <fpage>2869</fpage>
            <lpage>2874</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">168073</pubid>
                  <pubid idtype="pmpid" link="fulltext">8702280</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Use of microarrays to assess viral diversity: from genotype to phenotype.</p>
            </title>
            <aug>
               <au>
                  <snm>Allen</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Martinez-Martinez</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schroeder</snm>
                  <fnm>DC</fnm>
               </au>
               <au>
                  <snm>Somerfield</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Wilson</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Environ Microbiol</source>
            <pubdate>2007</pubdate>
            <volume>9</volume>
            <fpage>971</fpage>
            <lpage>982</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1462-2920.2006.01219.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">17359269</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>A hypothesis for DNA viruses as the origin of eukaryotic replication proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Villarreal</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>DeFilippis</snm>
                  <fnm>VR</fnm>
               </au>
            </aug>
            <source>J Virol</source>
            <pubdate>2000</pubdate>
            <volume>74</volume>
            <fpage>7079</fpage>
            <lpage>7084</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">112226</pubid>
                  <pubid idtype="pmpid" link="fulltext">10888648</pubid>
                  <pubid idtype="doi">10.1128/JVI.74.15.7079-7084.2000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Metagenomic analysis of coastal RNA virus communities.</p>
            </title>
            <aug>
               <au>
                  <snm>Culley</snm>
                  <fnm>AI</fnm>
               </au>
               <au>
                  <snm>Lang</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Suttle</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2006</pubdate>
            <volume>312</volume>
            <fpage>1795</fpage>
            <lpage>1798</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1127404</pubid>
                  <pubid idtype="pmpid" link="fulltext">16794078</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood.</p>
            </title>
            <aug>
               <au>
                  <snm>Adachi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Computer Science Monographs</source>
            <publisher>Tokyo: Institute of Statistical Mathematics</publisher>
            <pubdate>1996</pubdate>
            <volume>28</volume>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Maximum likelihood inference of protein phylogeny and the origin of chloroplasts.</p>
            </title>
            <aug>
               <au>
                  <snm>Kishino</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Miyata</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1990</pubdate>
            <volume>31</volume>
            <fpage>151</fpage>
            <lpage>160</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/BF02109483</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Very fast algorithms for evaluating the stability of ML and Bayesian phylogenetic trees from sequence data.</p>
            </title>
            <aug>
               <au>
                  <snm>Waddell</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Kishino</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ota</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Inform</source>
            <pubdate>2002</pubdate>
            <volume>13</volume>
            <fpage>82</fpage>
            <lpage>92</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14571377</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Comparison of the genome sequences of non-pathogenic and pathogenic African swine fever virus isolates.</p>
            </title>
            <aug>
               <au>
                  <snm>Chapman</snm>
                  <fnm>DAG</fnm>
               </au>
               <au>
                  <snm>Tcherepanov</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Upton</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dixon</snm>
                  <fnm>LK</fnm>
               </au>
            </aug>
            <source>J Gen Virol</source>
            <pubdate>2008</pubdate>
            <volume>89</volume>
            <fpage>397</fpage>
            <lpage>408</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1099/vir.0.83343-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">18198370</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Life-cycle and genome of OtV5, a large DNA virus of the pelagic marine unicellular green alga <it>Ostreococcus tauri</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Derelle</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ferraz</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Escande</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Eycheni&#233;</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Cooke</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Piganeau</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Desdevises</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Bellec</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Moreau</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Grimsley</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>PLoS ONE</source>
            <pubdate>2008</pubdate>
            <volume>3</volume>
            <fpage>e2250</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2386258</pubid>
                  <pubid idtype="pmpid" link="fulltext">18509524</pubid>
                  <pubid idtype="doi">10.1371/journal.pone.0002250</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Global phage diversity.</p>
            </title>
            <aug>
               <au>
                  <snm>Rohwer</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2003</pubdate>
            <volume>113</volume>
            <fpage>141</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(03)00276-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">12705861</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>A diverse superfamily of enzymes with ATP-dependent carboxylate-amine/thiol ligase activity.</p>
            </title>
            <aug>
               <au>
                  <snm>Galperin</snm>
                  <fnm>MY</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1997</pubdate>
            <volume>6</volume>
            <fpage>2639</fpage>
            <lpage>2643</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2143612</pubid>
                  <pubid idtype="pmpid" link="fulltext">9416615</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Phylogenetic analysis of members of the Phycodnaviridae virus family, using amplified fragments of the major capsid protein gene.</p>
            </title>
            <aug>
               <au>
                  <snm>Larsen</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Larsen</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bratbak</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sandaa</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2008</pubdate>
            <volume>74</volume>
            <fpage>3048</fpage>
            <lpage>3057</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/AEM.02548-07</pubid>
                  <pubid idtype="pmpid" link="fulltext">18359826</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Isolation and characterization of two viruses with large genome size infecting <it>Chrysochromulina ericina </it>(Prymnesiophyceae) and <it>Pyramimonas orientalis </it>(Prasinophyceae).</p>
            </title>
            <aug>
               <au>
                  <snm>Sandaa</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Heldal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Castberg</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Thyrhaug</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bratbak</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Virology</source>
            <pubdate>2001</pubdate>
            <volume>290</volume>
            <fpage>272</fpage>
            <lpage>280</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/viro.2001.1161</pubid>
                  <pubid idtype="pmpid" link="fulltext">11883191</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>The marine algal virus PpV01 has an icosahedral capsid with T = 219 quasisymmetry.</p>
            </title>
            <aug>
               <au>
                  <snm>Yan</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Chipman</snm>
                  <fnm>PR</fnm>
               </au>
               <au>
                  <snm>Castberg</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bratbak</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>TS</fnm>
               </au>
            </aug>
            <source>J Virol</source>
            <pubdate>2005</pubdate>
            <volume>79</volume>
            <fpage>9236</fpage>
            <lpage>9243</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1168743</pubid>
                  <pubid idtype="pmpid" link="fulltext">15994818</pubid>
                  <pubid idtype="doi">10.1128/JVI.79.14.9236-9243.2005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p><it>Chlorella</it> viruses.</p>
            </title>
            <aug>
               <au>
                  <snm>Yamada</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Onimatsu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Van Etten</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Adv Virus Res</source>
            <pubdate>2006</pubdate>
            <volume>66</volume>
            <fpage>293</fpage>
            <lpage>336</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">16877063</pubid>
                  <pubid idtype="pmcid">1955756</pubid>
                  <pubid idtype="doi">10.1016/S0065-3527(06)66006-5</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Complete genome sequence and lytic phase transcription profile of a Coccolithovirus.</p>
            </title>
            <aug>
               <au>
                  <snm>Wilson</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Schroeder</snm>
                  <fnm>DC</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Holden</snm>
                  <fnm>MTG</fnm>
               </au>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Churcher</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hamlin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Norbertczak</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Quail</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Price</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rabbinowitsch</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Craigon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Roy</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ghazal</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>309</volume>
            <fpage>1090</fpage>
            <lpage>1092</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">16099989</pubid>
                  <pubid idtype="doi">10.1126/science.1113109</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>The tiny eukaryote <it>Ostreococcus</it> provides genomic insights into the paradox of plankton speciation.</p>
            </title>
            <aug>
               <au>
                  <snm>Palenik</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Grimwood</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Aerts</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rouz&#233;</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Salamov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Putnam</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Dupont</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jorgensen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Derelle</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Rombauts</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Otillar</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Merchant</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Podell</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gaasterland</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Napoli</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gendler</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Manuell</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tai</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Vallon</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Piganeau</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Jancek</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Heijde</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jabbari</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Bowler</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lohr</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Robbens</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Werner</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Pazour</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Ren</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Delwiche</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Schmutz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rokhsar</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Van de Peer</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Moreau</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Grigoriev</snm>
                  <fnm>IV</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2007</pubdate>
            <volume>104</volume>
            <fpage>7705</fpage>
            <lpage>7710</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1863510</pubid>
                  <pubid idtype="pmpid" link="fulltext">17460045</pubid>
                  <pubid idtype="doi">10.1073/pnas.0611046104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Viral communities associated with healthy and bleaching corals.</p>
            </title>
            <aug>
               <au>
                  <snm>Marhaver</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Rohwer</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Environ Microbiol</source>
            <inpress/>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18479440</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Surface phytoplankton pigment distributions in the Atlantic Ocean: an assessment of basin scale variability between 50&#176;N and 50&#176;S.</p>
            </title>
            <aug>
               <au>
                  <snm>Gibb</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Barlow</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Cummings</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Rees</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Trees</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Holligan</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Suggett</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Prog Oceanogr</source>
            <pubdate>2000</pubdate>
            <volume>45</volume>
            <fpage>368</fpage>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Picoeukaryotic diversity in an oligotrophic coastal site studied by molecular and culturing approaches.</p>
            </title>
            <aug>
               <au>
                  <snm>Massana</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Balagu&#233;</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Guillou</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Pedr&#243;s-Ali&#243;</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Ecol</source>
            <pubdate>2004</pubdate>
            <volume>50</volume>
            <fpage>231</fpage>
            <lpage>243</lpage>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Blooms of <it>Emiliania huxleyi </it>(Prymnesiophyceae) in surface waters of the Nova Scotian Shelf and the Grand Bank.</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Yoder</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Plankton Res</source>
            <volume>15</volume>
            <fpage>1438</fpage>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Coccolithophore dynamics off Bermuda (N. Atlantic).</p>
            </title>
            <aug>
               <au>
                  <snm>Haidar</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Thierstein</snm>
                  <fnm>HR</fnm>
               </au>
            </aug>
            <source>Deep Sea Res II</source>
            <pubdate>2001</pubdate>
            <volume>48</volume>
            <fpage>1925</fpage>
            <lpage>1956</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0967-0645(00)00169-7</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p><it>Acanthamoeba</it>: biology and increasing importance in human health.</p>
            </title>
            <aug>
               <au>
                  <snm>Khan</snm>
                  <fnm>NA</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Rev</source>
            <pubdate>2006</pubdate>
            <volume>30</volume>
            <fpage>564</fpage>
            <lpage>595</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1574-6976.2006.00023.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">16774587</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Characterization of the gene rimK responsible for the addition of glutamic acid residues to the C-terminus of ribosomal protein S6 in <it>Escherichia coli </it>K12.</p>
            </title>
            <aug>
               <au>
                  <snm>Kang</snm>
                  <fnm>WK</fnm>
               </au>
               <au>
                  <snm>Icho</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Isono</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kitakawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Isono</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Mol Gen Genet</source>
            <pubdate>1989</pubdate>
            <volume>217</volume>
            <fpage>281</fpage>
            <lpage>288</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF02464894</pubid>
                  <pubid idtype="pmpid">2570347</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Protein-chemical studies on <it>Escherichia coli </it>mutants with altered ribosomal proteins S6 and S7.</p>
            </title>
            <aug>
               <au>
                  <snm>Kade</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Dabbs</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Wittmann-Liebold</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>1980</pubdate>
            <volume>121</volume>
            <fpage>313</fpage>
            <lpage>316</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0014-5793(80)80371-1</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Post-translational modification of <it>Escherichia coli </it>ribosomal protein S6.</p>
            </title>
            <aug>
               <au>
                  <snm>Reeh</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Mol Gen Genet</source>
            <pubdate>1979</pubdate>
            <volume>173</volume>
            <fpage>183</fpage>
            <lpage>187</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00330309</pubid>
                  <pubid idtype="pmpid">386035</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Phosphorylation of elongation factor G and ribosomal protein S6 in bacteriophage T7-infected <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Robertson</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Aggison</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Nicholson</snm>
                  <fnm>AW</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>1994</pubdate>
            <volume>11</volume>
            <fpage>1045</fpage>
            <lpage>1057</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2958.1994.tb00382.x</pubid>
                  <pubid idtype="pmpid">8022276</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Ma-LMM01 infecting toxic <it>Microcystis aeruginosa </it>illuminates diverse cyanophage genome strategies.</p>
            </title>
            <aug>
               <au>
                  <snm>Yoshida</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Nagasaki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Takashima</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Shirai</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Tomaru</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Takao</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Sakamoto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hiroishi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ogata</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2008</pubdate>
            <volume>190</volume>
            <fpage>1762</fpage>
            <lpage>1772</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2258655</pubid>
                  <pubid idtype="pmpid" link="fulltext">18065537</pubid>
                  <pubid idtype="doi">10.1128/JB.01534-07</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Photosynthesis genes in marine viruses yield proteins during host infection.</p>
            </title>
            <aug>
               <au>
                  <snm>Lindell</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Jaffe</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>ZI</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Chisholm</snm>
                  <fnm>SW</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>438</volume>
            <fpage>86</fpage>
            <lpage>89</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04111</pubid>
                  <pubid idtype="pmpid" link="fulltext">16222247</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>CAMERA: a community resource for metagenomics.</p>
            </title>
            <aug>
               <au>
                  <snm>Seshadri</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kravitz</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Smarr</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Gilna</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Frazier</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <fpage>e75</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1821059</pubid>
                  <pubid idtype="pmpid" link="fulltext">17355175</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0050075</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>EMBOSS: the European Molecular Biology Open Software Suite.</p>
            </title>
            <aug>
               <au>
                  <snm>Rice</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Longden</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Bleasby</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>276</fpage>
            <lpage>277</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)02024-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">10827456</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>Pfam: clans, web tools and services.</p>
            </title>
            <aug>
               <au>
                  <snm>Finn</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Mistry</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schuster-B&#246;ckler</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hollich</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lassmann</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Moxon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Khanna</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>ELL</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D247</fpage>
            <lpage>D251</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347511</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381856</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj149</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>Profile hidden Markov models.</p>
            </title>
            <aug>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>755</fpage>
            <lpage>763</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/14.9.755</pubid>
                  <pubid idtype="pmpid" link="fulltext">9918945</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Sch&#228;ffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>The TIGRFAMs database of protein families.</p>
            </title>
            <aug>
               <au>
                  <snm>Haft</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Selengut</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>371</fpage>
            <lpage>373</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165575</pubid>
                  <pubid idtype="pmpid" link="fulltext">12520025</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg128</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>D61</fpage>
            <lpage>D65</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1716718</pubid>
                  <pubid idtype="pmpid" link="fulltext">17130148</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl842</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>From genomics to chemical genomics: new developments in KEGG.</p>
            </title>
            <aug>
               <au>
                  <snm>Kanehisa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Goto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hattori</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Aoki-Kinoshita</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Itoh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kawashima</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Katayama</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Araki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hirakawa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D354</fpage>
            <lpage>D357</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347464</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381885</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>A new example of viral intein in Mimivirus.</p>
            </title>
            <aug>
               <au>
                  <snm>Ogata</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Raoult</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Claverie</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Virol J</source>
            <pubdate>2005</pubdate>
            <volume>2</volume>
            <fpage>8</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">549080</pubid>
                  <pubid idtype="pmpid" link="fulltext">15707490</pubid>
                  <pubid idtype="doi">10.1186/1743-422X-2-8</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B69">
            <title>
               <p>Algal viruses with distinct intraspecies host specificities include identical intein elements.</p>
            </title>
            <aug>
               <au>
                  <snm>Nagasaki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shirai</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Tomaru</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Nishida</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Pietrokovski</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>71</volume>
            <fpage>3599</fpage>
            <lpage>3607</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1169056</pubid>
                  <pubid idtype="pmpid" link="fulltext">16000767</pubid>
                  <pubid idtype="doi">10.1128/AEM.71.7.3599-3607.2005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B70">
            <title>
               <p>T-Coffee: A novel method for fast and accurate multiple sequence alignment.</p>
            </title>
            <aug>
               <au>
                  <snm>Notredame</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Heringa</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>302</volume>
            <fpage>205</fpage>
            <lpage>217</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4042</pubid>
                  <pubid idtype="pmpid" link="fulltext">10964570</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B71">
            <title>
               <p>The M-Coffee web server: a meta-method for computing multiple sequence alignments by combining alternative alignment methods.</p>
            </title>
            <aug>
               <au>
                  <snm>Moretti</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Armougom</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Wallace</snm>
                  <fnm>IM</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Jongeneel</snm>
                  <fnm>CV</fnm>
               </au>
               <au>
                  <snm>Notredame</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>W645</fpage>
            <lpage>W648</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1933118</pubid>
                  <pubid idtype="pmpid" link="fulltext">17526519</pubid>
                  <pubid idtype="doi">10.1093/nar/gkm333</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B72">
            <title>
               <p>M-Coffee: combining multiple sequence alignment methods with T-Coffee.</p>
            </title>
            <aug>
               <au>
                  <snm>Wallace</snm>
                  <fnm>IM</fnm>
               </au>
               <au>
                  <snm>O'Sullivan</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Notredame</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>1692</fpage>
            <lpage>1699</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1410914</pubid>
                  <pubid idtype="pmpid" link="fulltext">16556910</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl091</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B73">
            <title>
               <p>A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.</p>
            </title>
            <aug>
               <au>
                  <snm>Guindon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gascuel</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2003</pubdate>
            <volume>52</volume>
            <fpage>696</fpage>
            <lpage>704</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/10635150390235520</pubid>
                  <pubid idtype="pmpid">14530136</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B74">
            <title>
               <p>Phylogeny.fr: robust phylogenetic analysis for the non-specialist.</p>
            </title>
            <aug>
               <au>
                  <snm>Dereeper</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Guignon</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Blanc</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Audic</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Buffet</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chevenet</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Dufayard</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Guindon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lefort</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lescot</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Claverie</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Gascuel</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2008</pubdate>
            <volume>36</volume>
            <issue>Web Server</issue>
            <fpage>W465</fpage>
            <lpage>W469</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2447785</pubid>
                  <pubid idtype="pmpid" link="fulltext">18424797</pubid>
                  <pubid idtype="doi">10.1093/nar/gkn180</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B75">
            <title>
               <p>BAOBAB: a Java editor for large phylogenetic trees.</p>
            </title>
            <aug>
               <au>
                  <snm>Dutheil</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Galtier</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>892</fpage>
            <lpage>893</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/18.6.892</pubid>
                  <pubid idtype="pmpid" link="fulltext">12075029</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B76">
            <title>
               <p>A mutation data matrix for transmembrane proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Jones</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>1994</pubdate>
            <volume>339</volume>
            <fpage>269</fpage>
            <lpage>275</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0014-5793(94)80429-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">8112466</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B77">
            <title>
               <p>MUSCLE: a multiple sequence alignment method with reduced time and space complexity.</p>
            </title>
            <aug>
               <au>
                  <snm>Edgar</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>113</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">517706</pubid>
                  <pubid idtype="pmpid" link="fulltext">15318951</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-113</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B78">
            <title>
               <p>The Universal Protein Resource (UniProt): an expanding universe of protein information.</p>
            </title>
            <aug>
               <au>
                  <snm>Wu</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Natale</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Boeckmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ferro</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gasteiger</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Magrane</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Mazumder</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>O'Donovan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Redaschi</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Suzek</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D187</fpage>
            <lpage>D191</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347523</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381842</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj161</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
