<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-34</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>A statistical toolbox for metagenomics: assessing functional diversity in microbial communities</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Schloss</snm>
               <mi>D</mi>
               <fnm>Patrick</fnm>
               <insr iid="I1"/>
               <email>pschloss@microbio.umass.edu</email>
            </au>
            <au id="A2">
               <snm>Handelsman</snm>
               <fnm>Jo</fnm>
               <insr iid="I2"/>
               <email>joh@bact.wisc.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Microbiology, University of Massachusetts &#8211; Amherst, Amherst, MA 01003, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Bacteriology, University of Wisconsin &#8211; Madison, Madison, WI 53706, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>34</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/34</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18215273</pubid>
               <pubid idtype="doi">10.1186/1471-2105-9-34</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>08</day>
               <month>5</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>23</day>
               <month>1</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>23</day>
               <month>1</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Schloss and Handelsman; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The 99% of bacteria in the environment that are recalcitrant to culturing have spurred the development of metagenomics, a culture-independent approach to sample and characterize microbial genomes. Massive datasets of metagenomic sequences have been accumulated, but analysis of these sequences has focused primarily on the descriptive comparison of the relative abundance of proteins that belong to specific functional categories. More robust statistical methods are needed to make inferences from metagenomic data. In this study, we developed and applied a suite of tools to describe and compare the richness, membership, and structure of microbial communities using peptide fragment sequences extracted from metagenomic sequence data.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Application of these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with a relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities, these tools enabled us to demonstrate that although there was no overlap in the types of 16S rRNA gene sequences observed, there was a core collection of operational protein families that was shared among the three environments.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The results of comparisons between the three habitats were surprising considering the relatively low overlap of membership and the distinctively different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Metagenomics, the culture-independent isolation and characterization of DNA from uncultured microorganisms <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, has facilitated the analysis of the functional biodiversity harbored in the large reservoir of uncultured bacteria and archaea <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Although early metagenomic studies identified individual genes or activities of interest, recent advances in genome sequencing technologies have made obtaining a complete metagenomic sequence more tractable. Sequence-based approaches combined with functional expression approaches have the potential to identify novel genes important for industrial and ecological applications. Sequence-based approaches have recently been applied to DNA obtained from viruses <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>, seawater <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, wastewater <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>, sediment <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, sponges <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, acid mine drainage <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, marine worms <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, human gut <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, soil <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, and decomposing whale carcasses <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. The analysis used to describe these communities has primarily focused on the descriptive characterization and comparison of the relative abundance of proteins that belong to specific functional categories.</p>
         <p>Attempts to analyze metagenomic sequences have proven that a metagenomic sequence is more than just a large genome sequencing project. First, the goal of most genome sequence projects is a closed genome sequence where every nucleotide is represented by a desired number of independent sequence reads. In metagenomics, the probability of finding overlapping sequence reads is low in most environments <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. The probability that overlapping sequence reads are from the same population of bacteria or archaea is even lower, so that contigs that are formed are out of necessity chimeras of different genomes that may not even be from the same phylum <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Second, a closed genome represents a statistical population of the genes harbored by that organism; therefore, comparing genome sequences for the presence or absence of genes is straightforward. Since it is not possible to close a metagenome, every metagenomic sequence collection represents a statistical sample of the genomes in an environment. Therefore, it is necessary to treat the comparison of communities as a statistical problem. Third, although lab-based cultures that are sequenced do evolve, the differences between lab stocks are minimal compared to the changes faced by natural communities over short periods of time. This makes it difficult to reanalyze a community once a genome sequence has been obtained to improve annotations and understand gene expression.</p>
         <p>Five general approaches have been taken to bring statistical rigor to the analysis of metagenomic sequences. The first adapts genomics-based approaches to metagenomics by constructing and curating databases to aid in the annotation and analysis of genes and the contigs they reside on <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. Unfortunately, although such databases provide a critical infrastructure, given the large number of ORFs that have no known function (e.g. 69% in the Sargasso Sea <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>) and the paucity of contigs formed from many sequencing projects (e.g. &lt;1% in the soil <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>), such database searches will be of limited value for comparative metagenomics. The second approach to analyzing metagenomic sequences has been based on the comparison of the relative abundance of annotation categories within the different sequence collections and within databases of assembled genomes <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B23">23</abbr></abbrgrp>; these methods implicitly assume that the metagenomic sequences represent a statistical population and/or that the reference databases represent the normal distribution of genes in communities. A third set of approaches attempts to assign a phylogenetic origin for a sequence fragment in the absence of a phylogenetic anchor (e.g. 16S rRNA gene) using nucleotide frequency analysis or sequence signatures <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. Such methods are limited for use with most environments because of the difficulty in forming contigs that are long enough to carry out a robust analysis, and they assume that the contigs that form are not chimeric. A fourth set of approaches has attempted to compare communities without annotation. These have attempted to quantify the species richness of communities based on the distribution of sequence read depth among contigs <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> and to compare the diversity of communities based on the relative frequency of different length oligonucleotides in the DNA sequence pool <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Finally, there have been attempts to use traditional population biology by analyzing the diversity of specific families of genes found in metagenomic collections <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
         <p>Based on previous metagenomic sequencing efforts, we were interested in developing statistical tools to compare the richness, membership, and structure of the complement of ORFs from multiple communities in which assembly of the entire genomes is not possible. To address this problem, we adapted a set of statistical tools designed to analyze collections of 16S rRNA gene sequences to the analysis of protein coding genes <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>. Our goal was to provide additional tools to make statistical and ecological inferences using metagenomic sequence data. Instead of using a traditional pairwise DNA distance matrix obtained from a sequence alignment of homologous genes as is done with 16S rRNA genes, we used BLAST score ratios (BSRs) to develop a distance matrix that represents the similarity of ORFs across homologous groups <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. To make comparisons among communities, we propose grouping ORFs into operational protein families (OPFs) which are analogous to operational taxonomic units (OTUs) derived from 16S rRNA gene sequences.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>A new distance matrix</p>
            </st>
            <p>The goal of this aspect of the work was to develop a method to compare sequence alignments that circumvented the considerable computational effort required to obtain every possible global sequence alignment and pairwise distance. We used local alignments provided in BLAST and the resulting pairwise BLAST scores to generate BSRs. The BSRs approximate the fraction of identical amino acids between two peptide fragments so that a BSR value of 0.30 between two fragments means that they are approximately 30% identical over their full length. By analogy to the analysis of 16S rRNA gene sequences of uncultured bacteria where OTUs are developed based on a distance matrix, we propose using BSR values to define OPFs. Depending on the goals of the analysis an OPF can be defined as necessary. For illustrative purposes and based on previous implementations of BSRs for comparative genomics applications <abbrgrp><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>, unless otherwise indicated we will operationally define an OPF as a collection of fragments that have a BSR greater than 0.30.</p>
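            <p>As a minimal sketch of the quantity described above, the following Python fragment computes a BSR and applies the 0.30 cutoff. It assumes, following the comparative genomics usage cited in the text, that the BSR is the pairwise BLAST score divided by the score of the query aligned to itself; the function names and the clamping to the range 0 to 1 are illustrative choices and are not taken from the authors' software.</p>

```python
def blast_score_ratio(pair_score, self_score):
    """Return the BLAST score ratio (BSR) for one query/subject pair.

    Assumption: BSR = score(query vs. subject) / score(query vs. itself).
    """
    if not self_score > 0:
        raise ValueError("self_score must be positive")
    ratio = pair_score / self_score
    # Clamp to [0, 1] so the value can be read as an approximate identity.
    return min(max(ratio, 0.0), 1.0)


def same_opf(bsr, cutoff=0.30):
    """Return True when two fragments are similar enough to share an OPF.

    The text defines an OPF by a BSR greater than 0.30, so the
    comparison is strict.
    """
    return bsr > cutoff


if __name__ == "__main__":
    bsr = blast_score_ratio(pair_score=90.0, self_score=200.0)
    print(bsr, same_opf(bsr))
```

            <p>Under this reading, a pair scoring 90 against a self-score of 200 has a BSR of 0.45 and would fall within the same OPF at the 0.30 cutoff.</p>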
            <p>To assess the feasibility of using peptide fragments from individual sequence reads, we identified peptide fragments from the individual sequence reads used to assemble the <it>Bacillus anthracis</it> str. Ames genome (GenBank Accession <ext-link ext-link-type="gen" ext-link-id="NC_003997">NC_003997</ext-link>, <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>), which contains 4,514 ORFs that were longer than 100 aa. From the individual sequence reads, we identified 92,220 peptide fragments longer than 100 aa. The computational effort required for the pairwise alignment and distance calculation among 92,220 ORFs was prohibitive. Because we expected that a majority of the peptide pairs would not have significant similarity, we used BLAST to identify those comparisons that had significant similarity and to calculate BSRs as a surrogate for similarity or distance values (distance = 1-BSR). Instead of generating a 92,220 &#215; 92,220 matrix with 8.5 &#215; 10<sup>9</sup> values, we took advantage of the sparseness of the matrix to simplify the calculations and constructed a set of three linked lists that held, respectively, the row, column, and BSR values of the full BSR matrix. Since the BSR for a peptide fragment compared to itself is 1.0 and the BSR for a non-significant comparison is 0.0, the corresponding entries in the linked lists could be removed. Once this was completed, there were 2.1 &#215; 10<sup>6</sup> values, which represented a significant reduction in the memory required to store the data.</p>
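            <p>The storage scheme described above can be sketched as three parallel sequences of row, column, and BSR values in which self-comparisons and non-significant comparisons are simply omitted. The sketch below uses plain Python lists in place of linked lists and assumes that the BSR matrix is treated as symmetric; both are simplifications for illustration, and the function names are hypothetical.</p>

```python
def build_sparse_bsr(hits):
    """Store only informative BSR entries as three parallel lists.

    `hits` is an iterable of (row, column, bsr) tuples. Self-comparisons
    (BSR of 1.0) and non-significant comparisons (BSR of 0.0) are implied
    by their absence and are therefore dropped.
    """
    rows, cols, vals = [], [], []
    for i, j, bsr in hits:
        if i == j or bsr == 0.0:
            continue
        rows.append(i)
        cols.append(j)
        vals.append(bsr)
    return rows, cols, vals


def lookup_bsr(sparse, i, j):
    """Recover a BSR from the sparse form (linear scan, for clarity only)."""
    if i == j:
        return 1.0
    rows, cols, vals = sparse
    for r, c, v in zip(rows, cols, vals):
        # Assumption: the matrix is symmetric, so (i, j) matches (j, i).
        if (r, c) == (i, j) or (r, c) == (j, i):
            return v
    return 0.0


if __name__ == "__main__":
    hits = [(0, 0, 1.0), (0, 1, 0.42), (1, 2, 0.0), (2, 3, 0.88)]
    sparse = build_sparse_bsr(hits)
    print(len(sparse[2]), "stored values")
    print(lookup_bsr(sparse, 1, 0), lookup_bsr(sparse, 1, 2))
    # Size of the dense alternative for 92,220 fragments.
    print(92220 ** 2)
```

            <p>For 92,220 fragments the dense matrix would hold 8,504,528,400 entries, which is the 8.5 &#215; 10<sup>9</sup> figure quoted above.</p>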
         </sec>
         <sec>
            <st>
               <p>MG-DOTUR</p>
            </st>
            <p>To assign peptide fragments to OPFs we rewrote the computer code for DOTUR to be compatible with sparse BSR matrices. DOTUR is used to assign collections of 16S rRNA gene sequences to OTUs and to use the resulting frequency distribution of sequences among OTUs to estimate richness and diversity (Table <tblr tid="T1">1</tblr>). By analogy, MG-DOTUR assigns peptide fragments to OPFs and estimates the richness and diversity of OPFs for any desired OPF definition. Two classes of methods are available to estimate richness based on frequency distributions. The first uses parametric distributions such as the lognormal distribution to predict the number of unseen groups in a community <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Although it is often assumed that microbial communities follow a lognormal distribution, there are no published examples in the microbial ecology literature for which the observed data support such an assumption. This is primarily due to the difficulty in obtaining a sufficient number of observations to implement these methods. An alternative approach uses non-parametric estimators that do not assume an underlying frequency distribution and are relatively easy to compute. These estimators are implemented in DOTUR and MG-DOTUR.</p>
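            <p>To illustrate why non-parametric estimators are easy to compute, the sketch below implements the Chao1 estimator from a list of group sizes (the number of fragments in each OPF, or sequences in each OTU). It uses the classic form, observed richness plus n1 squared over twice n2, where n1 and n2 are the numbers of groups seen once and twice, and falls back to the bias-corrected form when no group is seen exactly twice. Which variant DOTUR and MG-DOTUR apply is not stated here, so this choice is an assumption.</p>

```python
from collections import Counter


def chao1(group_sizes):
    """Estimate total richness from the sizes of the observed groups.

    `group_sizes` holds one positive integer per observed group.
    n1 and n2 are the numbers of groups observed exactly once and twice.
    """
    s_obs = len(group_sizes)
    freq = Counter(group_sizes)
    n1, n2 = freq[1], freq[2]
    if n2 > 0:
        # Classic Chao1: S_obs + n1^2 / (2 * n2).
        return s_obs + n1 * n1 / (2.0 * n2)
    # Bias-corrected form with n2 = 0: S_obs + n1 * (n1 - 1) / 2.
    return s_obs + n1 * (n1 - 1) / 2.0


if __name__ == "__main__":
    # Eight observed groups: four singletons, two doubletons, two larger.
    print(chao1([1, 1, 1, 1, 2, 2, 5, 9]))
```

            <p>With four singletons and two doubletons among eight observed groups, the estimate is 8 + 16/4 = 12 groups.</p>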
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Tools used to describe and compare microbial communities.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Tool</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Application</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Input</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Reference</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>DOTUR/MG-DOTUR</p>
                     </c>
                     <c ca="left">
                        <p>Assigns sequences to OTUs based on genetic distance between sequences and constructs rarefaction curves and collector's curves for richness and diversity estimators</p>
                     </c>
                     <c ca="center">
                        <p>Distance Matrix or BLAST Table</p>
                     </c>
                     <c ca="center">
                        <p>[30]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>SONS</p>
                     </c>
                     <c ca="left">
                        <p>Generates collector's curves for estimates of the fraction and richness of OTUs shared between communities</p>
                     </c>
                     <c ca="center">
                        <p>OTU Designation</p>
                     </c>
                     <c ca="center">
                        <p>[56]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>&#8747;-LIBSHUFF/MG-LIBSHUFF</p>
                     </c>
                     <c ca="left">
                        <p>Tests whether the structures of two communities are the same, different, or subsets of one another using the Cramer-von Mises statistic</p>
                     </c>
                     <c ca="center">
                        <p>Distance Matrix or BLAST Table</p>
                     </c>
                     <c ca="center">
                        <p>[31, 32]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AMOVA/MG-AMOVA</p>
                     </c>
                     <c ca="left">
                        <p>Determines whether two or more communities differ significantly in genetic diversity using an analysis of variance-type formulation</p>
                     </c>
                     <c ca="center">
                        <p>Distance Matrix or BLAST Table</p>
                     </c>
                     <c ca="center">
                        <p>[33, 47, 48]</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>Based on the observed frequency distribution of peptide fragments in each OPF<sub>0.30</sub>, we applied the Chao1, ACE, and interpolated Jackknife richness estimators to predict the OPF<sub>0.30</sub> richness. The predicted OPF richness was approximately three times greater than the OPF richness that was observed in the assembled <it>B. anthracis </it>genome (Table <tblr tid="T2">2</tblr>). When we mapped each OPF from the closed genome to the OPFs from the individual sequence reads we found that each OPF from the closed genome was linked to an average of 3.08 (s.d. = 2.75) OPFs from the sequence reads. Further inspection showed that the multiple OPFs from the sequence reads corresponded to different regions of long ORFs from the closed genome sequence. Similar results have been observed when attempting to estimate the number of expressed genes using expressed sequence tags <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
            <p>To overcome this problem, we developed a method of merging OPFs from the sequence reads to obtain a more meaningful OPF distribution. For two OPFs to merge, we required that the carboxyl-terminus of at least one sequence in the first OPF overlap with the amino-terminus of at least one sequence in the second OPF by at least 5 amino acids. Furthermore, we incorporated a BSR penalty so that for two OPFs to merge the overlapping region had to have a BSR that exceeded the BSR currently being used to form clusters by the penalty. We used penalties of 0.00, 0.05, 0.10, 0.15, and 0.20 (Table <tblr tid="T2">2</tblr>). We then applied this merging scheme to the OPFs from the sequence reads and calculated two types of error <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Type I errors corresponded to the fraction of OPFs from the closed genome that mapped to multiple OPFs from the sequence reads. Type II errors corresponded to the fraction of OPFs from the sequence reads that corresponded to different OPFs from the closed genome (Table <tblr tid="T2">2</tblr>). We found that as we increased the penalty, the Type I error increased and the Type II error decreased. Based on this analysis, we decided to implement a penalty of 0.15 because the Type I and Type II error rates were 7.1 and 7.4%, respectively. When the resulting frequency distribution was used to calculate collector's curves using the observed and predicted richness, the curves converged towards the true OPF richness (Fig. <figr fid="F1">1A</figr>). This was used to further validate the choice of penalty. A limitation of this approach is that the resulting number of peptide fragments in a merged OPF is a product of the length of the complete ORF and the relative abundance of the ORF in the metagenome. Therefore, we will report OPF richness from merged analysis and annotations from both merged and non-merged analyses.</p>
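            <p>One way to picture the merging rule is as a union-find pass over candidate overlaps between OPFs. The sketch below assumes that the overlaps between the carboxyl-terminus of a fragment in one OPF and the amino-terminus of a fragment in another have already been detected and scored, and it reads the rule as requiring an overlap of at least 5 aa with a BSR greater than the clustering cutoff plus the penalty. The input layout and function name are hypothetical, and this is not the MG-DOTUR implementation.</p>

```python
def merge_opfs(opf_ids, overlaps, cutoff=0.30, penalty=0.15, min_overlap=5):
    """Merge OPFs linked by a sufficiently long, sufficiently similar overlap.

    `overlaps` holds (opf_a, opf_b, overlap_length_aa, overlap_bsr) tuples,
    one per precomputed terminus-to-terminus overlap (an assumed input).
    Returns the merged OPFs as sorted lists of the original OPF ids.
    """
    parent = {opf: opf for opf in opf_ids}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, overlap_len, overlap_bsr in overlaps:
        long_enough = overlap_len >= min_overlap
        similar_enough = overlap_bsr > cutoff + penalty
        if long_enough and similar_enough:
            parent[find(a)] = find(b)

    merged = {}
    for opf in opf_ids:
        merged.setdefault(find(opf), []).append(opf)
    return sorted(sorted(group) for group in merged.values())


if __name__ == "__main__":
    ids = ["A", "B", "C", "D"]
    candidate_overlaps = [
        ("A", "B", 12, 0.80),  # long and similar: merged
        ("B", "C", 3, 0.90),   # shorter than 5 aa: not merged
        ("C", "D", 20, 0.40),  # below 0.30 + 0.15: not merged
    ]
    print(merge_opfs(ids, candidate_overlaps))
    print(merge_opfs(ids, candidate_overlaps, penalty=0.0))
```

            <p>In this toy input a larger penalty leaves more OPFs unmerged, which is the direction in which observed richness changes with the penalty in Table 2.</p>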
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Analysis of the richness and community membership when peptide fragments identified in individual sequence reads were used to assemble the <it>Bacillus anthracis </it>str. Ames genome sequence.</p>
               </caption>
               <text>
                  <p>Analysis of the richness and community membership when peptide fragments identified in individual sequence reads were used to assemble the <it>Bacillus anthracis </it>str. Ames genome sequence. (A) The collector's curves for three non-parametric richness estimators and observed richness using individual sequence reads compared to the OPF richness of the assembled genome (horizontal black line). The solid lines represent the richness of non-merged OPFs and the dashed lines represent the richness of merged OPFs with a penalty of 0.15. (B) Collector's curves of parameters describing the similarity between two randomly selected subsets of peptide fragments.</p>
               </text>
               <graphic file="1471-2105-9-34-1"/>
            </fig>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Summary of errors and richness estimates when different criteria were used to merge OPFs. OPFs were merged when at least one peptide fragment in each OPF overlapped at least 5 aa and had a BSR value that was above the user specified level by the merge penalty. The type I error rate is the fraction of OPFs from the closed genome that correspond to multiple OPFs from the individual sequence reads. The type II error rate is the fraction of OPFs from the individual sequence reads that corresponded to more than one OPF from the closed genome sequence.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Merge Penalty</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Type I Error Rate</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Type II Error Rate</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Observed Richness</b>
                        </p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Richness Estimation (True Richness = 3,730)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Chao1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>ACE</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Jackknife</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Penalty = 0.00</p>
                     </c>
                     <c ca="center">
                        <p>0.063</p>
                     </c>
                     <c ca="center">
                        <p>0.129</p>
                     </c>
                     <c ca="center">
                        <p>2,927</p>
                     </c>
                     <c ca="center">
                        <p>3,038</p>
                     </c>
                     <c ca="center">
                        <p>2,976</p>
                     </c>
                     <c ca="center">
                        <p>3,137</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Penalty = 0.05</p>
                     </c>
                     <c ca="center">
                        <p>0.067</p>
                     </c>
                     <c ca="center">
                        <p>0.100</p>
                     </c>
                     <c ca="center">
                        <p>3,223</p>
                     </c>
                     <c ca="center">
                        <p>3,332</p>
                     </c>
                     <c ca="center">
                        <p>3,271</p>
                     </c>
                     <c ca="center">
                        <p>3,413</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Penalty = 0.10</p>
                     </c>
                     <c ca="center">
                        <p>0.071</p>
                     </c>
                     <c ca="center">
                        <p>0.074</p>
                     </c>
                     <c ca="center">
                        <p>3,462</p>
                     </c>
                     <c ca="center">
                        <p>3,574</p>
                     </c>
                     <c ca="center">
                        <p>3,510</p>
                     </c>
                     <c ca="center">
                        <p>3,653</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Penalty = 0.15</p>
                     </c>
                     <c ca="center">
                        <p>0.080</p>
                     </c>
                     <c ca="center">
                        <p>0.056</p>
                     </c>
                     <c ca="center">
                        <p>3,642</p>
                     </c>
                     <c ca="center">
                        <p>3,757</p>
                     </c>
                     <c ca="center">
                        <p>3,691</p>
                     </c>
                     <c ca="center">
                        <p>3,839</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Penalty = 0.20</p>
                     </c>
                     <c ca="center">
                        <p>0.091</p>
                     </c>
                     <c ca="center">
                        <p>0.046</p>
                     </c>
                     <c ca="center">
                        <p>3,810</p>
                     </c>
                     <c ca="center">
                        <p>3,925</p>
                     </c>
                     <c ca="center">
                        <p>3,858</p>
                     </c>
                     <c ca="center">
                        <p>4,004</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>No merge</p>
                     </c>
                     <c ca="center">
                        <p>0.719</p>
                     </c>
                     <c ca="center">
                        <p>0.003</p>
                     </c>
                     <c ca="center">
                        <p>11,538</p>
                     </c>
                     <c ca="center">
                        <p>11,668</p>
                     </c>
                     <c ca="center">
                        <p>11,616</p>
                     </c>
                     <c ca="center">
                        <p>11,968</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Comparing membership and structure using OPFs</p>
            </st>
            <p>Other tools have been developed to compare the membership (e.g. SONS) and structure (e.g. &#8747;-LIBSHUFF and AMOVA) of microbial communities using 16S rRNA gene sequences. Again, by analogy we were interested in using OPFs and BSRs to compare microbial communities using metagenomic sequences. SONS uses the output of DOTUR and MG-DOTUR to complete its analysis and required no further modification for use with metagenomic sequences. &#8747;-LIBSHUFF and AMOVA were modified to use the sparse matrix data representation used in MG-DOTUR. The resulting programs were designated MG-LIBSHUFF and MG-AMOVA. To test these programs, we randomly divided the 92,220 <it>B. anthracis </it>peptide fragments into two artificial communities that were each represented by 46,110 peptide fragments.</p>
            <p>We first applied SONS to these two communities to compare the membership and structure of the artificial communities using OPFs. We calculated the shared OPF richness using the Chao non-parametric estimator of shared richness and obtained a value of 3,561 OPFs. Although this estimate of shared richness is lower than the 95% confidence interval observed for the total collection of peptide fragments using the Chao1, ACE, or Jackknife estimators, the shared Chao estimator was still increasing with additional sampling (Fig. <figr fid="F1">1B</figr>). This indicates that, had sequencing continued, the estimate of shared richness would probably have overlapped with that interval eventually. The abundance-based Jaccard (J<sub>abund</sub>) estimate of similarity was 1.00, which predicted that all of the peptide fragments belonged to shared OPFs<sub>0.30</sub>. Yue and Clayton's measure of community overlap, <it>&#952;</it>, was 0.97, which indicated that the distribution of peptide fragments among OPFs was essentially the same in both artificial communities. These results indicate that SONS is amenable to analyzing OPFs to detect similarity between the memberships and structures of different communities.</p>
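            <p>As an illustration of the overlap measure discussed above, the following minimal sketch computes the plug-in form of Yue and Clayton's <it>&#952;</it> from observed relative abundances. The function name, the dictionary input format, and the toy counts are assumptions made for this example; it is not the SONS implementation, and any corrections SONS may apply are not reproduced here.</p>
            <p>
```python
def yue_clayton_theta(counts_a, counts_b):
    """Plug-in Yue and Clayton theta from two dicts mapping OPF label to fragment count."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each community needs at least one fragment")
    sum_ab = sum_aa = sum_bb = 0.0
    # Walk over every OPF seen in either community; a missing OPF counts as zero.
    for label in set(counts_a) | set(counts_b):
        p_a = counts_a.get(label, 0) / total_a  # relative abundance in community A
        p_b = counts_b.get(label, 0) / total_b  # relative abundance in community B
        sum_ab += p_a * p_b
        sum_aa += p_a * p_a
        sum_bb += p_b * p_b
    # theta is 1 when the abundance profiles are identical and 0 when no OPF is shared.
    return sum_ab / (sum_aa + sum_bb - sum_ab)


# Hypothetical toy communities (not data from this study).
community_a = {"opf1": 2, "opf2": 2}
community_b = {"opf1": 2, "opf2": 2}
print(yue_clayton_theta(community_a, community_b))
```
            </p>
            <p>Because the measure uses relative abundances, two communities with the same OPFs in the same proportions give a value of 1 regardless of how many fragments were sampled from each.</p>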
            <p>An alternative approach to comparing community structures is to perform hypothesis tests. AMOVA uses an analysis of variance (ANOVA)-type framework to test the hypothesis that the genetic diversity between two or more communities is not significantly different from the diversity within each community. We implemented this analysis in a program designated MG-AMOVA to perform a single-classification analysis. Our comparison of two randomly generated <it>B. anthracis </it>peptide fragment pseudo-communities revealed that the observed differences between the two pseudo-communities were not statistically significant (p > 0.05). Next we modified the program &#8747;-LIBSHUFF to create MG-LIBSHUFF to test the hypothesis that two communities have the same structure. As expected, the differences in structure between the two pseudo-communities were not statistically significant (p > 0.05). Each of these comparisons indicates that we can make statistical comparisons between the membership and structure of microbial communities using peptide fragments identified in single sequence reads from metagenomic data.</p>
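            <p>The ANOVA-type test described above can be sketched as a label-permutation test on a pairwise distance matrix. The sketch below is a simplified illustration under stated assumptions: it uses a dense matrix rather than the sparse representation used by MG-DOTUR, the function names are hypothetical, and the toy distances are invented for the example. It is not the MG-AMOVA code.</p>
            <p>
```python
import random


def sum_sq_within(dist, groups):
    """Sum over groups of (sum of squared pairwise distances) / group size."""
    total = 0.0
    for members in groups:
        ss = 0.0
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                ss += dist[members[a]][members[b]] ** 2
        total += ss / len(members)
    return total


def pseudo_f(dist, labels):
    """Ratio of among-group to within-group mean squares for one labelling."""
    n = len(labels)
    names = sorted(set(labels))
    groups = [[i for i in range(n) if labels[i] == g] for g in names]
    ss_total = sum_sq_within(dist, [list(range(n))])
    ss_within = sum_sq_within(dist, groups)
    ss_among = ss_total - ss_within
    k = len(groups)
    return (ss_among / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(dist, labels, n_perm=999, seed=0):
    """Fraction of label shufflings with a statistic at least as large as observed."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: eight items on a line, two well separated groups.
positions = [0, 1, 2, 3, 10, 11, 12, 13]
toy_dist = [[abs(x - y) for y in positions] for x in positions]
toy_labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(pseudo_f(toy_dist, toy_labels), permutation_p_value(toy_dist, toy_labels))
```
            </p>
            <p>When the labels carry no information, as for the randomly divided pseudo-communities, the observed statistic falls within the bulk of the permuted values and the resulting p value is large.</p>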
         </sec>
         <sec>
            <st>
               <p>Acid Mine Drainage</p>
            </st>
            <p>Tyson et al. <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> used metagenomic sequencing to analyze a biofilm growing in acid mine drainage (AMD) that had a pH below 1.0. They obtained 322 archaeal and bacterial 16S rRNA gene sequences and 103,462 random paired sequence reads, which represented 76.2 Mbp of DNA. We used DOTUR to assign 16S rRNA gene sequences to nine OTUs and predicted there were an additional three OTUs (95% confidence interval [95% CI] = 0 to 8) that were not observed (Fig. <figr fid="F2">2A</figr>). The most abundant OTU was similar to <it>Leptospirillum ferriphilum </it>(n = 247) 16S rRNA gene sequences.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Collector's curves for the OTU (A) and OPF (B) richness observed and estimated using DNA extracted from an AMD biofilm community</p>
               </caption>
               <text>
                  <p>Collector's curves for the OTU (A) and OPF (B) richness observed and estimated using DNA extracted from an AMD biofilm community.</p>
               </text>
               <graphic file="1471-2105-9-34-2"/>
            </fig>
            <p>Next, we used MG-DOTUR to assign 99,419 peptide fragments to 10,235 merged OPFs. The dominant merged OPF (n = 901 fragments) did not have a homolog in GenBank and the next most abundant merged OPFs were most similar to a conserved hypothetical protein from <it>Leptospirillum </it>sp. Group II UBA (n = 773, <ext-link ext-link-type="gen" ext-link-id="EAY56482">EAY56482</ext-link>) and a transposase (n = 461, <ext-link ext-link-type="gen" ext-link-id="ZP_00669012">ZP_00669012</ext-link>). The dominant non-merged OPF did not have a homolog in GenBank (n = 114 fragments) and the next most abundant OPFs were most similar to an HNH nuclease (n = 96, <ext-link ext-link-type="gen" ext-link-id="ZP_01023224">ZP_01023224</ext-link>) and a mutator-type transposase (n = 88, <ext-link ext-link-type="gen" ext-link-id="ZP_00669012">ZP_00669012</ext-link>). The Chao1 richness estimator predicted that there were a minimum of 18,463 merged OPFs<sub>0.30 </sub>in the community (95% CI = 17,794 to 19,191; Fig. <figr fid="F2">2B</figr>). Considering the lack of a known function for two of the most abundant OPFs in the AMD community, this analysis shows the importance of including such sequences in metagenomic sequence analyses and may indicate that subsequent analysis of this group of sequences would reveal important physiological information about the community.</p>
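            <p>To make the richness estimate above concrete, the following sketch computes the bias-corrected form of the Chao1 estimator from a list of OPF sizes. It is a minimal illustration: the function name and the toy sizes are assumptions, whether MG-DOTUR uses this exact form is not stated here, and the confidence interval calculation is omitted.</p>
            <p>
```python
def chao1_bias_corrected(opf_sizes):
    """Bias-corrected Chao1 richness from the number of fragments in each observed OPF."""
    s_obs = sum(1 for n in opf_sizes if n > 0)        # OPFs observed at least once
    singletons = sum(1 for n in opf_sizes if n == 1)  # OPFs containing one fragment
    doubletons = sum(1 for n in opf_sizes if n == 2)  # OPFs containing two fragments
    # The unseen richness is inferred from the rarest classes only.
    unseen = singletons * (singletons - 1) / (2 * (doubletons + 1))
    return s_obs + unseen


# Hypothetical toy clustering: five OPFs, three of them singletons.
print(chao1_bias_corrected([1, 1, 1, 2, 5]))  # prints 6.5
```
            </p>
            <p>Because the estimate depends only on the observed richness and the counts of singletons and doubletons, it is a lower bound that keeps rising while many OPFs are still represented by a single fragment.</p>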
         </sec>
         <sec>
            <st>
               <p>Soil</p>
            </st>
            <p>Tringe et al. <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> used Minnesotan farm soil to build libraries and sequence 1,633 bacterial 16S rRNA gene fragments and 149,085 random DNA fragments, representing 76 Gbp of DNA. We previously showed that the OTU richness was approximately 2,000 <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. The three most abundant OTUs were representatives of the Chloroflexi.</p>
            <p>When we used MG-DOTUR to analyze the random metagenomic sequence reads, the 143,422 peptide fragments clustered into 98,066 merged OPFs. The members of the dominant merged OPF had similarity to a putative two-component response regulator (n = 688; <ext-link ext-link-type="gen" ext-link-id="NP_254170">NP_254170</ext-link>). The next most abundant merged OPFs had similarity to a histidine kinase (n = 566; <ext-link ext-link-type="gen" ext-link-id="YP_386369">YP_386369</ext-link>) and a serine/threonine protein kinase (n = 371; <ext-link ext-link-type="gen" ext-link-id="YP_825781">YP_825781</ext-link>). The three most abundant non-merged OPFs in the soil community had homology to a putative response regulator (n = 29, <ext-link ext-link-type="gen" ext-link-id="NP_520928">NP_520928</ext-link>), a PadR-like transcriptional regulator (n = 21, <ext-link ext-link-type="gen" ext-link-id="ZP_00524755">ZP_00524755</ext-link>), and a Cu<sup>2+</sup>-transporting ATPase (n = 20, <ext-link ext-link-type="gen" ext-link-id="ZP_01060472">ZP_01060472</ext-link>). Because of the considerable diversity in the soil sample, an insufficient number of peptide fragments were sampled to obtain a reliable OPF richness estimate; however, using the Chao1 richness estimator we predicted that the OPF richness was at least 361,546 (95% CI = 355,613 to 367,615; Fig. <figr fid="F3">3B</figr>). Although considerable additional sequencing effort is required to obtain a reliable estimate of OPF richness, it is interesting that, in spite of the relatively large OTU and OPF richness, it was possible to assign a large number of peptide fragments to the same OPF.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Collector's curves for the OTU (A) and OPF (B) richness observed and estimated using DNA extracted from an agricultural soil in Minnesota, USA</p>
               </caption>
               <text>
                  <p>Collector's curves for the OTU (A) and OPF (B) richness observed and estimated using DNA extracted from an agricultural soil in Minnesota, USA.</p>
               </text>
               <graphic file="1471-2105-9-34-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Whalebone communities</p>
            </st>
            <p>Tringe et al. <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> compared three bacterial communities growing on the bones of two whales (AHAA and AHAI were from the same whale) at the bottom of the Pacific Ocean using 16S rRNA and metagenomic sequence analysis. Based on 16S rRNA sequence data, the three communities designated AGZO (n = 73), AHAA (n = 65), and AHAI (n = 68) had a Chao1-estimated OTU richness of at least 140 (95% CI = 67 to 366), 48 (95% CI = 29 to 121), and 19 (95% CI = 17 to 34), respectively. The most abundant OTU<sub>0.03 </sub>from each of the three communities was affiliated with <it>Arcobacter </it>sp. (n = 15), Bacteroidetes (n = 12), and Flavobacteriales (n = 19), respectively. We estimated that each of the three communities shared between 1 and 3 OTUs<sub>0.03 </sub>with any of the other communities. The lack of conservation of membership between the three communities resulted in low J<sub>abund </sub>coefficients (0.01 to 0.19), low <it>&#952; </it>values (0.04 to 0.11), and statistically significant p values when comparing the communities using AMOVA and &#8747;-LIBSHUFF (all p &lt; 0.001). Although the three communities each came from similar environments, the taxonomic membership and structure of the three communities were considerably different.</p>
            <p>We applied the newly developed statistical tools to the metagenomic sequences of the three communities to assess their genetic and functional similarities. The three communities, AGZO, AHAA, and AHAI, yielded approximately 38,000 (25 Mbp), 38,000 (25 Mbp), and 40,000 (25 Mbp) sequence reads and 38,981, 36,165, and 33,199 peptide fragments longer than 100 aa, respectively. The dominant merged OPFs in each community were similar to a histidine kinase (AGZO, n = 386; <ext-link ext-link-type="gen" ext-link-id="YP_341128">YP_341128</ext-link>) and an ABC transporter (AHAA, n = 175 and AHAI, n = 166; <ext-link ext-link-type="gen" ext-link-id="ZP_01203057">ZP_01203057</ext-link>). The most abundant non-merged OPFs found in each community were homologous to a conserved hypothetical protein (AGZO: n = 22, <ext-link ext-link-type="gen" ext-link-id="NP_442017">NP_442017</ext-link>), RecR (AHAA: n = 9, <ext-link ext-link-type="gen" ext-link-id="ZP_00952890">ZP_00952890</ext-link>), another conserved hypothetical protein (AHAI: n = 16, <ext-link ext-link-type="gen" ext-link-id="ZP_00949155">ZP_00949155</ext-link>), and a putative transposase (AHAI: n = 16, <ext-link ext-link-type="gen" ext-link-id="ZP_00903285">ZP_00903285</ext-link>). The Chao1 OPF richness estimates for each of the communities continued to increase with additional sampling, indicating that the communities had a minimum OPF richness of 69,541 (95% CI = 67,618 to 71,550), 77,923 (95% CI = 75,699 to 80,276), and 49,120 (95% CI = 47,767 to 50,539) for the AGZO, AHAA, and AHAI communities, respectively.</p>
            <p>Although there was an insufficient number of peptide fragments to obtain a reliable estimate of the fraction of OPF membership that was shared between any two of the three communities, we estimated that any two communities shared at least 10 to 20% of their OPF membership (Fig. <figr fid="F4">4</figr>). The "core" whalebone OPF membership that was shared among the three whalebone communities had a richness of at least 3,800 OPFs (approximately 2.5% of the total richness); 1,678 of these were actually observed in the sequence collection. The most commonly shared OPFs among the three communities represented a variety of activities including metal transport, sensors, and housekeeping functions (Table <tblr tid="T3">3</tblr>).</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Venn diagram comparing the OPF membership found in three whalebone microbial communities (AGZO, n = 38,981 peptide fragments; AHAA, n = 36,165; and AHAI, n = 33,199)</p>
               </caption>
               <text>
                  <p>Venn diagram comparing the OPF membership found in three whalebone microbial communities (AGZO, n = 38,981 peptide fragments; AHAA, n = 36,165; and AHAI, n = 33,199). Below each community name is the Chao1 richness estimate and the 95% confidence interval for that community. We estimated the richness of the overlapping regions based on the pairwise S<sub>A,B Chao </sub>shared richness estimates between the three communities and by pooling two communities and estimating the shared fraction with the third community. These estimates are provided on the right side of the figure.</p>
               </text>
               <graphic file="1471-2105-9-34-4"/>
            </fig>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Summary of most abundant merged and non-merged OPFs from the three whalebone communities.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Number of ORFs in OPF</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Putative annotation</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Representative GenBank Accession</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>AGZO</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>AHAA</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>AHAI</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="5" ca="left">
                        <p>
                           <b>
                              <it>Merged OPFs</it>
                           </b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>386</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>26</p>
                     </c>
                     <c ca="left">
                        <p>Histidine kinase</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="YP_341128">YP_341128</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>229</p>
                     </c>
                     <c ca="center">
                        <p>175</p>
                     </c>
                     <c ca="center">
                        <p>166</p>
                     </c>
                     <c ca="left">
                        <p>ABC transporter</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01203057">ZP_01203057</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>137</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="left">
                        <p>Aerotaxis sensor receptor</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="YP_339458">YP_339458</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                     <c ca="left">
                        <p>Sensory box protein</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="YP_341105">YP_341105</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>39</p>
                     </c>
                     <c ca="center">
                        <p>65</p>
                     </c>
                     <c ca="left">
                        <p>ATP-dependent RNA helicase protein</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="NP_518660">NP_518660</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>62</p>
                     </c>
                     <c ca="center">
                        <p>51</p>
                     </c>
                     <c ca="left">
                        <p>Translation elongation factor</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01061839">ZP_01061839</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>56</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>104</p>
                     </c>
                     <c ca="left">
                        <p>Acyl-CoA dehydrogenase</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01106089">ZP_01106089</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>52</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                     <c ca="center">
                        <p>66</p>
                     </c>
                     <c ca="left">
                        <p>Aldehyde dehydrogenase</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="YP_341708">YP_341708</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>49</p>
                     </c>
                     <c ca="center">
                        <p>65</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="left">
                        <p>Copper transport membrane protein</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01060525">ZP_01060525</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>49</p>
                     </c>
                     <c ca="center">
                        <p>53</p>
                     </c>
                     <c ca="center">
                        <p>72</p>
                     </c>
                     <c ca="left">
                        <p>Acetyl-CoA acetyltransferase</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01165108">ZP_01165108</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>44</p>
                     </c>
                     <c ca="center">
                        <p>45</p>
                     </c>
                     <c ca="center">
                        <p>66</p>
                     </c>
                     <c ca="left">
                        <p>Cation efflux protein</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="YP_678123">YP_678123</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="5" ca="left">
                        <p>
                           <b>
                              <it>Non-merged OPFs</it>
                           </b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>Thioredoxin</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01901399">ZP_01901399</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>Conserved hypothetical protein</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01054178">ZP_01054178</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>GTP-binding protein LepA</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="YP_745328">YP_745328</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>DNA topoisomerase IV, subunit A</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="YP_756797">YP_756797</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>50S ribosomal protein, L20</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01108363">ZP_01108363</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>50S ribosomal protein, L14</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_00952078">ZP_00952078</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="left">
                        <p>30S ribosomal protein, S11</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01302802">ZP_01302802</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>Recombination protein, RecR</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_00948629">ZP_00948629</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>DNA helicase, RuvB</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="YP_357747">YP_357747</ext-link>
                        </p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>Comparison of the community structures based on the peptide fragments using MG-LIBSHUFF (all p &lt; 0.001) and MG-AMOVA (all p &lt; 0.001) found that the structures of these three communities were significantly different. Using OPFs, <it>&#952; </it>varied between 0.39 and 0.55, indicating that there was some similarity in community structure. The ability to quantify and assess the differences in communities without exhaustive sampling of the three whalebone communities indicates the importance of applying statistical methods to metagenomic sequence data. Such analyses make comparative metagenomics amenable to ecologically based hypothesis testing.</p>
         </sec>
         <sec>
            <st>
               <p>Comparison of the three environments</p>
            </st>
            <p>To assess the relative similarity of OTU<sub>0.03 </sub>membership between environments, we used DOTUR to cluster the 2,161 16S rRNA gene fragments from the AMD (n = 322), soil (n = 1,633), and whalebone communities (n = 206). No OTUs were shared between any two of the three communities; however, additional sampling may have identified OTUs that were shared between environments.</p>
            <p>We compared the relative similarity of OPF membership between environments by clustering the 351,186 peptide fragments from the AMD (n = 99,419), soil (n = 143,422), and whalebone communities (n = 108,345) using MG-DOTUR and then estimated the membership and structure overlap among the three environments (Fig. <figr fid="F5">5</figr>). Measuring the overlap of OPFs among the three environments resulted in the estimate that more than 800 OPFs were shared among the five communities; this represents less than 0.3% of the total OPF richness found in the five communities. Of this pool, 774 merged OPFs were actually observed, with functions including metal transport, housekeeping, and various dehydrogenase activities (Table <tblr tid="T4">4</tblr>). Applications of the statistical tools to these types of comparisons will enable researchers to investigate the problem of biogeography using genome-based methods.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Venn diagram comparing the pooled OPF membership found in the AMD (n = 99,419 peptide fragments), soil (n = 143,422), and whalebone (n = 108,345) microbial communities</p>
               </caption>
               <text>
                  <p>Venn diagram comparing the pooled OPF membership found in the AMD (n = 99,419 peptide fragments), soil (n = 143,422), and whalebone (n = 108,345) microbial communities.</p>
               </text>
               <graphic file="1471-2105-9-34-5"/>
            </fig>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Summary of most abundant merged and non-merged OPFs from the AMD, soil, and whalebone communities.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Number of ORFs in OPF</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Putative annotation</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Representative GenBank Accession</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>AMD</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Soil</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Whale</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="5" ca="left">
                        <p>
                           <b>
                              <it>Merged OPFs</it>
                           </b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>562</p>
                     </c>
                     <c ca="center">
                        <p>350</p>
                     </c>
                     <c ca="center">
                        <p>451</p>
                     </c>
                     <c ca="left">
                        <p>Acetate CoA ligase</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01856978">ZP_01856978</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1628</p>
                     </c>
                     <c ca="center">
                        <p>1515</p>
                     </c>
                     <c ca="center">
                        <p>1240</p>
                     </c>
                     <c ca="left">
                        <p>Diguanylate cyclase signal protein</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="YP_001112705">YP_001112705</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>796</p>
                     </c>
                     <c ca="center">
                        <p>1226</p>
                     </c>
                     <c ca="center">
                        <p>1086</p>
                     </c>
                     <c ca="left">
                        <p>ABC transporter</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01060315">ZP_01060315</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>371</p>
                     </c>
                     <c ca="center">
                        <p>163</p>
                     </c>
                     <c ca="center">
                        <p>138</p>
                     </c>
                     <c ca="left">
                        <p>Resistance protein</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01908921">ZP_01908921</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>216</p>
                     </c>
                     <c ca="center">
                        <p>121</p>
                     </c>
                     <c ca="center">
                        <p>152</p>
                     </c>
                     <c ca="left">
                        <p>Dehydrogenase</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01454599">ZP_01454599</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>238</p>
                     </c>
                     <c ca="center">
                        <p>237</p>
                     </c>
                     <c ca="center">
                        <p>236</p>
                     </c>
                     <c ca="left">
                        <p>Cation transporting ATPase</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01060472">ZP_01060472</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>237</p>
                     </c>
                     <c ca="center">
                        <p>170</p>
                     </c>
                     <c ca="center">
                        <p>269</p>
                     </c>
                     <c ca="left">
                        <p>Dehydrogenase</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01105894">ZP_01105894</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>476</p>
                     </c>
                     <c ca="center">
                        <p>282</p>
                     </c>
                     <c ca="center">
                        <p>318</p>
                     </c>
                     <c ca="left">
                        <p>Translocation elongation factor</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01594411">ZP_01594411</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>123</p>
                     </c>
                     <c ca="center">
                        <p>125</p>
                     </c>
                     <c ca="center">
                        <p>156</p>
                     </c>
                     <c ca="left">
                        <p>DNA helicase</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01189997">ZP_01189997</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>169</p>
                     </c>
                     <c ca="center">
                        <p>184</p>
                     </c>
                     <c ca="center">
                        <p>289</p>
                     </c>
                     <c ca="left">
                        <p>Acyl CoA dehydrogenase</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01512967">ZP_01512967</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="5" ca="left">
                        <p>
                           <b>
                              <it>Non-merged OPFs</it>
                           </b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>Urocanate hydratase</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01709366">ZP_01709366</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="left">
                        <p>DNA gyrase, A subunit</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01052578">ZP_01052578</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="left">
                        <p>Nucleoside-diphosphate kinase</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01106957">ZP_01106957</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="left">
                        <p>Nitrogen regulatory protein PII</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="NP_767252">NP_767252</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>50S ribosomal protein, L19</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="YP_471434">YP_471434</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>GTP-binding protein LepA</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01650030">ZP_01650030</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="left">
                        <p>50S ribosomal protein, L20</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01108363">ZP_01108363</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>Excinuclease ATPase subunit</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="YP_861824">YP_861824</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="left">
                        <p>GMP synthase</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01753395">ZP_01753395</ext-link>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>30S ribosomal protein, S13</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="ZP_01885769">ZP_01885769</ext-link>
                        </p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>For comparison, we analyzed the complement of ORFs from the fully sequenced <it>Bacillus anthracis </it>str. 'Ames Ancestor' (GenBank accession <ext-link ext-link-type="gen" ext-link-id="AE017334">AE017334</ext-link>), <it>Bacillus cereus </it>ATCC 10987 (<ext-link ext-link-type="gen" ext-link-id="AE017194">AE017194</ext-link>), <it>Escherichia coli </it>K12 (<ext-link ext-link-type="gen" ext-link-id="U00096">U00096</ext-link>), <it>Methanosarcina acetivorans </it>C2A (<ext-link ext-link-type="gen" ext-link-id="AE010299">AE010299</ext-link>), and <it>Methanosarcina barkeri </it>str. fusaro (<ext-link ext-link-type="gen" ext-link-id="CP000099">CP000099</ext-link>) genomes. We used MG-DOTUR to assign ORFs to OPFs and then used SONS to compare the OPF<sub>0.30 </sub>overlap between these genomes, which we selected for their phylogenetic similarity and breadth. As predicted based on current understanding of phylogenetics, the more closely related organisms had the greatest OPF<sub>0.30 </sub>overlap. The comparison between <it>B. anthracis </it>and <it>B. cereus </it>yielded J<sub>clas </sub>and <it>&#952; </it>values of 0.70 and 0.74, respectively; <it>E. coli </it>and <it>Y. pestis </it>yielded values of 0.43 and 0.20; and <it>M. acetivorans </it>and <it>M. barkeri </it>yielded values of 0.54 and 0.43. All of the other pairwise comparisons yielded values below 0.08 for both parameters. This analysis suggests that the comparisons between the OPFs<sub>0.30 </sub>identified in the metagenomic sequences reflect the level of difference expected between phylogenetically disparate groups of bacteria. Furthermore, analyses using completed genome sequences may enable investigators to define the size and boundaries of so-called "pan-genomes."</p>
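            <p>The two overlap measures reported above can be illustrated with a minimal Python sketch. It assumes that J<sub>clas </sub>is the classic Jaccard index computed on OPF presence or absence and that <it>&#952; </it>is the Yue and Clayton similarity index computed on relative OPF abundances. The function names, the input dictionaries, and the toy counts are hypothetical, and this is not the SONS implementation.</p>

```python
def jaccard_classic(counts_a, counts_b):
    """Classic Jaccard index: shared OPFs divided by OPFs seen in either sample."""
    seen_a = {opf for opf, n in counts_a.items() if n > 0}
    seen_b = {opf for opf, n in counts_b.items() if n > 0}
    union = seen_a | seen_b
    if not union:
        return 0.0
    return len(seen_a.intersection(seen_b)) / len(union)


def theta_yc(counts_a, counts_b):
    """Yue and Clayton theta on relative abundances (assumed definition)."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    sum_ab = sum_aa = sum_bb = 0.0
    for opf in set(counts_a) | set(counts_b):
        a = counts_a.get(opf, 0) / total_a  # relative abundance in sample A
        b = counts_b.get(opf, 0) / total_b  # relative abundance in sample B
        sum_ab += a * b
        sum_aa += a * a
        sum_bb += b * b
    return sum_ab / (sum_aa + sum_bb - sum_ab)


# Hypothetical toy data: number of ORFs assigned to each OPF in two genomes.
genome_1 = {"opf1": 3, "opf2": 1, "opf3": 2}
genome_2 = {"opf1": 3, "opf2": 1, "opf4": 2}

print("J_clas:", jaccard_classic(genome_1, genome_2))
print("theta :", round(theta_yc(genome_1, genome_2), 4))
```

            <p>For these toy counts the two genomes share two of four OPFs, so the Jaccard index is 0.5, whereas <it>&#952; </it>is 10/18 (about 0.56) because the shared OPFs are also the most abundant ones. This shows how a membership-based and a structure-based measure can differ for the same pair of samples.</p>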
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We present a statistical toolbox to estimate the functional richness and overlap among communities based on peptide fragments deduced from DNA sequence data. These statistical approaches are necessary, in part, because the immense genomic diversity contained in most communities precludes the formation of contigs. There is also considerable question regarding the robustness of sequence assembly <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. Although understanding these complex communities is tantalizing, it may prove useful to identify more communities similar to the AMD and whalebone communities that have a relatively low diversity to develop and test tools that can then be applied to soil. As sequencing technologies improve, the feasibility of obtaining nearly complete sequence coverage of the more diverse communities will improve. The rapid advances in sequencing short DNA fragments (approximately 100 bp long) in a highly parallelized manner <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> present many new opportunities, but the method may not be amenable to metagenomic sequencing because the short sequence reads produce peptide fragments less than 100 aa long, which could make meaningful ORF identification and analysis of functional diversity difficult.</p>
         <p>Innovative methods have been developed to compare collections of 16S rRNA sequences, and analogous new methods are needed for comparing metagenomic sequences. For example, improving our ability to estimate and interpret the biological meaning of OPF richness will be helpful for describing the relative functional capacity of a community. Our analysis does not address the possibility that distant OPFs might serve the same biological function and that members of the same OPF might have different functions. Therefore, further work is needed to unify studies of functionally active clones into a statistical framework. For example, comparing the collection of genes conferring antibiotic resistance found in multiple environments would enable us to understand better the diversity of these genes as well as their biogeography.</p>
         <p>Our analysis moves beyond previous attempts to compare microbial communities at the genomic level by not being dependent upon reference databases and introducing statistical rigor to the description and comparison of microbial communities. For example, previous analyses formed clusters based on similarity to reference databases and excluded those peptide fragments with no significant matches, which limited the scope of the analysis. Here, we formed OPFs using the observed data, in essence allowing the data to "speak for themselves", which allowed for a comprehensive comparison of the data. Previous analyses also based the level of similarity between communities on the observed peptide fragments as though they represented a statistical population. Here, we treated the data as a statistical sample and employed statistical tools to estimate the level of similarity between community membership and structure. These tools enable a quantitative, comprehensive, and statistically robust analysis of microbial communities at the genomic level.</p>
         <p>Shotgun sequencing of metagenomic communities is becoming increasingly popular and routine. The results of these efforts will provide more insight if they are wrapped in robust ecological and statistical frameworks. Tools are needed to advance data analysis beyond the frequency of different COGs or KEGG categories that are found within a community. This study is a step in building such a framework to compare microbial communities functionally at the genomic level. In addition to estimating community relatedness based on metagenomic data, our approach accounts for present but unsampled peptide fragments, is independent of a subjective annotation process, and includes peptide fragments with no known function.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Genome sequence data</p>
            </st>
            <p>We obtained the 101,379 sequence reads used to assemble the <it>Bacillus anthracis </it>str. Ames whole genome sequence from GenBank (<ext-link ext-link-type="gen" ext-link-id="NC_003997">NC_003997</ext-link>). Each sequence read was evaluated by fastgenesb at the Joint Genome Institute using the same parameters used to predict the identity of peptide fragments in two previous metagenomic sequencing studies <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B18">18</abbr></abbrgrp>. We also obtained the complete complement of 4,514 ORFs from the finished genome that were longer than 100 aa. All of the predicted peptide fragments from the published metagenomic sequencing projects using an acid mine drainage biofilm <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, whalebone <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, and soil <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> were obtained from the Joint Genome Institute. Only those ORFs and peptide fragments longer than 100 aa were considered in our analyses.</p>
         </sec>
         <sec>
            <st>
               <p>Modified toolbox</p>
            </st>
            <p>DOTUR is a freely available computer program that uses a distance matrix to assign sequences to operational taxonomic units (OTUs) using either the nearest, average, or furthest neighbor clustering algorithms for all possible distances and then constructs rarefaction and collector's curves for a variety of ecological parameters <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. These curves can be used to compare the relative richness, the number of different OTUs in a community, of two samples and to estimate the overall richness within a sample. Similarly, MG-DOTUR clusters sequences into OPFs using a BLAST table as the input. ORFs are assigned to OPFs using the furthest neighbor clustering algorithm <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, which requires that all sequences in the OPF have a pairwise BSR value greater than a specified value. Because BSRs are not necessarily symmetric (i.e. BSR<sub>ij </sub>&#8800; BSR<sub>ji</sub>), they were forced to be symmetric by using the smaller of the two values. Once MG-DOTUR assigns sequences to OPFs, rarefaction curves of the number of OPFs observed on average as a function of ORFs sampled and collector's curves of the Chao1 <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, ACE <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, and the interpolated Jackknife <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> richness estimates as a function of ORFs sampled are calculated at multiple BSR values predefined by the user. MG-DOTUR uses a switch to calculate the ACE estimator. If the coefficient of variation (<it>&#947;</it>) is greater than 0.8, then the ACE-1 estimator is calculated, otherwise the simple ACE estimator is used. This follows recommendations made by Anne Chao for use of the program SPADE <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. This study reports results obtained by defining an OPF as a group of sequences with a BSR value greater than 0.30 <abbrgrp><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp> and an OTU as a group of sequences that are all more than 97% identical to each other <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
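            <p>The symmetrization, furthest neighbor clustering, and Chao1 steps described above can be sketched in Python as follows. This is a minimal illustration rather than the MG-DOTUR code: it assumes that BSR values arrive as a dictionary keyed by ordered sequence pairs, that a direction missing from the table means no BLAST hit (BSR of zero), and that the bias-corrected form of the Chao1 estimator is used. All function names and the toy data are hypothetical.</p>

```python
from itertools import combinations


def symmetrize_bsr(bsr):
    """Key BSR values by unordered pair, keeping the smaller of the two directions."""
    sym = {}
    for (i, j), value in bsr.items():
        if i == j:
            continue  # self-hits carry no clustering information
        # Assumption: a direction absent from the table counts as BSR = 0.
        sym[frozenset((i, j))] = min(value, bsr.get((j, i), 0.0))
    return sym


def furthest_neighbor_opfs(names, sym, cutoff=0.30):
    """Greedy furthest neighbor (complete linkage) clustering on BSR similarities."""
    clusters = [frozenset([name]) for name in names]

    def linkage(c1, c2):
        # Two clusters are only as similar as their least similar pair of members.
        return min(sym.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if not linkage(clusters[i], clusters[j]) > cutoff:
            break  # no remaining pair of clusters exceeds the BSR cutoff
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def chao1(opf_sizes):
    """Bias-corrected Chao1 richness estimate from the number of ORFs per OPF."""
    singletons = sum(1 for n in opf_sizes if n == 1)
    doubletons = sum(1 for n in opf_sizes if n == 2)
    return len(opf_sizes) + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical BLAST score ratios between five peptide fragments.
toy_bsr = {
    ("A", "B"): 0.9, ("B", "A"): 0.8,
    ("A", "C"): 0.5, ("C", "A"): 0.6,
    ("B", "C"): 0.2, ("C", "B"): 0.4,
    ("D", "E"): 0.35, ("E", "D"): 0.45,
}
opfs = furthest_neighbor_opfs("ABCDE", symmetrize_bsr(toy_bsr))
print(sorted(sorted(opf) for opf in opfs))
print(chao1([len(opf) for opf in opfs]))
```

            <p>In this toy example fragment C is not joined to the OPF containing A and B, even though its BSR with A exceeds 0.30, because its symmetrized BSR with B is only 0.2. This is the defining property of furthest neighbor clustering: every pair of members must exceed the cutoff.</p>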
            <p>&#8747;-LIBSHUFF <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> is a modified version of the program LIBSHUFF <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> that makes use of the integral form of the Cram&#233;r-von Mises statistic to determine whether two communities are either samples of the same statistical population, sub-samples of each other, or were drawn from different statistical populations. As employed in &#8747;-LIBSHUFF, the Cram&#233;r-von Mises statistic is a function of the coverage of one sequence collection onto itself (i.e. homologous coverage, C<sub>X</sub>) compared to its coverage onto another collection (i.e. heterologous coverage, C<sub>XY</sub>). Coverage is the fraction of sequences that have another sequence within a given distance of them. Application of the LIBSHUFF-style analysis requires converting BSR values into distances by subtracting the BSR value from one and setting the limits of integration from zero to 0.70. MG-LIBSHUFF calculates the &#916;C<sub>XY </sub>statistic and evaluates its significance using a Monte Carlo testing procedure as described elsewhere <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>.</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-34-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>&#916;</m:mi>
                           <m:msub>
                              <m:mi>C</m:mi>
                              <m:mrow>
                                 <m:mi>X</m:mi>
                                 <m:mi>Y</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:mrow>
                                 <m:munderover>
                                    <m:mo>&#8747;</m:mo>
                                    <m:mn>0</m:mn>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>B</m:mi>
                                       <m:mi>S</m:mi>
                                       <m:msub>
                                          <m:mi>R</m:mi>
                                          <m:mrow>
                                             <m:mi>min</m:mi>
                                             <m:mo>&#8289;</m:mo>
                                          </m:mrow>
                                       </m:msub>
                                    </m:mrow>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:msup>
                                       <m:mrow>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>C</m:mi>
                                                   <m:mi>X</m:mi>
                                                </m:msub>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>D</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                                <m:mo>&#8722;</m:mo>
                                                <m:msub>
                                                   <m:mi>C</m:mi>
                                                   <m:mrow>
                                                      <m:mi>X</m:mi>
                                                      <m:mi>Y</m:mi>
                                                   </m:mrow>
                                                </m:msub>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>D</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                       </m:mrow>
                                       <m:mn>2</m:mn>
                                    </m:msup>
                                 </m:mrow>
                              </m:mrow>
                           </m:mstyle>
                           <m:mi>d</m:mi>
                           <m:mi>D</m:mi>
                        </m:mrow>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where,</p>
            <p>D = the distance (1-BSR) that is used to determine the level of coverage.</p>
            <p>C<sub>X</sub>(D) and C<sub>XY</sub>(D) = measures of homologous and heterologous library coverage.</p>
            <p>BSR<sub>min </sub>= the smallest meaningful BSR value; for this analysis set at 0.30</p>
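            <p>As a rough numerical illustration of the integral above, the following sketch approximates the squared difference between the two coverage curves over D from 0 to 1-BSR<sub>min</sub>. The function name delta_c, the representation of the coverage curves as callables, and the trapezoid rule are assumptions made for this example and are not taken from the published implementation.</p>

```python
def delta_c(cov_x, cov_xy, bsr_min=0.30, steps=1000):
    """Approximate the integral of (C_X(D) - C_XY(D))^2 for D in [0, 1 - BSR_min].

    cov_x and cov_xy are hypothetical callables that return the homologous
    and heterologous coverage at a distance D. The trapezoid rule is an
    assumed discretization chosen for illustration only.
    """
    upper = 1.0 - bsr_min
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        d = k * h
        diff_sq = (cov_x(d) - cov_xy(d)) ** 2
        # End points carry half weight in the trapezoid rule.
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * diff_sq
    return total * h


# Toy coverage curves, invented purely to exercise the function.
example_value = delta_c(lambda d: d, lambda d: 0.0)
```

            <p>With the toy curves shown, the result should be close to the analytical value of 0.7 cubed divided by 3.</p>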
            <p>Population biologists have developed an analysis of variance (ANOVA)-style method that uses mitochondrial DNA sequences and other genetic markers to test whether a collection of communities have similar genetic diversities. This method has been designated as either the analysis of molecular variance (AMOVA) <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> or non-parametric multivariate analysis of variance (MANOVA) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. It has also been applied to compare bacterial communities using 16S rRNA sequences <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. The general method is based on partitioning the sum of the squared elements in a distance matrix, similar to what is done in an ANOVA. As applied in MG-AMOVA, we implement a single-classification ANOVA design to determine whether the average genetic difference between the three whalebone communities was significantly greater than the difference within a community. The total sum-squared error (SS<sub>T</sub>) and within-community sum-squared error (SS<sub>W</sub>) are calculated by</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-34-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable columnalign="left">
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>S</m:mi>
                                       <m:msub>
                                          <m:mi>S</m:mi>
                                          <m:mi>T</m:mi>
                                       </m:msub>
                                       <m:mo>=</m:mo>
                                       <m:mfrac>
                                          <m:mn>1</m:mn>
                                          <m:mi>N</m:mi>
                                       </m:mfrac>
                                       <m:mstyle displaystyle="true">
                                          <m:munderover>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mi>N</m:mi>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                          </m:munderover>
                                          <m:mrow>
                                             <m:mstyle displaystyle="true">
                                                <m:munderover>
                                                   <m:mo>&#8721;</m:mo>
                                                   <m:mrow>
                                                      <m:mi>j</m:mi>
                                                      <m:mo>=</m:mo>
                                                      <m:mi>i</m:mi>
                                                      <m:mo>+</m:mo>
                                                      <m:mn>1</m:mn>
                                                   </m:mrow>
                                                   <m:mi>N</m:mi>
                                                </m:munderover>
                                                <m:mrow>
                                                   <m:msup>
                                                      <m:mrow>
                                                         <m:mrow>
                                                            <m:mo>(</m:mo>
                                                            <m:mrow>
                                                               <m:mn>1</m:mn>
                                                               <m:mo>&#8722;</m:mo>
                                                               <m:mi>B</m:mi>
                                                               <m:mi>S</m:mi>
                                                               <m:msub>
                                                                  <m:mi>R</m:mi>
                                                                  <m:mrow>
                                                                     <m:mi>i</m:mi>
                                                                     <m:mi>j</m:mi>
                                                                  </m:mrow>
                                                               </m:msub>
                                                            </m:mrow>
                                                            <m:mo>)</m:mo>
                                                         </m:mrow>
                                                      </m:mrow>
                                                      <m:mn>2</m:mn>
                                                   </m:msup>
                                                </m:mrow>
                                             </m:mstyle>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>S</m:mi>
                                       <m:msub>
                                          <m:mi>S</m:mi>
                                          <m:mi>W</m:mi>
                                       </m:msub>
                                       <m:mo>=</m:mo>
                                       <m:mfrac>
                                          <m:mn>1</m:mn>
                                          <m:mi>N</m:mi>
                                       </m:mfrac>
                                       <m:mstyle displaystyle="true">
                                          <m:munderover>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mi>N</m:mi>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                          </m:munderover>
                                          <m:mrow>
                                             <m:mstyle displaystyle="true">
                                                <m:munderover>
                                                   <m:mo>&#8721;</m:mo>
                                                   <m:mrow>
                                                      <m:mi>j</m:mi>
                                                      <m:mo>=</m:mo>
                                                      <m:mi>i</m:mi>
                                                      <m:mo>+</m:mo>
                                                      <m:mn>1</m:mn>
                                                   </m:mrow>
                                                   <m:mi>N</m:mi>
                                                </m:munderover>
                                                <m:mrow>
                                                   <m:msup>
                                                      <m:mrow>
                                                         <m:mrow>
                                                            <m:mo>(</m:mo>
                                                            <m:mrow>
                                                               <m:mn>1</m:mn>
                                                               <m:mo>&#8722;</m:mo>
                                                               <m:mi>B</m:mi>
                                                               <m:mi>S</m:mi>
                                                               <m:msub>
                                                                  <m:mi>R</m:mi>
                                                                  <m:mrow>
                                                                     <m:mi>i</m:mi>
                                                                     <m:mi>j</m:mi>
                                                                  </m:mrow>
                                                               </m:msub>
                                                            </m:mrow>
                                                            <m:mo>)</m:mo>
                                                         </m:mrow>
                                                      </m:mrow>
                                                      <m:mn>2</m:mn>
                                                   </m:msup>
                                                   <m:msub>
                                                      <m:mi>&#949;</m:mi>
                                                      <m:mrow>
                                                         <m:mi>i</m:mi>
                                                         <m:mi>j</m:mi>
                                                      </m:mrow>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:mstyle>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaqbaeaabiqaaaqaaiabdofatjabdofatnaaBaaaleaacqWGubavaeqaaOGaeyypa0tcfa4aaSaaaeaacqaIXaqmaeaacqWGobGtaaGcdaaeWbqaamaaqahabaWaaeWaaeaacqaIXaqmcqGHsislcqWGcbGqcqWGtbWucqWGsbGudaWgaaWcbaGaemyAaKMaemOAaOgabeaaaOGaayjkaiaawMcaamaaCaaaleqabaGaeGOmaidaaaqaaiabdQgaQjabg2da9iabdMgaPjabgUcaRiabigdaXaqaaiabd6eaobqdcqGHris5aaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemOta4KaeyOeI0IaeGymaedaniabggHiLdaakeaacqWGtbWucqWGtbWudaWgaaWcbaGaem4vaCfabeaakiabg2da9KqbaoaalaaabaGaeGymaedabaGaemOta4eaaOWaaabCaeaadaaeWbqaamaabmaabaGaeGymaeJaeyOeI0IaemOqaiKaem4uamLaemOuai1aaSbaaSqaaiabdMgaPjabdQgaQbqabaaakiaawIcacaGLPaaadaahaaWcbeqaaiabikdaYaaaiiGakiab=v7aLnaaBaaaleaacqWGPbqAcqWGQbGAaeqaaaqaaiabdQgaQjabg2da9iabdMgaPjabgUcaRiabigdaXaqaaiabd6eaobqdcqGHris5aaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemOta4KaeyOeI0IaeGymaedaniabggHiLdaaaaaa@78A3@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where,</p>
            <p>BSR<sub>ij </sub>= the BSR value between the i<sup>th </sup>and the j<sup>th </sup>peptide fragments.</p>
            <p><it>&#949;</it><sub>ij </sub>= 1 if i and j are in the same community, otherwise it is 0.</p>
            <p>N = total number of peptide fragments</p>
            <p>The sum-squared error among communities (SS<sub>A</sub>) can be calculated as SS<sub>A </sub>= SS<sub>T</sub>-SS<sub>W</sub>. Significance was determined by randomizing the assignment of sequences to the sequence collections, recalculating the statistic, and determining the proportion of randomizations that yielded an SS<sub>W </sub>value equal to or smaller than the observed value <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.</p>
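            <p>The partitioning and randomization steps can be sketched as follows. This is a minimal illustration, not the MG-AMOVA source code: the function names, the nested-list distance matrix, the number of randomizations, and the fixed random seed are all assumptions. Both sums are divided by N exactly as written in the formulas above.</p>

```python
import random


def sum_squares(dist, labels):
    """Return (SS_T, SS_W, SS_A) for a symmetric matrix of 1 - BSR distances.

    dist[i][j] is the distance between fragments i and j, and labels[i]
    names the community of fragment i (both are illustrative inputs).
    """
    n = len(labels)
    ss_t = 0.0
    ss_w = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            squared = dist[i][j] ** 2
            ss_t += squared
            if labels[i] == labels[j]:  # epsilon_ij = 1
                ss_w += squared
    ss_t /= n
    ss_w /= n
    return ss_t, ss_w, ss_t - ss_w


def permutation_p_value(dist, labels, n_perm=1000, seed=0):
    """Proportion of label shufflings whose SS_W does not exceed the observed SS_W."""
    rng = random.Random(seed)
    observed_ss_w = sum_squares(dist, labels)[1]
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        perm_ss_w = sum_squares(dist, shuffled)[1]
        if observed_ss_w >= perm_ss_w:
            hits += 1
    return hits / n_perm


# Toy data: two tight communities that are far apart from each other.
toy_dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
toy_labels = ["A", "A", "B", "B"]
toy_ss_t, toy_ss_w, toy_ss_a = sum_squares(toy_dist, toy_labels)
toy_p = permutation_p_value(toy_dist, toy_labels)
```

            <p>With only four fragments the randomization test has very little power, so the toy data serve only to show the mechanics of the calculation.</p>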
         </sec>
         <sec>
            <st>
               <p>OPF-based comparisons of community membership and structure</p>
            </st>
            <p>Using the frequency with which each OPF was observed in multiple communities, it is possible to estimate the number of OPFs that are shared between communities and to describe the overlap between community structures. Analogous to the Chao1 non-parametric richness estimator <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, Chao et al. <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> derived a non-parametric estimator of the richness shared between two communities:</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-34-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>S</m:mi>
                              <m:mrow>
                                 <m:mi>A</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>B</m:mi>
                                 <m:mtext>&#160;</m:mtext>
                                 <m:mi>C</m:mi>
                                 <m:mi>h</m:mi>
                                 <m:mi>a</m:mi>
                                 <m:mi>o</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:msub>
                              <m:mi>S</m:mi>
                              <m:mrow>
                                 <m:mn>12</m:mn>
                              </m:mrow>
                           </m:msub>
                           <m:mo>+</m:mo>
                           <m:msub>
                              <m:mi>f</m:mi>
                              <m:mrow>
                                 <m:mn>11</m:mn>
                              </m:mrow>
                           </m:msub>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mo>+</m:mo>
                                    </m:mrow>
                                 </m:msub>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mo>+</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                              <m:mrow>
                                 <m:mn>4</m:mn>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mn>2</m:mn>
                                       <m:mo>+</m:mo>
                                    </m:mrow>
                                 </m:msub>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mo>+</m:mo>
                                       <m:mn>2</m:mn>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo>+</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mo>+</m:mo>
                                    </m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:msubsup>
                              </m:mrow>
                              <m:mrow>
                                 <m:mn>2</m:mn>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mn>2</m:mn>
                                       <m:mo>+</m:mo>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo>+</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mo>+</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:msubsup>
                              </m:mrow>
                              <m:mrow>
                                 <m:mn>2</m:mn>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:mo>+</m:mo>
                                       <m:mn>2</m:mn>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaem4uam1aaSbaaSqaaiabdgeabjabcYcaSiabdkeacjabbccaGiabdoeadjabdIgaOjabdggaHjabd+gaVbqabaGccqGH9aqpcqWGtbWudaWgaaWcbaGaeGymaeJaeGOmaidabeaakiabgUcaRiabdAgaMnaaBaaaleaacqaIXaqmcqaIXaqmaeqaaKqbaoaalaaabaGaemOzay2aaSbaaeaacqaIXaqmcqGHRaWkaeqaaiabdAgaMnaaBaaabaGaey4kaSIaeGymaedabeaaaeaacqaI0aancqWGMbGzdaWgaaqaaiabikdaYiabgUcaRaqabaGaemOzay2aaSbaaeaacqGHRaWkcqaIYaGmaeqaaaaakiabgUcaRKqbaoaalaaabaGaemOzay2aa0baaeaacqaIXaqmcqGHRaWkaeaacqaIYaGmaaaabaGaeGOmaiJaemOzay2aaSbaaeaacqaIYaGmcqGHRaWkaeqaaaaakiabgUcaRKqbaoaalaaabaGaemOzay2aa0baaeaacqGHRaWkcqaIXaqmaeaacqaIYaGmaaaabaGaeGOmaiJaemOzay2aaSbaaeaacqGHRaWkcqaIYaGmaeqaaaaaaaa@61A9@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where,</p>
            <p>S<sub>12 </sub>= number of shared OPFs in A and B</p>
            <p>f<sub>11 </sub>= number of shared OPFs with one observed individual in A and B</p>
            <p>f<sub>1+</sub>, f<sub>2+ </sub>= number of shared OPFs with one or two individuals observed in A</p>
            <p>f<sub>+1</sub>, f<sub>+2 </sub>= number of shared OPFs with one or two individuals observed in B</p>
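            <p>To make the terms of the shared richness estimator concrete, the sketch below evaluates it from two lists of per-OPF fragment counts. The function name, the input layout, and the choice to raise an error when f<sub>2+</sub> or f<sub>+2</sub> is zero are assumptions made for this example; they are not described in the text.</p>

```python
def shared_chao(counts_a, counts_b):
    """Evaluate the shared richness estimator from per-OPF counts.

    counts_a[k] and counts_b[k] are the numbers of fragments assigned to
    OPF k in communities A and B (an illustrative input layout).
    """
    # Keep only OPFs observed in both communities.
    shared = [(x, y) for x, y in zip(counts_a, counts_b) if x > 0 and y > 0]
    s12 = len(shared)
    f11 = sum(1 for x, y in shared if x == 1 and y == 1)
    f1_plus = sum(1 for x, y in shared if x == 1)
    f2_plus = sum(1 for x, y in shared if x == 2)
    f_plus1 = sum(1 for x, y in shared if y == 1)
    f_plus2 = sum(1 for x, y in shared if y == 2)
    if f2_plus == 0 or f_plus2 == 0:
        # Assumed handling: the formula as written is undefined here.
        raise ValueError("estimator undefined when f2+ or f+2 is zero")
    return (
        s12
        + f11 * f1_plus * f_plus1 / (4 * f2_plus * f_plus2)
        + f1_plus ** 2 / (2 * f2_plus)
        + f_plus1 ** 2 / (2 * f_plus2)
    )


# Toy counts for six OPFs, invented for illustration.
toy_estimate = shared_chao([1, 1, 2, 3, 0, 5], [1, 2, 1, 2, 4, 0])
```

            <p>In the toy data four OPFs are observed in both communities, and the three correction terms add 0.5, 2 and 1, giving an estimate of 7.5 shared OPFs.</p>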
            <p>By a similar approach the fraction of individuals or peptide fragments that belong to a shared OPF can be estimated <abbrgrp><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp>:</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-34-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable columnalign="left">
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>U</m:mi>
                                          <m:mrow>
                                             <m:mi>e</m:mi>
                                             <m:mi>s</m:mi>
                                             <m:mi>t</m:mi>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo>=</m:mo>
                                       <m:mstyle displaystyle="true">
                                          <m:munderover>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>S</m:mi>
                                                   <m:mrow>
                                                      <m:mn>12</m:mn>
                                                   </m:mrow>
                                                </m:msub>
                                             </m:mrow>
                                          </m:munderover>
                                          <m:mrow>
                                             <m:mfrac>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>X</m:mi>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>n</m:mi>
                                                      <m:mrow>
                                                         <m:mi>t</m:mi>
                                                         <m:mi>o</m:mi>
                                                         <m:mi>t</m:mi>
                                                         <m:mi>a</m:mi>
                                                         <m:mi>l</m:mi>
                                                      </m:mrow>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:mfrac>
                                          </m:mrow>
                                       </m:mstyle>
                                       <m:mo>+</m:mo>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>m</m:mi>
                                                <m:mrow>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>o</m:mi>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>a</m:mi>
                                                   <m:mi>l</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>m</m:mi>
                                                <m:mrow>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>o</m:mi>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>a</m:mi>
                                                   <m:mi>l</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>f</m:mi>
                                                <m:mrow>
                                                   <m:mo>+</m:mo>
                                                   <m:mn>1</m:mn>
                                                </m:mrow>
                                             </m:msub>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mn>2</m:mn>
                                             <m:msub>
                                                <m:mi>f</m:mi>
                                                <m:mrow>
                                                   <m:mo>+</m:mo>
                                                   <m:mn>2</m:mn>
                                                </m:mrow>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mstyle displaystyle="true">
                                          <m:munderover>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>S</m:mi>
                                                   <m:mrow>
                                                      <m:mn>12</m:mn>
                                                   </m:mrow>
                                                </m:msub>
                                             </m:mrow>
                                          </m:munderover>
                                          <m:mrow>
                                             <m:mfrac>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>X</m:mi>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>n</m:mi>
                                                      <m:mrow>
                                                         <m:mi>t</m:mi>
                                                         <m:mi>o</m:mi>
                                                         <m:mi>t</m:mi>
                                                         <m:mi>a</m:mi>
                                                         <m:mi>l</m:mi>
                                                      </m:mrow>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:mfrac>
                                          </m:mrow>
                                       </m:mstyle>
                                       <m:mi>I</m:mi>
                                       <m:mrow>
                                          <m:mo>(</m:mo>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>Y</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mo>)</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>V</m:mi>
                                          <m:mrow>
                                             <m:mi>e</m:mi>
                                             <m:mi>s</m:mi>
                                             <m:mi>t</m:mi>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo>=</m:mo>
                                       <m:mstyle displaystyle="true">
                                          <m:munderover>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>S</m:mi>
                                                   <m:mrow>
                                                      <m:mn>12</m:mn>
                                                   </m:mrow>
                                                </m:msub>
                                             </m:mrow>
                                          </m:munderover>
                                          <m:mrow>
                                             <m:mfrac>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>Y</m:mi>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>m</m:mi>
                                                      <m:mrow>
                                                         <m:mi>t</m:mi>
                                                         <m:mi>o</m:mi>
                                                         <m:mi>t</m:mi>
                                                         <m:mi>a</m:mi>
                                                         <m:mi>l</m:mi>
                                                      </m:mrow>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:mfrac>
                                          </m:mrow>
                                       </m:mstyle>
                                       <m:mo>+</m:mo>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>n</m:mi>
                                                <m:mrow>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>o</m:mi>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>a</m:mi>
                                                   <m:mi>l</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>n</m:mi>
                                                <m:mrow>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>o</m:mi>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>a</m:mi>
                                                   <m:mi>l</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>f</m:mi>
                                                <m:mrow>
                                                   <m:mn>1</m:mn>
                                                   <m:mo>+</m:mo>
                                                </m:mrow>
                                             </m:msub>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mn>2</m:mn>
                                             <m:msub>
                                                <m:mi>f</m:mi>
                                                <m:mrow>
                                                   <m:mn>2</m:mn>
                                                   <m:mo>+</m:mo>
                                                </m:mrow>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mstyle displaystyle="true">
                                          <m:munderover>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>S</m:mi>
                                                   <m:mrow>
                                                      <m:mn>12</m:mn>
                                                   </m:mrow>
                                                </m:msub>
                                             </m:mrow>
                                          </m:munderover>
                                          <m:mrow>
                                             <m:mfrac>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>Y</m:mi>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>m</m:mi>
                                                      <m:mrow>
                                                         <m:mi>t</m:mi>
                                                         <m:mi>o</m:mi>
                                                         <m:mi>t</m:mi>
                                                         <m:mi>a</m:mi>
                                                         <m:mi>l</m:mi>
                                                      </m:mrow>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:mfrac>
                                          </m:mrow>
                                       </m:mstyle>
                                       <m:mi>I</m:mi>
                                       <m:mrow>
                                          <m:mo>(</m:mo>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>X</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mo>)</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaqbaeaabiqaaaqaaiabdwfavnaaBaaaleaacqWGLbqzcqWGZbWCcqWG0baDaeqaaOGaeyypa0ZaaabCaKqbagaadaWcaaqaaiabdIfaynaaBaaabaGaemyAaKgabeaaaeaacqWGUbGBdaWgaaqaaiabdsha0jabd+gaVjabdsha0jabdggaHjabdYgaSbqabaaaaaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaem4uam1aaSbaaWqaaiabigdaXiabikdaYaqabaaaniabggHiLdGccqGHRaWkjuaGdaWcaaqaaiabd2gaTnaaBaaabaGaemiDaqNaem4Ba8MaemiDaqNaemyyaeMaemiBaWgabeaacqGHsislcqaIXaqmaeaacqWGTbqBdaWgaaqaaiabdsha0jabd+gaVjabdsha0jabdggaHjabdYgaSbqabaaaamaalaaabaGaemOzay2aaSbaaeaacqGHRaWkcqaIXaqmaeqaaaqaaiabikdaYiabdAgaMnaaBaaabaGaey4kaSIaeGOmaidabeaaaaGcdaaeWbqcfayaamaalaaabaGaemiwaG1aaSbaaeaacqWGPbqAaeqaaaqaaiabd6gaUnaaBaaabaGaemiDaqNaem4Ba8MaemiDaqNaemyyaeMaemiBaWgabeaaaaaaleaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGtbWudaWgaaadbaGaeGymaeJaeGOmaidabeaaa0GaeyyeIuoakiabdMeajnaabmaabaGaemywaK1aaSbaaSqaaiabdMgaPbqabaGccqGH9aqpcqaIXaqmaiaawIcacaGLPaaaaeaacqWGwbGvdaWgaaWcbaGaemyzauMaem4CamNaemiDaqhabeaakiabg2da9maaqahajuaGbaWaaSaaaeaacqWGzbqwdaWgaaqaaiabdMgaPbqabaaabaGaemyBa02aaSbaaeaacqWG0baDcqWGVbWBcqWG0baDcqWGHbqycqWGSbaBaeqaaaaaaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabdofatnaaBaaameaacqaIXaqmcqaIYaGmaeqaaaqdcqGHris5aOGaey4kaSscfa4aaSaaaeaacqWGUbGBdaWgaaqaaiabdsha0jabd+gaVjabdsha0jabdggaHjabdYgaSbqabaGaeyOeI0IaeGymaedabaGaemOBa42aaSbaaeaacqWG0baDcqWGVbWBcqWG0baDcqWGHbqycqWGSbaBaeqaaaaadaWcaaqaaiabdAgaMnaaBaaabaGaeGymaeJaey4kaScabeaaaeaacqaIYaGmcqWGMbGzdaWgaaqaaiabikdaYiabgUcaRaqabaaaaOWaaabCaKqbagaadaWcaaqaaiabdMfaznaaBaaabaGaemyAaKgabeaaaeaacqWGTbqBdaWgaaqaaiabdsha0jabd+gaVjabdsha0jabdggaHjabdYgaSbqabaaaaaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaem4uam1aaSbaaWqaaiabigdaXiabikdaYaqabaaaniabggHiLdGccqWGjbqsdaqadaqaaiabdIfaynaaBaaaleaacqWGPbqAaeqaaOGaeyypa0JaeGymaedacaGLOaGaayzkaaaaaaaa@D2BF@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where,</p>
            <p>U<sub>est</sub>, V<sub>est</sub> = fraction of sequences from A and B that belong to a shared OTU</p>
            <p>X<sub>i</sub>, Y<sub>i</sub> = abundance of the i<sup>th</sup> shared OTU in A and B</p>
            <p>n<sub>total</sub>, m<sub>total</sub> = total number of sequences sampled in A and B</p>
            <p>I(&#183;) = if the argument, &#183;, is true then I(&#183;) is 1; otherwise it is 0.</p>
            <p>U<sub>est</sub> and V<sub>est</sub> can then be used to estimate an abundance-based Jaccard similarity coefficient:</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-34-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>J</m:mi>
                              <m:mrow>
                                 <m:mi>a</m:mi>
                                 <m:mi>b</m:mi>
                                 <m:mi>u</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mi>d</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>U</m:mi>
                                    <m:mrow>
                                       <m:mi>e</m:mi>
                                       <m:mi>s</m:mi>
                                       <m:mi>t</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:msub>
                                    <m:mi>V</m:mi>
                                    <m:mrow>
                                       <m:mi>e</m:mi>
                                       <m:mi>s</m:mi>
                                       <m:mi>t</m:mi>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>U</m:mi>
                                    <m:mrow>
                                       <m:mi>e</m:mi>
                                       <m:mi>s</m:mi>
                                       <m:mi>t</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo>+</m:mo>
                                 <m:msub>
                                    <m:mi>V</m:mi>
                                    <m:mrow>
                                       <m:mi>e</m:mi>
                                       <m:mi>s</m:mi>
                                       <m:mi>t</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msub>
                                    <m:mi>U</m:mi>
                                    <m:mrow>
                                       <m:mi>e</m:mi>
                                       <m:mi>s</m:mi>
                                       <m:mi>t</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:msub>
                                    <m:mi>V</m:mi>
                                    <m:mrow>
                                       <m:mi>e</m:mi>
                                       <m:mi>s</m:mi>
                                       <m:mi>t</m:mi>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemOsaO0aaSbaaSqaaiabdggaHjabdkgaIjabdwha1jabd6gaUjabdsgaKbqabaGccqGH9aqpjuaGdaWcaaqaaiabdwfavnaaBaaabaGaemyzauMaem4CamNaemiDaqhabeaacqWGwbGvdaWgaaqaaiabdwgaLjabdohaZjabdsha0bqabaaabaGaemyvau1aaSbaaeaacqWGLbqzcqWGZbWCcqWG0baDaeqaaiabgUcaRiabdAfawnaaBaaabaGaemyzauMaem4CamNaemiDaqhabeaacqGHsislcqWGvbqvdaWgaaqaaiabdwgaLjabdohaZjabdsha0bqabaGaemOvay1aaSbaaeaacqWGLbqzcqWGZbWCcqWG0baDaeqaaaaaaaa@58D8@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
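            <p>As an illustration only, the two estimators and the resulting coefficient can be sketched in a few lines of Python. The function names are hypothetical and are not part of the software described here. The sketch assumes that the f terms count the shared OTUs observed exactly once or exactly twice in the other community, that a doubleton count of zero is replaced by 1 to avoid division by zero, and that each estimate is capped at 1; none of these conventions is stated in the text above.</p>

```python
def chao_shared_fraction(own, other, own_total, other_total):
    """Estimate the fraction of one community's sequences found in shared OTUs.

    own, other: abundances of each shared OTU in this and the other community,
    listed in the same order. own_total, other_total: total sequences sampled.
    """
    observed = sum(own) / own_total
    # Shared OTUs seen exactly once or exactly twice in the other community.
    f1 = sum(1 for y in other if y == 1)
    f2 = sum(1 for y in other if y == 2)
    if f2 == 0:
        f2 = 1  # assumption: avoid division by zero when there are no doubletons
    # Relative abundance of the OTUs that are singletons in the other community.
    rare = sum(x for x, y in zip(own, other) if y == 1) / own_total
    correction = (other_total - 1) / other_total * f1 / (2 * f2) * rare
    return min(observed + correction, 1.0)  # assumption: cap the estimate at 1


def jaccard_abundance(x, y, n_total, m_total):
    """Abundance-based Jaccard coefficient computed from U_est and V_est."""
    u = chao_shared_fraction(x, y, n_total, m_total)
    v = chao_shared_fraction(y, x, m_total, n_total)
    if u == 0 and v == 0:
        return 0.0  # no shared OTUs were observed
    return u * v / (u + v - u * v)
```

            <p>For example, with shared-OTU abundances X = (10, 1, 2) and Y = (1, 5, 2), n<sub>total</sub> = 20 and m<sub>total</sub> = 10, this sketch gives U<sub>est</sub> = 0.875, caps V<sub>est</sub> at 1, and returns J<sub>abund</sub> = 0.875.</p>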
            <p>To incorporate into the measure of community similarity the proportion of peptide fragments in each OPF, Yue and Clayton <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> developed the parameter <it>&#952;</it>:</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-34-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>&#952;</m:mi>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:munderover>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>S</m:mi>
                                             <m:mrow>
                                                <m:mn>12</m:mn>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:munderover>
                                    <m:mrow>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>X</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>n</m:mi>
                                                <m:mrow>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>o</m:mi>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>a</m:mi>
                                                   <m:mi>l</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>Y</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>m</m:mi>
                                                <m:mrow>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>o</m:mi>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>a</m:mi>
                                                   <m:mi>l</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mfrac>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:munderover>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>S</m:mi>
                                             <m:mn>1</m:mn>
                                          </m:msub>
                                       </m:mrow>
                                    </m:munderover>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mrow>
                                             <m:mrow>
                                                <m:mo>(</m:mo>
                                                <m:mrow>
                                                   <m:mfrac>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>X</m:mi>
                                                            <m:mi>i</m:mi>
                                                         </m:msub>
                                                      </m:mrow>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>n</m:mi>
                                                            <m:mrow>
                                                               <m:mi>t</m:mi>
                                                               <m:mi>o</m:mi>
                                                               <m:mi>t</m:mi>
                                                               <m:mi>a</m:mi>
                                                               <m:mi>l</m:mi>
                                                            </m:mrow>
                                                         </m:msub>
                                                      </m:mrow>
                                                   </m:mfrac>
                                                </m:mrow>
                                                <m:mo>)</m:mo>
                                             </m:mrow>
                                          </m:mrow>
                                          <m:mn>2</m:mn>
                                       </m:msup>
                                       <m:mo>+</m:mo>
                                    </m:mrow>
                                 </m:mstyle>
                                 <m:mstyle displaystyle="true">
                                    <m:munderover>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>S</m:mi>
                                             <m:mn>2</m:mn>
                                          </m:msub>
                                       </m:mrow>
                                    </m:munderover>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mrow>
                                             <m:mrow>
                                                <m:mo>(</m:mo>
                                                <m:mrow>
                                                   <m:mfrac>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>Y</m:mi>
                                                            <m:mi>i</m:mi>
                                                         </m:msub>
                                                      </m:mrow>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>m</m:mi>
                                                            <m:mrow>
                                                               <m:mi>t</m:mi>
                                                               <m:mi>o</m:mi>
                                                               <m:mi>t</m:mi>
                                                               <m:mi>a</m:mi>
                                                               <m:mi>l</m:mi>
                                                            </m:mrow>
                                                         </m:msub>
                                                      </m:mrow>
                                                   </m:mfrac>
                                                </m:mrow>
                                                <m:mo>)</m:mo>
                                             </m:mrow>
                                          </m:mrow>
                                          <m:mn>2</m:mn>
                                       </m:msup>
                                    </m:mrow>
                                 </m:mstyle>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mstyle displaystyle="true">
                                    <m:munderover>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>S</m:mi>
                                             <m:mrow>
                                                <m:mn>12</m:mn>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:munderover>
                                    <m:mrow>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>X</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>n</m:mi>
                                                <m:mrow>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>o</m:mi>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>a</m:mi>
                                                   <m:mi>l</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>Y</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>m</m:mi>
                                                <m:mrow>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>o</m:mi>
                                                   <m:mi>t</m:mi>
                                                   <m:mi>a</m:mi>
                                                   <m:mi>l</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mfrac>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaacciGae8hUdeNaeyypa0tcfa4aaSaaaeaadaaeWbqaamaalaaabaGaemiwaG1aaSbaaeaacqWGPbqAaeqaaaqaaiabd6gaUnaaBaaabaGaemiDaqNaem4Ba8MaemiDaqNaemyyaeMaemiBaWgabeaaaaWaaSaaaeaacqWGzbqwdaWgaaqaaiabdMgaPbqabaaabaGaemyBa02aaSbaaeaacqWG0baDcqWGVbWBcqWG0baDcqWGHbqycqWGSbaBaeqaaaaaaeaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGtbWudaWgaaqaaiabigdaXiabikdaYaqabaaacqGHris5aaqaamaaqahabaWaaeWaaeaadaWcaaqaaiabdIfaynaaBaaabaGaemyAaKgabeaaaeaacqWGUbGBdaWgaaqaaiabdsha0jabd+gaVjabdsha0jabdggaHjabdYgaSbqabaaaaaGaayjkaiaawMcaamaaCaaabeqaaiabikdaYaaacqGHRaWkaeaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGtbWudaWgaaqaaiabigdaXaqabaaacqGHris5amaaqahabaWaaeWaaeaadaWcaaqaaiabdMfaznaaBaaabaGaemyAaKgabeaaaeaacqWGTbqBdaWgaaqaaiabdsha0jabd+gaVjabdsha0jabdggaHjabdYgaSbqabaaaaaGaayjkaiaawMcaamaaCaaabeqaaiabikdaYaaaaeaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGtbWudaWgaaqaaiabikdaYaqabaaacqGHris5aiabgkHiTmaaqahabaWaaSaaaeaacqWGybawdaWgaaqaaiabdMgaPbqabaaabaGaemOBa42aaSbaaeaacqWG0baDcqWGVbWBcqWG0baDcqWGHbqycqWGSbaBaeqaaaaadaWcaaqaaiabdMfaznaaBaaabaGaemyAaKgabeaaaeaacqWGTbqBdaWgaaqaaiabdsha0jabd+gaVjabdsha0jabdggaHjabdYgaSbqabaaaaaqaaiabdMgaPjabg2da9iabigdaXaqaaiabdofatnaaBaaabaGaeGymaeJaeGOmaidabeaaaiabggHiLdaaaaaa@9ADD@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where,</p>
            <p>S<sub>1</sub> and S<sub>2</sub> = observed number of OPFs in each community.</p>
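            <p>The calculation of <it>&#952;</it> can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function name and the dictionary input (OPF label mapped to the number of peptide fragments observed) are hypothetical, and both communities are assumed to contain at least one fragment.</p>

```python
def theta_yc(a_counts, b_counts):
    """Yue and Clayton theta from per-OPF fragment counts.

    a_counts, b_counts: dicts mapping an OPF label to its observed count
    in community A and community B (hypothetical input format).
    """
    n_total = sum(a_counts.values())
    m_total = sum(b_counts.values())
    # Only OPFs present in both communities contribute to the cross term.
    shared = set(a_counts).intersection(b_counts)
    cross = sum(
        (a_counts[k] / n_total) * (b_counts[k] / m_total) for k in shared
    )
    # Sums of squared relative abundances over all OPFs in each community.
    sq_a = sum((c / n_total) ** 2 for c in a_counts.values())
    sq_b = sum((c / m_total) ** 2 for c in b_counts.values())
    return cross / (sq_a + sq_b - cross)
```

            <p>With counts of 3 and 1 for two OPFs in A, and counts of 1 and 1 for two OPFs in B of which only the first is shared, the numerator is 0.375 and the denominator is 0.75, so the sketch returns 0.5. Two communities with identical relative abundances give a value of 1.</p>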
         </sec>
         <sec>
            <st>
               <p>16S rRNA sequence analysis</p>
            </st>
            <p>The three metagenomic sequencing projects were selected because they were accompanied by parallel 16S rRNA sequence collections. We obtained the sequences from the original authors and aligned them using the Greengenes website <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. Aligned sequences were imported into ARB <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, and overlapping sequences were used to construct distance matrices with a Jukes-Cantor correction for multiple substitutions. Distance matrices were analyzed using DOTUR <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, &#8747;-LIBSHUFF <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, and MG-AMOVA as described above.</p>
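            <p>To illustrate the Jukes-Cantor correction, the following Python sketch computes the corrected distance d = &#8722;(3/4) ln(1 &#8722; 4p/3) for two aligned sequences, where p is the proportion of mismatched positions in the region where the sequences overlap. It is not the ARB calculation: the function name is hypothetical, and the treatment of gap characters ("-" and ".") and the return of infinity when p reaches 0.75 are assumptions made for this example.</p>

```python
import math


def jukes_cantor_distance(seq_a, seq_b, gap_chars="-."):
    """Jukes-Cantor distance between two aligned sequences of equal length."""
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: skip positions where either sequence has a gap, so that
        # only overlapping positions are compared.
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("sequences share no overlapping positions")
    if mismatches == 0:
        return 0.0
    p = mismatches / compared
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

            <p>For two aligned sequences that differ at one of four compared positions (p = 0.25), the corrected distance is approximately 0.304, slightly larger than the uncorrected value of 0.25.</p>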
         </sec>
      </sec>
      <sec>
         <st>
            <p>Availability of data and software</p>
         </st>
         <p>MG-DOTUR, MG-LIBSHUFF, MG-AMOVA and all sequence and analysis files are available from the authors' website <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>PDS designed the study, developed the methods and software, analyzed results, and wrote the manuscript. JH analyzed results and wrote the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We appreciate the generosity of Susannah Tringe for providing the 16S rRNA sequences and peptide fragment sequences from the AMD, soil, whalebone, and unassembled <it>B. anthracis </it>sequence reads. We are grateful to Gene Tyson for providing the 16S rRNA sequences from the AMD community. This work was supported by a USDA postdoctoral fellowship in Soil Biology to PDS (2003-35107-13856), the NSF Microbial Observatories program (MCB-0132085), the Howard Hughes Medical Institute, and the University of Wisconsin-Madison College of Agricultural and Life Sciences.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Metagenomics: genomic analysis of microbial communities</p>
            </title>
            <aug>
               <au>
                  <snm>Riesenfeld</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Schloss</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Handelsman</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Annu Rev Genet</source>
            <pubdate>2004</pubdate>
            <volume>38</volume>
            <fpage>525</fpage>
            <lpage>552</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genet.38.072902.091216</pubid>
                  <pubid idtype="pmpid" link="fulltext">15568985</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Characterization of uncultivated prokaryotes: Isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic marine archaeon</p>
            </title>
            <aug>
               <au>
                  <snm>Stein</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Marsh</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>KY</fnm>
               </au>
               <au>
                  <snm>Shizuya</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>DeLong</snm>
                  <fnm>EF</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1996</pubdate>
            <volume>178</volume>
            <issue>3</issue>
            <fpage>591</fpage>
            <lpage>599</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">177699</pubid>
                  <pubid idtype="pmpid" link="fulltext">8550487</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms</p>
            </title>
            <aug>
               <au>
                  <snm>Rondon</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>August</snm>
                  <fnm>PR</fnm>
               </au>
               <au>
                  <snm>Bettermann</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Brady</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Grossman</snm>
                  <fnm>TH</fnm>
               </au>
               <etal/>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2000</pubdate>
            <volume>66</volume>
            <issue>6</issue>
            <fpage>2541</fpage>
            <lpage>2547</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">110579</pubid>
                  <pubid idtype="pmpid" link="fulltext">10831436</pubid>
                  <pubid idtype="doi">10.1128/AEM.66.6.2541-2547.2000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing</p>
            </title>
            <aug>
               <au>
                  <snm>Schmidt</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>DeLong</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Pace</snm>
                  <fnm>NR</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1991</pubdate>
            <volume>173</volume>
            <issue>14</issue>
            <fpage>4371</fpage>
            <lpage>4378</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">208098</pubid>
                  <pubid idtype="pmpid" link="fulltext">2066334</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Metagenomic analyses of an uncultured viral community from human feces</p>
            </title>
            <aug>
               <au>
                  <snm>Breitbart</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hewson</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Felts</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Mahaffy</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Nulton</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2003</pubdate>
            <volume>185</volume>
            <issue>20</issue>
            <fpage>6220</fpage>
            <lpage>6223</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">225035</pubid>
                  <pubid idtype="pmpid" link="fulltext">14526037</pubid>
                  <pubid idtype="doi">10.1128/JB.185.20.6220-6223.2003</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Diversity and population structure of a near-shore marine-sediment viral community</p>
            </title>
            <aug>
               <au>
                  <snm>Breitbart</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Felts</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kelley</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mahaffy</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Nulton</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc R Soc Lond B Biol Sci</source>
            <pubdate>2004</pubdate>
            <volume>271</volume>
            <issue>1539</issue>
            <fpage>565</fpage>
            <lpage>574</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1098/rspb.2003.2628</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Environmental genome shotgun sequencing of the Sargasso Sea</p>
            </title>
            <aug>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Remington</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>D</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>304</volume>
            <fpage>66</fpage>
            <lpage>74</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1093857</pubid>
                  <pubid idtype="pmpid" link="fulltext">15001713</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Community genomics among stratified microbial assemblages in the ocean's interior</p>
            </title>
            <aug>
               <au>
                  <snm>DeLong</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Preston</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Mincer</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rich</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Hallam</snm>
                  <fnm>SJ</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2006</pubdate>
            <volume>311</volume>
            <issue>5760</issue>
            <fpage>496</fpage>
            <lpage>503</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1120250</pubid>
                  <pubid idtype="pmpid" link="fulltext">16439655</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The Sorcerer II global ocean sampling expedition: Northwest Atlantic through Eastern Tropical Pacific</p>
            </title>
            <aug>
               <au>
                  <snm>Rusch</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Williamson</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <issue>3</issue>
            <fpage>e77</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1821060</pubid>
                  <pubid idtype="pmpid" link="fulltext">17355176</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0050077</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The Sorcerer II global ocean sampling expedition: Expanding the universe of protein families</p>
            </title>
            <aug>
               <au>
                  <snm>Yooseph</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Williamson</snm>
                  <fnm>SJ</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <issue>3</issue>
            <fpage>e16</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1821046</pubid>
                  <pubid idtype="pmpid" link="fulltext">17355171</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0050016</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Deciphering the evolution and metabolism of an anammox bacterium from a community genome</p>
            </title>
            <aug>
               <au>
                  <snm>Strous</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pelletier</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Mangenot</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rattei</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lehner</snm>
                  <fnm>A</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>440</volume>
            <issue>7085</issue>
            <fpage>790</fpage>
            <lpage>794</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04647</pubid>
                  <pubid idtype="pmpid" link="fulltext">16598256</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities</p>
            </title>
            <aug>
               <au>
                  <snm>Garcia Martin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ivanova</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kunin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Warnecke</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Barry</snm>
                  <fnm>KW</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2006</pubdate>
            <volume>24</volume>
            <issue>10</issue>
            <fpage>1263</fpage>
            <lpage>1269</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1247</pubid>
                  <pubid idtype="pmpid" link="fulltext">16998472</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Reverse methanogenesis: testing the hypothesis with environmental genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Hallam</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Putnam</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Preston</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Detter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Rokhsar</snm>
                  <fnm>D</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>305</volume>
            <issue>5689</issue>
            <fpage>1457</fpage>
            <lpage>1462</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1100025</pubid>
                  <pubid idtype="pmpid" link="fulltext">15353801</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Pathways of carbon assimilation and ammonia oxidation suggested by environmental genomic analyses of marine Crenarchaeota</p>
            </title>
            <aug>
               <au>
                  <snm>Hallam</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Mincer</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Schleper</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Preston</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>K</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2006</pubdate>
            <volume>4</volume>
            <issue>4</issue>
            <fpage>e95</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1403158</pubid>
                  <pubid idtype="pmpid" link="fulltext">16533068</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0040095</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Community structure and metabolism through reconstruction of microbial genomes from the environment</p>
            </title>
            <aug>
               <au>
                  <snm>Tyson</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Chapman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hugenholtz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Ram</snm>
                  <fnm>RJ</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>428</volume>
            <issue>6978</issue>
            <fpage>37</fpage>
            <lpage>43</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature02340</pubid>
                  <pubid idtype="pmpid" link="fulltext">14961025</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Symbiosis insights through metagenomic analysis of a microbial consortium</p>
            </title>
            <aug>
               <au>
                  <snm>Woyke</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Teeling</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ivanova</snm>
                  <fnm>NN</fnm>
               </au>
               <au>
                  <snm>Huntemann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Richter</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>443</volume>
            <issue>7114</issue>
            <fpage>950</fpage>
            <lpage>955</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature05192</pubid>
                  <pubid idtype="pmpid" link="fulltext">16980956</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Metagenomic analysis of the human distal gut microbiome</p>
            </title>
            <aug>
               <au>
                  <snm>Gill</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Pop</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Deboy</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Eckburg</snm>
                  <fnm>PB</fnm>
               </au>
               <au>
                  <snm>Turnbaugh</snm>
                  <fnm>PJ</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2006</pubdate>
            <volume>312</volume>
            <issue>5778</issue>
            <fpage>1355</fpage>
            <lpage>1359</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1124234</pubid>
                  <pubid idtype="pmpid" link="fulltext">16741115</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Comparative metagenomics of microbial communities</p>
            </title>
            <aug>
               <au>
                  <snm>Tringe</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>von Mering</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kobayashi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Salamov</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>K</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>308</volume>
            <issue>5721</issue>
            <fpage>554</fpage>
            <lpage>557</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1107851</pubid>
                  <pubid idtype="pmpid" link="fulltext">15845853</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Metagenomics for studying unculturable microorganisms: cutting the Gordian knot</p>
            </title>
            <aug>
               <au>
                  <snm>Schloss</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Handelsman</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <issue>8</issue>
            <fpage>229</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1273625</pubid>
                  <pubid idtype="pmpid" link="fulltext">16086859</pubid>
                  <pubid idtype="doi">10.1186/gb-2005-6-8-229</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Use of simulated data sets to evaluate the fidelity of metagenomic processing methods</p>
            </title>
            <aug>
               <au>
                  <snm>Mavromatis</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ivanova</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Barry</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shapiro</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Goltsman</snm>
                  <fnm>E</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Methods</source>
            <pubdate>2007</pubdate>
            <volume>4</volume>
            <issue>6</issue>
            <fpage>495</fpage>
            <lpage>500</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nmeth1043</pubid>
                  <pubid idtype="pmpid" link="fulltext">17468765</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>MEGAN analysis of metagenomic data</p>
            </title>
            <aug>
               <au>
                  <snm>Huson</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Auch</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Qi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schuster</snm>
                  <fnm>SC</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <issue>3</issue>
            <fpage>377</fpage>
            <lpage>386</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1800929</pubid>
                  <pubid idtype="pmpid" link="fulltext">17255551</pubid>
                  <pubid idtype="doi">10.1101/gr.5969107</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>An experimental metagenome data management and analysis system</p>
            </title>
            <aug>
               <au>
                  <snm>Markowitz</snm>
                  <fnm>VM</fnm>
               </au>
               <au>
                  <snm>Ivanova</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Palaniappan</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Szeto</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Korzeniewski</snm>
                  <fnm>F</fnm>
               </au>
               <etal/>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>14</issue>
            <fpage>e359</fpage>
            <lpage>367</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl217</pubid>
                  <pubid idtype="pmpid" link="fulltext">16873494</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>An application of statistics to comparative metagenomics</p>
            </title>
            <aug>
               <au>
                  <snm>Rodriguez-Brito</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Rohwer</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>1</issue>
            <fpage>162</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1473205</pubid>
                  <pubid idtype="pmpid" link="fulltext">16549025</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-7-162</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Accurate phylogenetic classification of variable-length DNA fragments</p>
            </title>
            <aug>
               <au>
                  <snm>McHardy</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>HG</fnm>
               </au>
               <au>
                  <snm>Tsirigos</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hugenholtz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rigoutsos</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Nat Methods</source>
            <pubdate>2007</pubdate>
            <volume>4</volume>
            <issue>1</issue>
            <fpage>63</fpage>
            <lpage>72</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nmeth976</pubid>
                  <pubid idtype="pmpid" link="fulltext">17179938</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Quantitative phylogenetic assessment of microbial communities in diverse environments</p>
            </title>
            <aug>
               <au>
                  <snm>von Mering</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hugenholtz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Raes</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tringe</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>Doerks</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2007</pubdate>
            <volume>315</volume>
            <issue>5815</issue>
            <fpage>1126</fpage>
            <lpage>1130</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1133420</pubid>
                  <pubid idtype="pmpid" link="fulltext">17272687</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Application of tetranucleotide frequencies for the assignment of genomic fragments</p>
            </title>
            <aug>
               <au>
                  <snm>Teeling</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Meyerdierks</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bauer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Amann</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gl&#246;ckner</snm>
                  <fnm>FO</fnm>
               </au>
            </aug>
            <source>Environ Microbiol</source>
            <pubdate>2004</pubdate>
            <volume>6</volume>
            <issue>9</issue>
            <fpage>938</fpage>
            <lpage>947</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1462-2920.2004.00624.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">15305919</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Teeling</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Waldmann</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lombardot</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bauer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gl&#246;ckner</snm>
                  <fnm>FO</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>1</issue>
            <fpage>163</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">529438</pubid>
                  <pubid idtype="pmpid" link="fulltext">15507136</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-163</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Comparative analysis of environmental sequences: potential and challenges</p>
            </title>
            <aug>
               <au>
                  <snm>Foerstner</snm>
                  <fnm>KU</fnm>
               </au>
               <au>
                  <snm>von Mering</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Philos Trans R Soc Lond B Biol Sci</source>
            <pubdate>2006</pubdate>
            <volume>361</volume>
            <issue>1467</issue>
            <fpage>519</fpage>
            <lpage>523</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1609345</pubid>
                  <pubid idtype="pmpid" link="fulltext">16524840</pubid>
                  <pubid idtype="doi">10.1098/rstb.2005.1809</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Inference of population genetic parameters in metagenomics: a clean look at messy data</p>
            </title>
            <aug>
               <au>
                  <snm>Johnson</snm>
                  <fnm>PL</fnm>
               </au>
               <au>
                  <snm>Slatkin</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <issue>10</issue>
            <fpage>1320</fpage>
            <lpage>1327</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1581441</pubid>
                  <pubid idtype="pmpid" link="fulltext">16954540</pubid>
                  <pubid idtype="doi">10.1101/gr.5431206</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness</p>
            </title>
            <aug>
               <au>
                  <snm>Schloss</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Handelsman</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>71</volume>
            <issue>3</issue>
            <fpage>1501</fpage>
            <lpage>1506</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1065144</pubid>
                  <pubid idtype="pmpid" link="fulltext">15746353</pubid>
                  <pubid idtype="doi">10.1128/AEM.71.3.1501-1506.2005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Integration of microbial ecology and statistics: a test to compare gene libraries</p>
            </title>
            <aug>
               <au>
                  <snm>Schloss</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Larget</snm>
                  <fnm>BR</fnm>
               </au>
               <au>
                  <snm>Handelsman</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2004</pubdate>
            <volume>70</volume>
            <issue>9</issue>
            <fpage>5485</fpage>
            <lpage>5492</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">520927</pubid>
                  <pubid idtype="pmpid" link="fulltext">15345436</pubid>
                  <pubid idtype="doi">10.1128/AEM.70.9.5485-5492.2004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Quantitative comparisons of 16S rRNA gene sequence libraries from environmental samples</p>
            </title>
            <aug>
               <au>
                  <snm>Singleton</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Furlong</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Rathbun</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Whitman</snm>
                  <fnm>WB</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2001</pubdate>
            <volume>67</volume>
            <issue>9</issue>
            <fpage>4374</fpage>
            <lpage>4376</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">93175</pubid>
                  <pubid idtype="pmpid" link="fulltext">11526051</pubid>
                  <pubid idtype="doi">10.1128/AEM.67.9.4374-4376.2001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Phylogenetic approaches for describing and comparing the diversity of microbial communities</p>
            </title>
            <aug>
               <au>
                  <snm>Martin</snm>
                  <fnm>AP</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2002</pubdate>
            <volume>68</volume>
            <issue>8</issue>
            <fpage>3673</fpage>
            <lpage>3682</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">124012</pubid>
                  <pubid idtype="pmpid" link="fulltext">12147459</pubid>
                  <pubid idtype="doi">10.1128/AEM.68.8.3673-3682.2002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Visualization of comparative genomic analyses by BLAST score ratio</p>
            </title>
            <aug>
               <au>
                  <snm>Rasko</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Ravel</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <issue>1</issue>
            <fpage>2</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">545078</pubid>
                  <pubid idtype="pmpid" link="fulltext">15634352</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-2</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>From gene trees to organismal phylogeny in prokaryotes: the case of the <it>&#947;</it>-Proteobacteria</p>
            </title>
            <aug>
               <au>
                  <snm>Lerat</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Daubin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Moran</snm>
                  <fnm>NA</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2003</pubdate>
            <volume>1</volume>
            <issue>1</issue>
            <fpage>E19</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">193605</pubid>
                  <pubid idtype="pmpid" link="fulltext">12975657</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0000019</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>The genome sequence of <it>Bacillus anthracis</it> Ames and comparison to closely related bacteria</p>
            </title>
            <aug>
               <au>
                  <snm>Read</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>SN</fnm>
               </au>
               <au>
                  <snm>Tourasse</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Baillie</snm>
                  <fnm>LW</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>IT</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>423</volume>
            <issue>6935</issue>
            <fpage>81</fpage>
            <lpage>86</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01586</pubid>
                  <pubid idtype="pmpid" link="fulltext">12721629</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Measuring biological diversity</p>
            </title>
            <aug>
               <au>
                  <snm>Magurran</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <publisher>Malden, MA: Blackwell Publishing</publisher>
            <pubdate>2004</pubdate>
         </bibl>
         <bibl id="B38">
            <title>
               <p>EST clustering error evaluation and correction</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Lindsay</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Leebens-Mack</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cui</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wall</snm>
                  <fnm>K</fnm>
               </au>
               <etal/>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>17</issue>
            <fpage>2973</fpage>
            <lpage>2984</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth342</pubid>
                  <pubid idtype="pmpid" link="fulltext">15189818</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Toward a census of bacteria in soil</p>
            </title>
            <aug>
               <au>
                  <snm>Schloss</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Handelsman</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>7</issue>
            <fpage>e92</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1371/journal.pcbi.0020092</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Microbial community genomics in the ocean</p>
            </title>
            <aug>
               <au>
                  <snm>DeLong</snm>
                  <fnm>EF</fnm>
               </au>
            </aug>
            <source>Nat Rev Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <issue>6</issue>
            <fpage>459</fpage>
            <lpage>469</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrmicro1158</pubid>
                  <pubid idtype="pmpid" link="fulltext">15886695</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Genome sequencing in microfabricated high-density picolitre reactors</p>
            </title>
            <aug>
               <au>
                  <snm>Margulies</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Egholm</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Altman</snm>
                  <fnm>WE</fnm>
               </au>
               <au>
                  <snm>Attiya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bader</snm>
                  <fnm>JS</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>437</volume>
            <issue>7057</issue>
            <fpage>376</fpage>
            <lpage>380</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1464427</pubid>
                  <pubid idtype="pmpid" link="fulltext">16056220</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Numerical Ecology</p>
            </title>
            <aug>
               <au>
                  <snm>Legendre</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Legendre</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <publisher>New York: Elsevier</publisher>
            <pubdate>1998</pubdate>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Non-parametric estimation of the number of classes in a population</p>
            </title>
            <aug>
               <au>
                  <snm>Chao</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Scand J Stat</source>
            <pubdate>1984</pubdate>
            <volume>11</volume>
            <issue>4</issue>
            <fpage>265</fpage>
            <lpage>270</lpage>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Estimating the number of classes via sample coverage</p>
            </title>
            <aug>
               <au>
                  <snm>Chao</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>SM</fnm>
               </au>
            </aug>
            <source>J Am Stat Assoc</source>
            <pubdate>1992</pubdate>
            <volume>87</volume>
            <issue>417</issue>
            <fpage>210</fpage>
            <lpage>217</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2290471</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Robust estimation of population size when capture probabilities vary among animals</p>
            </title>
            <aug>
               <au>
                  <snm>Burnham</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>Overton</snm>
                  <fnm>WS</fnm>
               </au>
            </aug>
            <source>Ecology</source>
            <pubdate>1979</pubdate>
            <volume>60</volume>
            <issue>5</issue>
            <fpage>927</fpage>
            <lpage>936</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/1936861</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>SPADE</p>
            </title>
            <url>http://chao.stat.nthu.edu.tw/softwareCE.html</url>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data</p>
            </title>
            <aug>
               <au>
                  <snm>Excoffier</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Smouse</snm>
                  <fnm>PE</fnm>
               </au>
               <au>
                  <snm>Quattro</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1992</pubdate>
            <volume>131</volume>
            <issue>2</issue>
            <fpage>479</fpage>
            <lpage>491</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1205020</pubid>
                  <pubid idtype="pmpid" link="fulltext">1644282</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>A new method for non-parametric multivariate analysis of variance</p>
            </title>
            <aug>
               <au>
                  <snm>Anderson</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Austral Ecol</source>
            <pubdate>2001</pubdate>
            <volume>26</volume>
            <fpage>32</fpage>
            <lpage>46</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1046/j.1442-9993.2001.01070.x</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Estimating the number of shared species in two communities</p>
            </title>
            <aug>
               <au>
                  <snm>Chao</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hwang</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>YC</fnm>
               </au>
               <au>
                  <snm>Kuo</snm>
                  <fnm>CY</fnm>
               </au>
            </aug>
            <source>Stat Sinica</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <issue>1</issue>
            <fpage>227</fpage>
            <lpage>246</lpage>
         </bibl>
         <bibl id="B50">
            <title>
               <p>A new statistical approach for assessing similarity of species composition with incidence and abundance data</p>
            </title>
            <aug>
               <au>
                  <snm>Chao</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chazdon</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Colwell</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Ecol Lett</source>
            <pubdate>2005</pubdate>
            <volume>8</volume>
            <issue>2</issue>
            <fpage>148</fpage>
            <lpage>159</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1111/j.1461-0248.2004.00707.x</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Abundance-based similarity indices and their estimation when there are unseen species in samples</p>
            </title>
            <aug>
               <au>
                  <snm>Chao</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chazdon</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Colwell</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Biometrics</source>
            <pubdate>2006</pubdate>
            <volume>62</volume>
            <fpage>361</fpage>
            <lpage>371</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1541-0420.2005.00489.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">16918900</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>A similarity measure based on species proportions</p>
            </title>
            <aug>
               <au>
                  <snm>Yue</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>MK</fnm>
               </au>
            </aug>
            <source>Commun Stat Theor M</source>
            <pubdate>2005</pubdate>
            <volume>34</volume>
            <issue>11</issue>
            <fpage>2123</fpage>
            <lpage>2131</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1080/STA-200066418</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>greengenes</p>
            </title>
            <url>http://greengenes.lbl.gov</url>
         </bibl>
         <bibl id="B54">
            <title>
               <p>ARB: a software environment for sequence data</p>
            </title>
            <aug>
               <au>
                  <snm>Ludwig</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Strunk</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Westram</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Richter</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Meier</snm>
                  <fnm>H</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>4</issue>
            <fpage>1363</fpage>
            <lpage>1371</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">390282</pubid>
                  <pubid idtype="pmpid" link="fulltext">14985472</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh293</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>MetaG Toolbox</p>
            </title>
            <url>http://www.bio.umass.edu/micro/schloss/metaG_tools/</url>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Introducing SONS, a tool that compares the membership of microbial communities</p>
            </title>
            <aug>
               <au>
                  <snm>Schloss</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Handelsman</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2006</pubdate>
            <volume>72</volume>
            <issue>10</issue>
            <fpage>6773</fpage>
            <lpage>6779</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1610290</pubid>
                  <pubid idtype="pmpid" link="fulltext">17021230</pubid>
                  <pubid idtype="doi">10.1128/AEM.00474-06</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>

