<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2164-13-695</ui>
	<ji>1471-2164</ji>
	<fm>
		<dochead>Methodology article</dochead>
		<bibl>
			<title>
				<p>A novel method to discover fluoroquinolone antibiotic resistance (qnr) genes in fragmented nucleotide sequences</p>
			</title>
			<aug>
				<au id="A1"><snm>Boulund</snm><fnm>Fredrik</fnm><insr iid="I1"/><email>fredrik.boulund@chalmers.se</email></au>
				<au id="A2"><snm>Johnning</snm><fnm>Anna</fnm><insr iid="I2"/><email>anna.johnning@gu.se</email></au>
				<au id="A3"><snm>Pereira</snm><fnm>Mariana Buongermino</fnm><insr iid="I1"/><email>marbuo@chalmers.se</email></au>
				<au id="A4"><snm>Larsson</snm><fnm>DG Joakim</fnm><insr iid="I3"/><email>joakim.larsson@fysiologi.gu.se</email></au>
				<au id="A5" ca="yes"><snm>Kristiansson</snm><fnm>Erik</fnm><insr iid="I1"/><email>erik.kristiansson@chalmers.se</email></au>
			</aug>
			<insg>
				<ins id="I1"><p>Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, G&#246;teborg, SE-412 96, Sweden</p></ins>
				<ins id="I2"><p>Institute of Neuroscience and Physiology, the Sahlgrenska Academy at the University of Gothenburg, Box 434, G&#246;teborg, SE-405 30, Sweden</p></ins>
				<ins id="I3"><p>Department of Infectious Diseases, Institute of Biomedicine, the Sahlgrenska Academy at the University of Gothenburg, Box 434, G&#246;teborg, SE-405 30, Sweden</p></ins>
			</insg>
			<source>BMC Genomics</source>
			<section><title><p>Prokaryote microbial genomics </p></title></section><issn>1471-2164</issn>
			<pubdate>2012</pubdate>
			<volume>13</volume>
			<issue>1</issue>
			<fpage>695</fpage>
			<url>http://www.biomedcentral.com/1471-2164/13/695</url>
			<xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-13-695</pubid><pubid idtype="pmpid">23231464</pubid></pubidlist></xrefbib>
		</bibl>
		<history><rec><date><day>26</day><month>7</month><year>2012</year></date></rec><acc><date><day>4</day><month>12</month><year>2012</year></date></acc><pub><date><day>11</day><month>12</month><year>2012</year></date></pub></history>
		<cpyrt><year>2012</year><collab>Boulund et al.; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
		<kwdg>
			<kwd>Metagenomics</kwd>
			<kwd>Antibiotic resistance</kwd>
			<kwd>Fluoroquinolones</kwd>
			<kwd>PMQR</kwd>
			<kwd>Qnr</kwd>
			<kwd>Hidden Markov models</kwd>
		</kwdg>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st><p>Broad-spectrum fluoroquinolone antibiotics are central in modern health care and are used to treat and prevent a wide range of bacterial infections. The recently discovered <it>qnr</it> genes provide a mechanism of resistance with the potential to rapidly spread between bacteria using horizontal gene transfer. As for many antibiotic resistance genes present in pathogens today, <it>qnr</it> genes are hypothesized to originate from environmental bacteria. The vast amount of data generated by shotgun metagenomics can therefore be used to explore the diversity of <it>qnr</it> genes in more detail.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st><p>In this paper we describe a new method to identify <it>qnr</it> genes in nucleotide sequence data. We show, using cross-validation, that the method has a high statistical power to correctly classify sequences from novel classes of <it>qnr</it> genes, even for fragments as short as 100 nucleotides. Based on sequences from public repositories, the method was able to identify all previously reported plasmid-mediated <it>qnr</it> genes. In addition, several fragments from novel putative <it>qnr</it> genes were identified in metagenomes. The method was also able to annotate 39 chromosomal variants, of which 11 have not previously been reported in the literature.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusions</p>
					</st><p>The method described in this paper significantly improves the sensitivity and specificity of identification and annotation of <it>qnr</it> genes in nucleotide sequence data. The predicted novel putative <it>qnr</it> genes in the metagenomic data support the hypothesis of a large and uncharacterized diversity within this family of resistance genes in environmental bacterial communities. An implementation of the method is freely available at <url>http://bioinformatics.math.chalmers.se/qnr/</url>.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st><p>Antibiotics are one of our most powerful tools for treating and preventing bacterial infections and have, since their introduction, vastly improved human health and drastically reduced mortality rates. The high use of antibiotics in human and veterinary medicine has, however, resulted in an accelerated development of multiresistant bacteria <abbrgrp>
					<abbr bid="B1">1</abbr>
					<abbr bid="B2">2</abbr>
				</abbrgrp>. Bacteria can adapt to an antibiotic selection pressure by altering their genome, either by mutations in pre-existing DNA or through the acquisition of resistance genes <abbrgrp>
					<abbr bid="B3">3</abbr>
				</abbrgrp>. Since resistance genes can be horizontally transferred between bacterial cells, antibiotic resistance can rapidly spread within and between bacterial communities <abbrgrp>
					<abbr bid="B4">4</abbr>
					<abbr bid="B5">5</abbr>
					<abbr bid="B6">6</abbr>
				</abbrgrp>. Many types of antibiotics are derived from compounds that are naturally found in the environment and bacteria have developed resistance genes as a protection mechanism. Environmental bacterial communities have therefore been hypothesized to contain a large and unexplored collection of antibiotic resistance genes <abbrgrp>
					<abbr bid="B7">7</abbr>
					<abbr bid="B8">8</abbr>
					<abbr bid="B9">9</abbr>
					<abbr bid="B10">10</abbr>
				</abbrgrp>. Antibiotic resistance genes were present in environmental bacterial communities long before they emerged in human pathogens <abbrgrp>
					<abbr bid="B11">11</abbr>
				</abbrgrp>. As a consequence, many of the antibiotic resistance genes found in clinical settings have been horizontally transferred from environmental bacteria <abbrgrp>
					<abbr bid="B12">12</abbr>
					<abbr bid="B13">13</abbr>
				</abbrgrp>.</p><p>The broad-spectrum fluoroquinolone antibiotics were introduced in the early 1960s and are today extensively used in human and veterinary medicine. Fluoroquinolones interact with the essential bacterial type II topoisomerases (topoisomerase IV and DNA gyrase) and thereby inhibit DNA replication. The most effective fluoroquinolone resistance mechanism is chromosomal mutation of the antibiotic target proteins, which confers high levels of resistance in several bacterial species <abbrgrp>
					<abbr bid="B14">14</abbr>
					<abbr bid="B15">15</abbr>
				</abbrgrp>. Recently, a family of mobile fluoroquinolone antibiotic resistance genes called <it>qnr</it> was discovered <abbrgrp>
					<abbr bid="B16">16</abbr>
					<abbr bid="B17">17</abbr>
				</abbrgrp>. These mobile plasmid-mediated quinolone resistance genes (sometimes labeled PMQR) have been grouped into five recognized classes: <it>qnrA</it>, <it>qnrB</it>, <it>qnrC</it>, <it>qnrD</it>, and <it>qnrS</it>, and it is currently unknown whether more classes exist. The <it>qnr</it> genes encode proteins that prevent fluoroquinolones from interacting with DNA/type-II-topoisomerase complexes formed during DNA replication, thus preventing fluoroquinolone inhibition. The levels of resistance conferred by <it>qnr</it> genes are generally lower than those conferred by chromosomal mutations but can reach up to 1 mg/L (minimum inhibitory concentration) depending on the organism and specific antibiotic compound <abbrgrp>
					<abbr bid="B18">18</abbr>
				</abbrgrp>.</p><p>The <it>qnr</it> genes belong to the larger family of pentapeptide repeat proteins (PRP), which are ubiquitously present with more than 500 variants described in all forms of life <abbrgrp>
					<abbr bid="B19">19</abbr>
				</abbrgrp>. All PRPs are characterized by a sequence feature consisting of repeating subunits of five amino acid residues following the form A(D/N)LXX. This repetitive pattern makes PRPs fold into a &#946;-helix that performs a wide range of cellular functions and they are found both membrane bound and in the cytoplasm <abbrgrp>
					<abbr bid="B20">20</abbr>
				</abbrgrp>. For <it>qnr</it> genes the &#946;-helix resembles the structure of the DNA spiral and interacts with type II topoisomerases, thereby preventing fluoroquinolone antibiotics from inhibiting the function of the complex <abbrgrp>
					<abbr bid="B21">21</abbr>
					<abbr bid="B22">22</abbr>
				</abbrgrp>. Despite the strong similarity in the repeating amino acid pattern between <it>qnr</it> sequences and other PRPs, it is unclear exactly why <it>qnr</it> genes provide resistance to fluoroquinolones.</p><p>Further characterization of <it>qnr</it> genes is necessary to fully understand their function and estimate their diversity. Assuming the presence of antibiotic resistance genes in clinical settings is the result of transfer of mobile genetic elements from the environment, it is natural to search environmental microbial communities to find previously unidentified <it>qnr</it> genes. Recent culture-independent methods such as metagenomics enable unprecedented exploratory analysis of the genetic basis in microbial communities <abbrgrp>
					<abbr bid="B23">23</abbr>
					<abbr bid="B24">24</abbr>
				</abbrgrp>. This is especially true considering that more than 99% of environmental bacterial communities do not submit easily to cultivation and would consequently be missed with sampling and analysis of individual strains <abbrgrp>
					<abbr bid="B25">25</abbr>
					<abbr bid="B26">26</abbr>
				</abbrgrp>. In combination with next-generation DNA sequencing technologies, metagenomics provides a means for culture-independent studies of bacterial communities at a very high resolution. However, high-throughput sequencing equipment can currently only produce short DNA fragments (typically 75-400 nucleotides long), which substantially limits the sensitivity and specificity of identifying genes such as <it>qnr</it>
				<abbrgrp>
					<abbr bid="B27">27</abbr>
				</abbrgrp>.</p><p>
				<it>In-silico</it> approaches have previously been used to identify novel variants of <it>qnr</it> genes. For example, Fonseca <it>et al.</it> identified <it>qnr</it>VC1 and <it>qnr</it>VC2 in <it>Vibrio cholerae</it> using sequence comparison to existing plasmid-mediated <it>qnr</it> genes <abbrgrp>
					<abbr bid="B28">28</abbr>
				</abbrgrp>. A similar approach was used by Sanches <it>et al.</it> to identify several chromosomal <it>qnr</it> variants, including multiple members of the class <it>Smqnr</it> from <it>Stenotrophomonas maltophilia</it>
				<abbrgrp>
					<abbr bid="B29">29</abbr>
				</abbrgrp>
				, and by Velasco <it>et al.</it> to discover <it>Smaqnr</it> in <it>Serratia marcescens</it>
				<abbrgrp>
					<abbr bid="B30">30</abbr>
				</abbrgrp>. However, all of these studies used sequence alignment tools such as BLAST, which do not explicitly make use of the repetitive structure of the <it>qnr</it> genes. Furthermore, none of the previously suggested methods were adapted to short sequence lengths and high volumes of data, which makes them inapplicable to sequences from shotgun metagenomics.</p><p>In this paper, we describe a novel method to identify fluoroquinolone antibiotic resistance genes in DNA sequence data. By using hidden Markov models combined with a length-dependent classification rule, the method is able to discriminate between <it>qnr</it> and other pentapeptide repeat proteins not associated with a resistance phenotype. Cross-validation estimated that the method had a high statistical power of detecting fragments of <it>qnr</it> genes in metagenomic data, even at fragment lengths as short as 100 nucleotides. The method was applied to sequence data from various databases and both known and novel putative <it>qnr</it> genes were identified. An implementation of the method is freely available at <url>http://bioinformatics.math.chalmers.se/qnr/</url>.</p>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st><p>A hidden Markov model (HMM) was constructed from a multiple sequence alignment of all currently known and experimentally verified plasmid-mediated <it>qnr</it> resistance gene amino acid sequences <abbrgrp>
					<abbr bid="B31">31</abbr>
				</abbrgrp>. Using the database search software HMMER3, we analyzed the empirical bit score distributions produced by applying the HMM to two sources of protein sequence data: <it>a)</it> true <it>qnr</it> fragments, created from randomly fragmented <it>qnr</it> sequences, and <it>b)</it> non-<it>qnr</it> fragments, created from pentapeptide repeat protein (PRP) sequences not associated with a fluoroquinolone resistance phenotype (see Methods). To visualize the bit score distributions of fragmented sequences, random fragments of <it>qnr</it> and non-<it>qnr</it> sequences were created for each fragment length between 10 and 210 amino acid residues (i.e. full-length <it>qnr</it> sequences) and their scores against the HMM were plotted as a function of fragment length. As indicated by Figure <figr fid="F1">1</figr>, true <it>qnr</it> fragments had bit scores that were approximately linear in their fragment length, while the bit score distribution of the non-<it>qnr</it> fragments was centered around zero. A two-part linear classification function was therefore introduced to discriminate between true <it>qnr</it> and non-<it>qnr</it> fragments. For fragments up to a length threshold (<it>D</it>), the classification function was linear with intercept <it>M</it> and slope <it>K</it>. For fragments longer than <it>D</it>, the function used a fixed cutoff <it>C</it>&#8201;=&#8201;<it>K</it>&#8201;&#215;&#8201;<it>D</it>&#8201;+&#8201;<it>M</it> (Figure <figr fid="F1">1A</figr>, Additional files <supplr sid="S1">1</supplr>, <supplr sid="S2">2</supplr>, <supplr sid="S3">3</supplr>, <supplr sid="S4">4</supplr>, <supplr sid="S5">5</supplr>).</p>
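			<p>The two-part classification function can be illustrated with a minimal Python sketch. It is not the published implementation: the function names are hypothetical, the default parameter values are the cross-validated averages reported in this section, and treating a fragment whose score lies exactly on the cutoff as <it>qnr</it> is an assumption made here.</p>

```python
def qnr_cutoff(length_aa: float, K: float = 0.778, M: float = -7.89,
               D: float = 150.64) -> float:
    """Bit score cutoff for a fragment of length_aa amino acid residues.

    Up to the length threshold D the cutoff is the line K * length + M;
    beyond D it stays at the fixed value C = K * D + M.
    """
    return K * min(length_aa, D) + M


def is_qnr(bit_score: float, length_aa: float) -> bool:
    """Classify a fragment as qnr when its score reaches the cutoff.

    Using >= (rather than a strict inequality) is an assumption of this sketch.
    """
    return bit_score >= qnr_cutoff(length_aa)


# A 218-residue sequence scoring 131.2 bits lies above the fixed cutoff.
print(is_qnr(131.2, 218))
# A 33-residue fragment scoring 0 bits lies below the linear part of the cutoff.
print(is_qnr(0.0, 33))
```

			<p>With the averaged parameters the fixed part of the cutoff evaluates to about 109.3, close to but not identical to the reported <it>C</it> of 109.64, presumably because the reported value was itself averaged over the five models.</p>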
			<suppl id="S1">
				<title>
					<p>Additional file 1</p>
				</title>
				<text>
					<p>
						<b>Figure S1.</b> Fragment bit scores with HMM constructed without QnrA. Bit scores of fragments against the hidden Markov model where all sequences from QnrA were excluded.</p>
				</text>
				<file name="1471-2164-13-695-S1.pdf">
   <p>Click here for file</p>
</file>
			</suppl>
			<suppl id="S2">
				<title>
					<p>Additional file 2</p>
				</title>
				<text>
					<p>
						<b>Figure S2.</b> Fragment bit scores with HMM constructed without QnrB. Bit scores of fragments against the hidden Markov model where all sequences from QnrB were excluded.</p>
				</text>
				<file name="1471-2164-13-695-S2.pdf">
   <p>Click here for file</p>
</file>
			</suppl>
			<suppl id="S3">
				<title>
					<p>Additional file 3</p>
				</title>
				<text>
					<p>
						<b>Figure S3.</b> Fragment bit scores with HMM constructed without QnrC. Bit scores of fragments against the hidden Markov model where all sequences from QnrC were excluded.</p>
				</text>
				<file name="1471-2164-13-695-S3.pdf">
   <p>Click here for file</p>
</file>
			</suppl>
			<suppl id="S4">
				<title>
					<p>Additional file 4</p>
				</title>
				<text>
					<p>
						<b>Figure S4.</b> Fragment bit scores with HMM constructed without QnrD. Bit scores of fragments against the hidden Markov model where all sequences from QnrD were excluded.</p>
				</text>
				<file name="1471-2164-13-695-S4.pdf">
   <p>Click here for file</p>
</file>
			</suppl>
			<suppl id="S5">
				<title>
					<p>Additional file 5</p>
				</title>
				<text>
					<p>
						<b>Figure S5.</b> Fragment bit scores with HMM constructed without QnrS. Bit scores of fragments against the hidden Markov model where all sequences from QnrS were excluded.</p>
				</text>
				<file name="1471-2164-13-695-S5.pdf">
   <p>Click here for file</p>
</file>
			</suppl>
			<fig id="F1"><title><p>Figure 1</p></title><caption><p>Fragment bit scores and classification rule</p></caption><text>
   <p><b>Fragment bit scores and classification rule. A</b>) The figure shows the distribution of the fragment bit scores at different fragment lengths. The separation between the <it>qnr</it> fragments (light blue) and non-<it>qnr</it> fragments (light red) increases for longer fragment lengths. The solid blue and red lines show the average bit scores for <it>qnr</it> and non-<it>qnr</it> fragments, with their 99th and 1st percentiles in grey dashed lines above and below, respectively. The thick dashed line in black shows the classification function with the optimized parameters K=0.778, M=-7.89, D=150.64 [see Additional file <supplr sid="S1">1</supplr>: Figure S1, Additional file <supplr sid="S2">2</supplr>: Figure S2, Additional file <supplr sid="S3">3</supplr>: Figure S3, Additional file <supplr sid="S4">4</supplr>: Figure S4, Additional file <supplr sid="S5">5</supplr>: Figure S5 for plots corresponding to each separate class of <it>qnr</it>]. <b>B</b>) The bit scores when compared to the hidden Markov model for 33 amino acid long fragments, corresponding to the approximately 100 nucleotides long sequence reads common in next-generation sequencing technologies. At this fragment length, the <it>qnr</it> fragments (blue) are clearly separated from the non-<it>qnr</it> (red) with only a small overlap.</p>
</text><graphic file="1471-2164-13-695-1"/></fig><p>Cross-validation was used to optimize the parameters M, K and D of the classification function for identification of novel classes of <it>qnr</it> genes. The optimization was performed for five different models where each model was created by excluding one class of plasmid-mediated <it>qnr</it> proteins (i.e. QnrA, QnrB, QnrC, QnrD and QnrS). The cross-validation was then performed using disjoint sets of fragments, one for parameter estimation (training) and one for evaluation of the corresponding performance (validation). The training and validation data sets were created from fragments of both <it>qnr</it> genes and non-<it>qnr</it> PRP genes without any associated resistance phenotype. For each of the five models, the excluded <it>qnr</it> class was also removed from the training dataset. The corresponding performance was, on the other hand, evaluated only using the excluded <it>qnr</it> class and a set of non-<it>qnr</it> genes. Thus, the ability to classify novel <it>qnr</it> genes was evaluated on fragments of gene classes not included in the model. The cross-validation was performed with random fragments ranging from 10 to 209 amino acid residues, each length repeated 2500 times. The parameters of the classification function were estimated as M = -7.89 (1.37), K = 0.778 (0.084), D = 150.64 (27.05) (average over all five models, standard deviation in brackets). The corresponding fixed cutoff C was calculated as C = 109.64 (16.40).</p><p>The optimized classification function parameters were then used to validate the statistical power to detect novel putative fragments. At a fragment length as short as 33 amino acids the average power for correctly classifying fragments from novel putative <it>qnr</it> gene classes was 94% (Figure <figr fid="F2">2A</figr>). 
The results differed between the five models (Figure <figr fid="F2">2B</figr>): for a 33 amino acid long sequence, the power to identify a QnrD fragment (given a model built from QnrA, B, C and S) was highest (99.04%) while the power of identifying a QnrB fragment (given QnrA, C, D and S) was the lowest (88.02%). The specificity was estimated to be above 99.27% for all models and all fragment lengths [Additional file <supplr sid="S6">6</supplr>: Figure S6]. See Methods for full details.</p>
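			<p>The evaluation step of this leave-one-class-out scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the helper names are hypothetical, the bit scores in the toy list are invented, and in practice each score would come from searching the fragment against the HMM built without the excluded class.</p>

```python
import random


def random_fragment(sequence: str, length: int, rng: random.Random) -> str:
    """Cut one contiguous fragment of the requested length at a random offset."""
    start = rng.randint(0, len(sequence) - length)
    return sequence[start:start + length]


def power_and_specificity(scored, cutoff):
    """Estimate power and specificity from scored validation fragments.

    scored: iterable of (is_true_qnr, length_aa, bit_score) tuples, holding
            fragments of the excluded qnr class and of non-qnr PRPs.
    cutoff: function mapping a fragment length to its bit score threshold.
    Assumes at least one qnr and one non-qnr fragment are present.
    """
    tp = fn = tn = fp = 0
    for is_true_qnr, length_aa, bit_score in scored:
        called_qnr = bit_score >= cutoff(length_aa)
        if is_true_qnr:
            tp += called_qnr
            fn += not called_qnr
        else:
            fp += called_qnr
            tn += not called_qnr
    return tp / (tp + fn), tn / (tn + fp)


def averaged_cutoff(length_aa):
    # Two-part linear rule with the averaged parameters reported in the text.
    return 0.778 * min(length_aa, 150.64) - 7.89


# Invented scores for four 33-residue fragments (two qnr, two non-qnr).
toy = [(True, 33, 30.0), (True, 33, 12.0), (False, 33, 1.5), (False, 33, -3.0)]
print(power_and_specificity(toy, averaged_cutoff))
```

			<p>Power is the fraction of held-out <it>qnr</it> fragments that exceed the cutoff, and specificity is the fraction of non-<it>qnr</it> fragments that stay below it; repeating the calculation per fragment length would give curves of the kind shown in Figure <figr fid="F2">2</figr>.</p>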
			<suppl id="S6">
				<title>
					<p>Additional file 6</p>
				</title>
				<text>
					<p>
						<b>Figure S6.</b> Specificity. The specificity in classification of fragments of novel qnr genes for each of the five models. The line QnrA denotes the specificity of the model constructed without QnrA to accurately classify fragments from QnrA. The same for QnrB, C, D and S.</p>
				</text>
				<file name="1471-2164-13-695-S6.pdf">
   <p>Click here for file</p>
</file>
			</suppl>
			<fig id="F2"><title><p>Figure 2</p></title><caption><p>Estimated power</p></caption><text>
   <p><b>Estimated power. A</b>) The figure shows the estimated power of detecting fragments from novel classes of <it>qnr</it> as a function of fragment length in nucleotides (averaged over the five different models used in the cross-validation). At a fragment length of 33 amino acids (approximately 100 nucleotides), the power to detect fragments from novel classes of <it>qnr</it> genes was estimated at 94%, increasing to 100% for 100 amino acid long fragments. <b>B</b>) A magnification of the upper left region showing the power of detecting each class of <it>qnr</it> genes: QnrA (black), QnrB (red), QnrC (green), QnrD (dark blue) and QnrS (cyan). Corresponding plots for the specificity are available as [Additional file <supplr sid="S6">6</supplr>: Figure S6].</p>
</text><graphic file="1471-2164-13-695-2"/></fig><p>To search for novel putative <it>qnr</it> gene variants, a model based on all five classes of plasmid-mediated <it>qnr</it> genes, together with the classifier with the optimized parameter values, was applied to protein sequences from various databases and metagenomic sequencing projects: GenBank <abbrgrp>
					<abbr bid="B32">32</abbr>
				</abbrgrp>, CAMERA <abbrgrp>
					<abbr bid="B33">33</abbr>
				</abbrgrp>, MG-RAST <abbrgrp>
					<abbr bid="B34">34</abbr>
				</abbrgrp>, contigs from Meta-HIT <abbrgrp>
					<abbr bid="B35">35</abbr>
				</abbrgrp>, and several data sets from SRA <abbrgrp>
					<abbr bid="B36">36</abbr>
				</abbrgrp> (see Table <tblr tid="T1">1</tblr> and Methods). A smaller metagenomic dataset from a recent study where a high abundance of <it>qnr</it> genes was detected was also included <abbrgrp>
					<abbr bid="B37">37</abbr>
				</abbrgrp>. The total number of fragments available in all datasets was 478,025,600, comprising 214,168,682,742 nucleotides. In total, 1733 (3.6&#8201;&#215;&#8201;10<sup>-4</sup>%) sequence fragments were classified as <it>qnr</it> by the method. For the metagenomes, the proportion of <it>qnr</it> fragments was estimated at 2.8&#8201;&#215;&#8201;10<sup>-4</sup>% (1275 out of 463,364,852 metagenomic fragments), reflecting the low abundance of <it>qnr</it> genes in the environment. All fragments classified as <it>qnr</it> were stringently clustered into 475 groups (of which 165 contained more than one fragment) and annotated against GenBank and the list of known <it>qnr</it> genes <abbrgrp>
					<abbr bid="B31">31</abbr>
				</abbrgrp> [Additional file <supplr sid="S7">7</supplr>: Table S1]. Among these clusters, all five classes of plasmid-mediated <it>qnr</it> were represented as well as 28 previously described chromosomally located variants <abbrgrp>
					<abbr bid="B28">28</abbr>
					<abbr bid="B29">29</abbr>
					<abbr bid="B38">38</abbr>
					<abbr bid="B39">39</abbr>
					<abbr bid="B40">40</abbr>
					<abbr bid="B41">41</abbr>
				</abbrgrp>. In addition, one contig in group #1, which consisted entirely of metagenomic fragments, represented a full length sequence of a novel putative <it>qnr</it> gene with 93% identity (97% similarity) to QnrB1. During the course of this project this sequence was accepted as a novel QnrB variant, QnrB35, and submitted to GenBank [GenBank:AEL00456] <abbrgrp>
					<abbr bid="B39">39</abbr>
				</abbrgrp>. The method was hence capable of reconstructing complete <it>qnr</it> sequences directly from fragmented metagenomic data. This was particularly evident since the complete sequence of QnrB35 was not available in any of the datasets at the time of their retrieval in this project.</p>
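			<p>The metagenomic proportion quoted above can be retraced from the per-source counts in Table <tblr tid="T1">1</tblr>. The short Python sketch below does so; the selection of rows treated as metagenomic (everything except GenBank nt and refseq) is an assumption made here, chosen because it reproduces the totals of 1275 fragments and 463,364,852 sequences given in the text.</p>

```python
# (source, number of sequences, putative qnr fragments) copied from Table 1.
# Which rows count as metagenomic is an assumption of this sketch.
metagenomic_rows = [
    ("CAMERA", 161_016_287, 217),
    ("GenBank (env_nt)", 18_438_927, 54),
    ("Meta-HIT", 6_589_348, 2),
    ("MG-RAST", 74_767_763, 226),
    ("SRA", 202_090_286, 516),
    ("India Patancheru", 462_241, 260),
]

n_sequences = sum(row[1] for row in metagenomic_rows)
n_qnr = sum(row[2] for row in metagenomic_rows)

# Proportion of metagenomic fragments classified as qnr, in percent.
percent_qnr = 100.0 * n_qnr / n_sequences
print(n_qnr, n_sequences, f"{percent_qnr:.1e} %")
```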
			<suppl id="S7">
				<title>
					<p>Additional file 7</p>
				</title>
				<text>
					<p>
						<b>Table S1.</b> Annotation of the 475 groups of sequences discovered in this work.</p>
				</text>
				<file name="1471-2164-13-695-S7.xlsx">
   <p>Click here for file</p>
</file>
			</suppl>
			<table id="T1">
				<title>
					<p>Table 1</p>
				</title>
				<caption>
					<p>
						<b>Data sources searched for </b><b>
							<it>qnr </it>
						</b><b>gene fragments</b>
					</p>
				</caption>
				<tgroup align="left" cols="4">
					<colspec align="left" colname="c1" colnum="1" colwidth="1*"/>
					<colspec align="left" colname="c2" colnum="2" colwidth="1*"/>
					<colspec align="left" colname="c3" colnum="3" colwidth="1*"/>
					<colspec align="left" colname="c4" colnum="4" colwidth="1*"/>
					<thead valign="top">
						<row rowsep="1">
							<entry colname="c1" valign="top">
								<p>
									<b>Data source</b>
								</p>
							</entry>
							<entry colname="c2" valign="top">
								<p>
									<b>Number of sequences</b>
								</p>
							</entry>
							<entry colname="c3" valign="top">
								<p>
									<b>Number of nucleotides </b><b>
										<it>(approximate)</it>
									</b>
								</p>
							</entry>
							<entry colname="c4" valign="top">
								<p>
									<b>Identified putative </b><b>
										<it>qnr </it>
									</b><b>fragments</b>
								</p>
							</entry>
						</row>
					</thead>
					<tbody valign="top">
						<row>
							<entry colname="c1">
								<p>CAMERA <abbrgrp>
										<abbr bid="B33">33</abbr>
									</abbrgrp>
								</p>
							</entry>
							<entry colname="c2">
								<p>161,016,287</p>
							</entry>
							<entry colname="c3">
								<p>57,118,358,119</p>
							</entry>
							<entry colname="c4">
								<p>217</p>
							</entry>
						</row>
						<row>
							<entry colname="c1">
								<p>GenBank (nt) <abbrgrp>
										<abbr bid="B32">32</abbr>
									</abbrgrp>
								</p>
							</entry>
							<entry colname="c2">
								<p>14,627,404</p>
							</entry>
							<entry colname="c3">
								<p>35,003,500,149</p>
							</entry>
							<entry colname="c4">
								<p>392</p>
							</entry>
						</row>
						<row>
							<entry colname="c1">
								<p>GenBank (env_nt) <abbrgrp>
										<abbr bid="B32">32</abbr>
									</abbrgrp>
								</p>
							</entry>
							<entry colname="c2">
								<p>18,438,927</p>
							</entry>
							<entry colname="c3">
								<p>7,602,413,875</p>
							</entry>
							<entry colname="c4">
								<p>54</p>
							</entry>
						</row>
						<row>
							<entry colname="c1">
								<p>GenBank (refseq) <abbrgrp>
										<abbr bid="B32">32</abbr>
									</abbrgrp>
								</p>
							</entry>
							<entry colname="c2">
								<p>33,074</p>
							</entry>
							<entry colname="c3">
								<p>7,192,954,783</p>
							</entry>
							<entry colname="c4">
								<p>66</p>
							</entry>
						</row>
						<row>
							<entry colname="c1">
								<p>Meta-HIT <abbrgrp>
										<abbr bid="B35">35</abbr>
									</abbrgrp>
								</p>
							</entry>
							<entry colname="c2">
								<p>6,589,348</p>
							</entry>
							<entry colname="c3">
								<p>10,322,657,198</p>
							</entry>
							<entry colname="c4">
								<p>2</p>
							</entry>
						</row>
						<row>
							<entry colname="c1">
								<p>MG-RAST <abbrgrp>
										<abbr bid="B42">42</abbr>
									</abbrgrp>
								</p>
							</entry>
							<entry colname="c2">
								<p>74,767,763</p>
							</entry>
							<entry colname="c3">
								<p>29,132,992,517</p>
							</entry>
							<entry colname="c4">
								<p>226</p>
							</entry>
						</row>
						<row>
							<entry colname="c1">
								<p>SRA <abbrgrp>
										<abbr bid="B36">36</abbr>
									</abbrgrp>
								</p>
							</entry>
							<entry colname="c2">
								<p>202,090,286</p>
							</entry>
							<entry colname="c3">
								<p>67,627,717,961</p>
							</entry>
							<entry colname="c4">
								<p>516</p>
							</entry>
						</row>
						<row>
							<entry colname="c1">
								<p>India Patancheru <abbrgrp>
										<abbr bid="B37">37</abbr>
									</abbrgrp>
								</p>
							</entry>
							<entry colname="c2">
								<p>462,241</p>
							</entry>
							<entry colname="c3">
								<p>168,088,140</p>
							</entry>
							<entry colname="c4">
								<p>260</p>
							</entry>
						</row>
						<row rowsep="1">
							<entry colname="c1">
								<p>
									<it>Total:</it>
								</p>
							</entry>
							<entry colname="c2">
								<p>
									<it>478,025,600</it>
								</p>
							</entry>
							<entry colname="c3">
								<p>
									<it>214,168,682,742</it>
								</p>
							</entry>
							<entry colname="c4">
								<p>
									<it>1733</it>
								</p>
							</entry>
						</row>
					</tbody>
				</tgroup>
			</table><p>The method discovered 732 fragments of metagenomic origin that clustered in 440 groups which did not contain any of the previously described plasmid-mediated or chromosomal <it>qnr</it> genes. An additional 11 sequences of novel putative <it>qnr</it> genes in the genomes of 9 sequenced bacteria were also discovered [Additional file <supplr sid="S7">7</supplr>: Table S1]. Table <tblr tid="T2">2</tblr> shows five examples of groups containing sequences classified as novel putative <it>qnr</it> genes by the method. Sequence #1 was constructed from fragments originating from baby stool metagenomes <abbrgrp>
					<abbr bid="B43">43</abbr>
				</abbrgrp> [SRA accession SRX032366] and shared 79% sequence identity with QnrB37. Sequence #2 was discovered in an environmental sample of coastal sea water off the North American coast [MG-RAST accession 4441580] as part of the Global Ocean Sampling Expedition <abbrgrp>
					<abbr bid="B44">44</abbr>
				</abbrgrp>. This sequence is a 218 amino acid long fragment that shares 33% sequence identity with QnrC. The next three sequences were discovered in bacterial genomes in GenBank. Sequence #3 was discovered in the chromosome of <it>Dickeya dadantii</it> 3937 [GenBank:NC_014500.1] and was a 213 amino acid long sequence with 68% identity to QnrB28. Sequence #4 was found in the chromosome of <it>Xenorhabdus bovienii</it> [GenBank:NC_013892.1] and was a 211 amino acid long sequence with 66% identity to QnrB19. Sequence #5 came from the chromosome of <it>Vibrio furnissii</it> [GenBank:CP002378.1] and was a 218 amino acid long sequence sharing 72% identity with QnrC. Full results, including all 475 groups and their annotation, are available in [Additional file <supplr sid="S7">7</supplr>: Table S1].</p>
			<table id="T2">
				<title>
					<p>Table 2</p>
				</title>
				<caption>
					<p>
						<b>Examples of identified novel putative </b><b>
							<it>qnr </it>
						</b><b>sequences</b>
					</p>
				</caption>
				<tgroup align="left" cols="6">
					<colspec align="left" colname="c1" colnum="1" colwidth="1*"/>
					<colspec align="left" colname="c2" colnum="2" colwidth="1*"/>
					<colspec align="left" colname="c3" colnum="3" colwidth="1*"/>
					<colspec align="left" colname="c4" colnum="4" colwidth="1*"/>
					<colspec align="left" colname="c5" colnum="5" colwidth="1*"/>
					<colspec align="left" colname="c6" colnum="6" colwidth="1*"/>
					<thead valign="top">
						<row rowsep="1">
							<entry colname="c1">
								<p>
									<b>Example #</b>
								</p>
							</entry>
							<entry colname="c2">
								<p>
									<b>Group</b>
								</p>
							</entry>
							<entry colname="c3">
								<p>
									<b>Source(s)</b>
								</p>
							</entry>
							<entry colname="c4">
								<p>
									<b>Contig length (aa)</b>
								</p>
							</entry>
							<entry colname="c5">
								<p>
									<b>Model bit score</b>
								</p>
							</entry>
							<entry colname="c6">
								<p>
									<b>Most similar plasmid-mediated</b><b>
										<it>qnr</it>
									</b>
								</p>
							</entry>
						</row>
					</thead>
					<tbody valign="top">
						<row>
							<entry colname="c1">
								<p>1</p>
							</entry>
							<entry colname="c2">
								<p>1</p>
							</entry>
							<entry colname="c3">
								<p>Metagenome:</p>
							</entry>
							<entry colname="c4">
								<p>214</p>
							</entry>
							<entry colname="c5">
								<p>356.9</p>
							</entry>
							<entry colname="c6">
								<p>QnrB37 (79% identity)</p>
							</entry>
						</row>
						<row>
							<entry colname="c1"/>
							<entry colname="c2"/>
							<entry colname="c3">
								<p>SRA: SRX032366</p>
							</entry>
							<entry colname="c4"/>
							<entry colname="c5"/>
							<entry colname="c6"/>
						</row>
						<row>
							<entry colname="c1">
								<p>2</p>
							</entry>
							<entry colname="c2">
								<p>12</p>
							</entry>
							<entry colname="c3">
								<p>Metagenome:</p>
							</entry>
							<entry colname="c4">
								<p>218</p>
							</entry>
							<entry colname="c5">
								<p>131.2</p>
							</entry>
							<entry colname="c6">
								<p>QnrC (33% identity)</p>
							</entry>
						</row>
						<row>
							<entry colname="c1"/>
							<entry colname="c2"/>
							<entry colname="c3">
								<p>MG-RAST: 4441580</p>
							</entry>
							<entry colname="c4"/>
							<entry colname="c5"/>
							<entry colname="c6"/>
						</row>
						<row>
							<entry colname="c1">
								<p>3</p>
							</entry>
							<entry colname="c2">
								<p>78</p>
							</entry>
							<entry colname="c3">
								<p>Chromosome:</p>
							</entry>
							<entry colname="c4">
								<p>213</p>
							</entry>
							<entry colname="c5">
								<p>326.4</p>
							</entry>
							<entry colname="c6">
								<p>QnrB28 (68% identity)</p>
							</entry>
						</row>
						<row>
							<entry colname="c1"/>
							<entry colname="c2"/>
							<entry colname="c3">
								<p>Dickeya dadantii 3937, NC_014500.1</p>
							</entry>
							<entry colname="c4"/>
							<entry colname="c5"/>
							<entry colname="c6"/>
						</row>
						<row>
							<entry colname="c1">
								<p>4</p>
							</entry>
							<entry colname="c2">
								<p>81</p>
							</entry>
							<entry colname="c3">
								<p>Chromosome:</p>
							</entry>
							<entry colname="c4">
								<p>211</p>
							</entry>
							<entry colname="c5">
								<p>294.6</p>
							</entry>
							<entry colname="c6">
								<p>QnrB19 (66% identity)</p>
							</entry>
						</row>
						<row>
							<entry colname="c1"/>
							<entry colname="c2"/>
							<entry colname="c3">
								<p>Xenorhabdus bovienii SS-2004, NC_013892.1</p>
							</entry>
							<entry colname="c4"/>
							<entry colname="c5"/>
							<entry colname="c6"/>
						</row>
						<row>
							<entry colname="c1">
								<p>5</p>
							</entry>
							<entry colname="c2">
								<p>199</p>
							</entry>
							<entry colname="c3">
								<p>Chromosome:</p>
							</entry>
							<entry colname="c4">
								<p>218</p>
							</entry>
							<entry colname="c5">
								<p>350.0</p>
							</entry>
							<entry colname="c6">
								<p>QnrC (72% identity)</p>
							</entry>
						</row>
						<row rowsep="1">
							<entry colname="c1"/>
							<entry colname="c2"/>
							<entry colname="c3">
								<p>Vibrio furnissii, CP002378.1</p>
							</entry>
							<entry colname="c4"/>
							<entry colname="c5"/>
							<entry colname="c6"/>
						</row>
					</tbody>
				</tgroup>
			</table>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st><p>Qnr genes provide resistance to broad-spectrum fluoroquinolone antibiotics and can move between bacteria through horizontal gene transfer. However, the total number of <it>qnr</it> classes and their diversity in environmental bacterial communities are not clear. We therefore developed a novel method to identify new classes of <it>qnr</it> genes in fragmented metagenomic data. The method uses a hidden Markov model (HMM) to identify candidate <it>qnr</it> fragments, which are then further classified based on their model score and sequence length. Cross-validation confirmed that the method had a high sensitivity and specificity to detect fragments from novel classes of known <it>qnr</it> genes, even for fragments as short as 33 amino acid residues. This makes the method applicable to many forms of nucleotide data, including sequences generated by next-generation DNA sequencers. From public sequence repositories the method classified 1733 sequence fragments (3.6&#8201;&#215;&#8201;10<sup>-4</sup>%) as <it>qnr</it>, which were further clustered into 475 groups. The method also identified 39 chromosomal <it>qnr</it> variants in 33 bacterial species.</p><p>Several of the novel putative <it>qnr</it> genes identified in this study have, to the authors&#8217; best knowledge, not previously been described in the literature. Experimental verification, including phenotypic profiling in multiple bacterial hosts, is therefore necessary to fully evaluate the resistance potential of our predictions. However, the cross-validation demonstrated that the proposed method had a high sensitivity and could discriminate between fragments from classes of known <it>qnr</it> and pentapeptide repeat proteins without a resistance phenotype (Figure <figr fid="F2">2A</figr>). 
The method was also able to identify all previously reported classes of <it>qnr</it> genes, including the variant <it>qnr</it>B35, which at the time of this analysis had not been submitted to the database and was thus not included in the hidden Markov model. This shows that the method has a high predictive power and it is therefore possible that several of the predictions indeed represent previously unidentified novel classes or variants of <it>qnr</it> genes.</p><p>Many of the identified putative <it>qnr</it> gene fragments were discovered in metagenomes sampled from different types of environments, e.g. human gut <abbrgrp>
					<abbr bid="B43">43</abbr>
				</abbrgrp>, seawater <abbrgrp>
					<abbr bid="B44">44</abbr>
				</abbrgrp> and river sediment <abbrgrp>
					<abbr bid="B37">37</abbr>
				</abbrgrp> [see Additional file <supplr sid="S7">7</supplr>: Table S1]. This indicates that there is an unexplored diversity of <it>qnr</it> genes within environmental bacterial communities and that these can be identified by metagenomic sequencing. However, the amount of nucleotide data currently represented in the sequence repositories merely reflects a tiny fraction of the total microbial diversity on earth <abbrgrp>
					<abbr bid="B45">45</abbr>
					<abbr bid="B46">46</abbr>
					<abbr bid="B47">47</abbr>
				</abbrgrp>. In addition, the estimated relative abundance of unknown fragments from putative <it>qnr</it> genes was 2.8&#8201;&#215;&#8201;10<sup>-4</sup>% (1275 out of 463,364,852 metagenomic fragments) underlining the vast amounts of sequence data needed to identify and assemble <it>qnr</it> genes from environmental data. It is therefore possible, and even likely, that there are additional variants of <it>qnr</it> genes present in the environmental bacterial communities currently not represented in the sequence repositories due to the heavy undersampling. The data that is currently being generated by large-scale reference metagenome projects, such as the Earth Microbiome Project <abbrgrp>
					<abbr bid="B48">48</abbr>
				</abbrgrp> and the Global Ocean Sampling Expedition <abbrgrp>
					<abbr bid="B44">44</abbr>
				</abbrgrp>, will offer a substantially higher sequencing depth and may therefore reveal additional classes and variants of <it>qnr</it> genes.</p><p>Our results show that hidden Markov models are highly suitable for identifying sequence fragments from <it>qnr</it> genes. The model used in this study was derived from a multiple alignment of <it>qnr</it> genes and can thereby infer information on the degree of variability at different amino acid positions in the sequence <abbrgrp>
					<abbr bid="B49">49</abbr>
				</abbrgrp>. This is especially useful for pentapeptide repeat proteins which generally have a low sequence similarity except in the conserved residues of the distinctive repetitive A(D/N)LXX motif. In contrast, traditional sequence alignment tools such as BLAST cannot distinguish between important variation in the repeating pattern and variation in the intermediate regions. Previous methods to identify novel <it>qnr</it> genes from DNA sequence data have used BLAST and may therefore have limited sensitivity and specificity <abbrgrp>
					<abbr bid="B49">49</abbr>
				</abbrgrp>. The proposed method has, on the other hand, demonstrated a high power of detecting new classes of <it>qnr</it> genes (Figure <figr fid="F2">2</figr>) and is hence a more suitable approach for identification and annotation of <it>qnr</it> genes.</p><p>Controlling the number of false predictions is vital for large-scale data analysis. A low specificity can generate a massive amount of false positives and thereby decrease the quality of analysis and the biological interpretation of the downstream results (in this case the sequence groups). Based on the distribution of bit scores for the sequence fragments (Figure <figr fid="F1">1A</figr>) it is clear that a traditional cut-off would not be sufficient to discriminate between <it>qnr</it> and non-<it>qnr</it> PRPs with both a high sensitivity and specificity. Indeed, a single bit score cut-off would have to be set to 75 to minimize false positives across all fragment lengths, effectively removing the ability to classify fragments shorter than 100 amino acid residues (300 nucleotides). Instead, a linear classification function dependent on fragment length for short fragments enabled correct identification while maintaining a high specificity [Additional file <supplr sid="S6">6</supplr>: Figure S6]. This makes the method suitable for analysis of large datasets consisting of short sequence fragments and the method is therefore directly applicable to data from next-generation sequencing technologies such as Illumina&#8217;s sequencing by synthesis, Life Technologies&#8217; sequencing by ligation (SOLiD) or Roche&#8217;s 454 pyrosequencing <abbrgrp>
					<abbr bid="B50">50</abbr>
				</abbrgrp>.</p><p>The hidden Markov model used by the method was created from all known plasmid-mediated <it>qnr</it> genes with an experimentally validated resistance phenotype. However, several recent studies have described chromosomally located <it>qnr</it> genes in a wide range of species (e.g. <it>Vibrio spp. alginolyticus</it>, <it>Vibrio harveyi</it> and <it>Aeromonas hydrophila</it>) <abbrgrp>
					<abbr bid="B28">28</abbr>
					<abbr bid="B29">29</abbr>
					<abbr bid="B38">38</abbr>
					<abbr bid="B40">40</abbr>
					<abbr bid="B41">41</abbr>
				</abbrgrp>. These chromosomal <it>qnr</it> genes show a relatively high sequence similarity to their plasmid-mediated relatives and some have been shown to confer resistance towards fluoroquinolones when expressed in <it>E. coli</it> (e.g. SmaQnr and SmQnr) <abbrgrp>
					<abbr bid="B29">29</abbr>
					<abbr bid="B41">41</abbr>
				</abbrgrp>. Their potential to transfer horizontally between bacteria is however not clear. Even though the hidden Markov model was based on plasmid-mediated gene variants, the method demonstrated a high sensitivity to detect <it>qnr</it> genes in bacterial chromosomes. In fact, the method identified 28 previously reported chromosomally located <it>qnr</it> genes in 24 species. In addition, 11 potentially novel chromosomal <it>qnr</it> genes in 9 different species were also identified [Additional file <supplr sid="S7">7</supplr>: Table S1]. Interestingly, four previously suggested chromosomal <it>qnr</it> genes were not classified as such by the method. These genes, which are located in <it>Alkaliphilus metalliredigens, Bacteroides thetaiotaomicron, Bacillus weihenstephanensis</it> and <it>Anabaena variabilis</it> have previously been identified as putative <it>qnr</it> genes using BLAST <abbrgrp>
					<abbr bid="B38">38</abbr>
				</abbrgrp>. However, all these genes share low sequence similarity to other <it>qnr</it> genes and their resistance phenotype has so far not been validated. The four genes received very low scores by our model, which may indicate that these are false predictions and hence not <it>qnr</it> genes. All other previously described <it>qnr</it> genes received high scores by the model and were thus classified as <it>qnr</it>.</p><p>The method described in this study has been implemented as a freely available application in Python. The application searches any specified sequence dataset, classifies the matching sequences as <it>qnr</it> or non-<it>qnr</it> and clusters the results into groups of putative <it>qnr</it> genes [see Additional file <supplr sid="S8">8</supplr>: Figure S7 for an overview]. The implementation is straightforward to use, has been optimized to handle data sizes of the order of terabytes, and is suitable for use on standard desktop computers. The package is documented with internal functions thoroughly commented in the distributed source code, making it possible to interface them directly from related applications. The application can be installed and run on any modern GNU/Linux system and it is available from <url>http://bioinformatics.math.chalmers.se/qnr/</url>.</p>
			<suppl id="S8">
				<title>
					<p>Additional file 8</p>
				</title>
				<text>
					<p>
						<b>Figure S7.</b> Overview of the pipeline implementation. A flowchart describing the major parts of the pipeline implemented in Python.</p>
				</text>
				<file name="1471-2164-13-695-S8.pdf">
   <p>Click here for file</p>
</file>
			</suppl>
		</sec>
		<sec>
			<st>
				<p>Conclusions</p>
			</st><p>In this study we proposed a new method to detect and annotate novel classes of <it>qnr</it> antibiotic resistance genes in nucleotide sequence data. The method uses a hidden Markov model with a fragment length-dependent classification rule and has a high sensitivity and specificity, even for sequences as short as 100 nucleotides. This makes the method directly applicable to the immense amount of data generated by the next-generation DNA sequencing techniques. Based on sequence data currently available in the repositories, the method was able to identify all previously reported plasmid-mediated <it>qnr</it> genes as well as the vast majority of the previously reported chromosomal variants. In addition, the method predicted several novel putative <it>qnr</it> genes and some of these were discovered in shotgun metagenomes, which may indicate a large and unknown diversity of <it>qnr</it> genes in uncultured environmental bacteria.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st><p>A hidden Markov model was based on a multiple sequence alignment of sequences from the reference list of acknowledged and experimentally verified plasmid-mediated <it>qnr</it> genes <abbrgrp>
					<abbr bid="B31">31</abbr>
				</abbrgrp>. Peptide <it>qnr</it> sequences were aligned using MAFFT <abbrgrp>
					<abbr bid="B51">51</abbr>
				</abbrgrp> with default settings. The alignment quality was manually assessed and then used as input for the construction of the hidden Markov model using HMMER3 <abbrgrp>
					<abbr bid="B49">49</abbr>
				</abbrgrp> with default settings.</p><p>The empirical bit score distribution of the HMM was investigated by drawing random fragments of both <it>qnr</it> and non-<it>qnr</it> genes (Figure <figr fid="F1">1</figr>). This led to the creation of a classifier consisting of a two-part linear discrimination function based on the fragment length (L<sub>f</sub>) and the fragment bit score (S<sub>f</sub>) reported by HMMER for the hidden Markov model. The classifier was defined by three parameters: the linear intercept (M), the linear slope (K), and the long fragment definition (D). A fragment with length, L<sub>f</sub>, and domain bit score, S<sub>f</sub>, was classified as <it>qnr</it> if L<sub>f</sub> &lt; D and S<sub>f</sub> &#8805; K &#8201;&#215;&#8201; L<sub>f</sub> + M, or if L<sub>f</sub> &#8805; D and S<sub>f</sub> &#8805; K &#8201;&#215;&#8201; D + M. The score threshold thus increases linearly with fragment length for fragments shorter than D and is constant for longer fragments.</p><p>Cross-validation was used to estimate the parameters and to evaluate the performance of the model. Five different models were created and for each model one class of plasmid-mediated <it>qnr</it> genes was excluded. Two different kinds of sequences were used in the cross-validation: true <it>qnr</it> genes and non-<it>qnr</it> pentapeptide repeat protein sequences. The source of true <it>qnr</it> sequences was the reference list of <it>qnr</it> sequences <abbrgrp>
					<abbr bid="B31">31</abbr>
				</abbrgrp> and the source of non-<it>qnr</it> sequences was sequences from GenBank annotated as pentapeptide repeat proteins (PRP) with the COG1357 annotation, but without a known resistance phenotype.</p><p>Two sets of data were created for each model in the cross-validation: a training set and a validation set. The training sets consisted of a combination of true <it>qnr</it> sequences, excluding the class which was left out from the model in question, and a set of 90 random non-<it>qnr</it> genes. The validation sets contained all known variants of the previously excluded <it>qnr</it> class plus a different set of 421 non-<it>qnr</it> genes. For example, the first model was based on all known plasmid-mediated <it>qnr</it> sequences excluding the sequences from the class <it>qnrA</it>. This model was then applied to training data consisting of true <it>qnr</it> sequences excluding <it>qnrA</it> and a set of non-<it>qnr</it> genes. The classification function was then applied to validation data consisting exclusively of <it>qnrA</it> and a different set of non-<it>qnr</it> sequences, where the performance of the model to identify unknown (i.e. novel) classes of plasmid-mediated <it>qnr</it> was estimated. The fragments used in the cross-validation were randomly generated from the training and validation data sets for each model by drawing <it>qnr</it> and non-<it>qnr</it> fragments with equal probability. For each dataset, 2500 random fragments were created for each fragment length between 10 and 210 amino acids. To introduce a substantial amount of noise, a relatively high mutation rate was simulated by randomly substituting each amino acid residue for another with a probability of 5%.</p><p>Parameter values for the classification function were optimized using particle swarm optimization where the parameter spaces for the three parameters were explored (ranges in brackets): M [-20, 30], K [0, 2], and D [30, 210]. 
Optimization was performed six times using a swarm size of 30 particles with 50 iterations in each run with randomized starting points in parameter space. The objective was to achieve a high true positive rate (TPR) without letting the false positive rate (FPR) become too high. The objective function was therefore set to the difference TPR&#8201;&#8722;&#8201;FPR. The statistical power of the model for identifying novel plasmid-mediated <it>qnr</it> gene variants was computed by using the average parameter values from the six optimization runs when applying the model to the validation data sets (Figure <figr fid="F2">2</figr>).</p><p>The nucleotide datasets used in this project (Table <tblr tid="T1">1</tblr>) were public sequence data sets downloaded in April 2011 (GenBank version 183). Data from the NCBI Sequence Read Archive (SRA) was selected using the search string &#8220;metagenom* AND (454 AND (flx OR titanium)) NOT 16S NOT V6 NOT V9&#8221; which generated 1756 hits at the time data was sourced for this project. The sequence data was first translated into all six reading frames using bacterial translation table 11 in EMBOSS <it>transeq</it>
				<abbrgrp>
					<abbr bid="B52">52</abbr>
				</abbrgrp>. The translated sequences were fed into the HMMER3 program <it>hmmsearch</it> to find hits against the model. The only non-default settings used were --notextw and --cpu 8, with no change from default settings for inclusion or reporting thresholds. All hits discovered by HMMER3 were instead subjected to the classification function and hits that were classified as <it>qnr</it> were clustered using Blastclust <abbrgrp>
					<abbr bid="B53">53</abbr>
				</abbrgrp>. Clustering parameters used were fragment similarity threshold 90% and minimum length coverage 25%. Cluster groups (containing hits/sequence fragments) were aligned using MAFFT to produce overlapping multiple alignments. The aligned groups were then manually adjusted to identify overlapping fragments that formed longer contigs and complete <it>qnr</it> gene contigs. Finally, such contig sequences were annotated using a combination of the reference <it>qnr</it> compilation <abbrgrp>
					<abbr bid="B31">31</abbr>
				</abbrgrp> and the GenBank data displayed in Table <tblr tid="T1">1</tblr>.</p>
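			<p>The two-part classification rule and the TPR&#8201;&#8722;&#8201;FPR objective described earlier in this section can be illustrated with a minimal Python sketch. It is not taken from the distributed application: the class and function names are hypothetical, the parameter values in the example are arbitrary placeholders rather than the optimized values, and the rule is written with a capped length, min(L<sub>f</sub>, D), which gives the same threshold as the two cases stated in the text.</p>
			<p>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifierParams:
    """Hypothetical container for the three parameters named in the text."""
    slope: float          # K, linear slope
    intercept: float      # M, linear intercept
    long_fragment: float  # D, length from which the threshold stays constant


def score_threshold(length, params):
    """Bit score a fragment of the given length must reach to be called qnr."""
    # Capping the length at D reproduces both branches of the rule:
    # K * L_f + M for fragments shorter than D, K * D + M otherwise.
    capped_length = min(length, params.long_fragment)
    return params.slope * capped_length + params.intercept


def is_qnr(length, bit_score, params):
    """Apply the length-dependent rule to one fragment."""
    return bit_score >= score_threshold(length, params)


def tpr_minus_fpr(fragments, params):
    """Objective used for parameter tuning: TPR minus FPR.

    fragments is an iterable of (length, bit_score, is_true_qnr) tuples.
    """
    true_pos = false_pos = positives = negatives = 0
    for length, bit_score, is_true_qnr in fragments:
        predicted = is_qnr(length, bit_score, params)
        if is_true_qnr:
            positives += 1
            true_pos += predicted
        else:
            negatives += 1
            false_pos += predicted
    if positives == 0 or negatives == 0:
        raise ValueError("both qnr and non-qnr fragments are required")
    return true_pos / positives - false_pos / negatives


if __name__ == "__main__":
    # Placeholder parameter values, chosen only to make the example concrete.
    example = ClassifierParams(slope=0.5, intercept=10.0, long_fragment=100)
    print(is_qnr(40, 35.0, example))
```
			</p>
			<p>In such a sketch, an optimizer (particle swarm optimization in this study) would repeatedly call the objective with candidate values of M, K and D on the training fragments and keep the values that maximize it.</p>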
		</sec>
		<sec>
			<st>
				<p>Competing interests</p>
			</st><p>The authors declare no competing interests.</p>
		</sec>
		<sec>
			<st>
				<p>Authors&#8217; contributions</p>
			</st><p>FB, EK and AJ planned the project. FB developed and implemented the method, performed the cross-validation, interpreted the clustering results and annotated the hits. MBP assisted with the implementation of the method. FB and EK drafted the manuscript. The work was supervised by EK, AJ and DGJL. All authors read and approved the final manuscript.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st><p>Thanks to Dr Carl-Fredrik Flach for valuable discussions on the topic of mobile fluoroquinolone resistance. This research was supported by the Life Science Area of Advance at Chalmers University of Technology, Sweden, the Swedish Research Council (VR), the Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (FORMAS) and the Swedish Society for Medical Research (SSMF). Support from the Gothenburg Bioinformatics Network (GOTBIN) is gratefully acknowledged.</p>
			</sec>
		</ack>
		<refgrp><bibl id="B1"><title><p>Origins and evolution of antibiotic resistance</p></title><aug><au><snm>Davies</snm><fnm>J</fnm></au><au><snm>Davies</snm><fnm>D</fnm></au></aug><source>Microbiol Mol Biol Rev</source><pubdate>2010</pubdate><volume>74</volume><fpage>417</fpage><lpage>433</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/MMBR.00016-10</pubid><pubid idtype="pmcid">2937522</pubid><pubid idtype="pmpid" link="fulltext">20805405</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>The crisis in antibiotic resistance</p></title><aug><au><snm>Neu</snm><fnm>HC</fnm></au></aug><source>Science (New York, N.Y.)</source><pubdate>1992</pubdate><volume>257</volume><fpage>837</fpage><lpage>842</lpage></bibl><bibl id="B3"><title><p>Mutation frequencies and antibiotic resistance</p></title><aug><au><snm>Martinez</snm><fnm>J</fnm></au><au><snm>Baquero</snm><fnm>F</fnm></au></aug><source>Antimicrobial agents</source><pubdate>2000</pubdate><volume>44</volume><fpage>1771</fpage><lpage>1777</lpage><xrefbib><pubid idtype="doi">10.1128/AAC.44.7.1771-1777.2000</pubid></xrefbib></bibl><bibl id="B4"><title><p>Gene flow, mobile genetic elements and the recruitment of antibiotic resistance genes into Gram-negative pathogens</p></title><aug><au><snm>Stokes</snm><fnm>HW</fnm></au><au><snm>Gillings</snm><fnm>MR</fnm></au></aug><source>FEMS Microbiol Rev</source><pubdate>2011</pubdate><volume>35</volume><fpage>790</fpage><lpage>819</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1574-6976.2011.00273.x</pubid><pubid idtype="pmpid" link="fulltext">21517914</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Integrons: mobilizable platforms that promote genetic diversity in bacteria</p></title><aug><au><snm>Boucher</snm><fnm>Y</fnm></au><au><snm>Labbate</snm><fnm>M</fnm></au><au><snm>Koenig</snm><fnm>JE</fnm></au><au><snm>Stokes</snm><fnm>HW</fnm></au></aug><source>Trends 
Microbiol</source><pubdate>2007</pubdate><volume>15</volume><fpage>301</fpage><lpage>309</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.tim.2007.05.004</pubid><pubid idtype="pmpid" link="fulltext">17566739</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>The evolution of class 1 integrons and the rise of antibiotic resistance</p></title><aug><au><snm>Gillings</snm><fnm>M</fnm></au><au><snm>Boucher</snm><fnm>Y</fnm></au><au><snm>Labbate</snm><fnm>M</fnm></au><au><snm>Holmes</snm><fnm>A</fnm></au><au><snm>Krishnan</snm><fnm>S</fnm></au><au><snm>Holley</snm><fnm>M</fnm></au><au><snm>Stokes</snm><fnm>HW</fnm></au></aug><source>J Bacteriol</source><pubdate>2008</pubdate><volume>190</volume><fpage>5095</fpage><lpage>5100</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/JB.00152-08</pubid><pubid idtype="pmcid">2447024</pubid><pubid idtype="pmpid" link="fulltext">18487337</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>The antibiotic resistome: the nexus of chemical and genetic diversity</p></title><aug><au><snm>Wright</snm><fnm>GD</fnm></au></aug><source>Nat Rev Microbiol</source><pubdate>2007</pubdate><volume>5</volume><fpage>175</fpage><lpage>186</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrmicro1614</pubid><pubid idtype="pmpid" link="fulltext">17277795</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Antibiotic resistance genes from the environment: a perspective through newly identified antibiotic resistance mechanisms in the clinical setting</p></title><aug><au><snm>Cant&#243;n</snm><fnm>R</fnm></au></aug><source>Clin Microbiol Infect</source><pubdate>2009</pubdate><volume>15</volume><issue>Suppl 1</issue><fpage>20</fpage><lpage>25</lpage><xrefbib><pubid idtype="pmpid">19220348</pubid></xrefbib></bibl><bibl id="B9"><title><p>Expanding the soil antibiotic resistome: exploring environmental 
diversity</p></title><aug><au><snm>D&#8217;Costa</snm><fnm>VM</fnm></au><au><snm>Griffiths</snm><fnm>E</fnm></au><au><snm>Wright</snm><fnm>GD</fnm></au></aug><source>Curr Opin Microbiol</source><pubdate>2007</pubdate><volume>10</volume><fpage>481</fpage><lpage>489</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.mib.2007.08.009</pubid><pubid idtype="pmpid" link="fulltext">17951101</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Sampling the antibiotic resistome</p></title><aug><au><snm>D&#8217;Costa</snm><fnm>VM</fnm></au><au><snm>McGrann</snm><fnm>KM</fnm></au><au><snm>Hughes</snm><fnm>DW</fnm></au><au><snm>Wright</snm><fnm>GD</fnm></au></aug><source>Science (New York, N.Y.)</source><pubdate>2006</pubdate><volume>311</volume><fpage>374</fpage><lpage>377</lpage><xrefbib><pubid idtype="doi">10.1126/science.1120800</pubid></xrefbib></bibl><bibl id="B11"><title><p>Antibiotic resistance is ancient</p></title><aug><au><snm>D&#8217;Costa</snm><fnm>VM</fnm></au><au><snm>King</snm><fnm>CE</fnm></au><au><snm>Kalan</snm><fnm>L</fnm></au><au><snm>Morar</snm><fnm>M</fnm></au><au><snm>Sung</snm><fnm>WWL</fnm></au><au><snm>Schwarz</snm><fnm>C</fnm></au><au><snm>Froese</snm><fnm>D</fnm></au><au><snm>Zazula</snm><fnm>G</fnm></au><au><snm>Calmels</snm><fnm>F</fnm></au><au><snm>Debruyne</snm><fnm>R</fnm></au><au><snm>Golding</snm><fnm>GB</fnm></au><au><snm>Poinar</snm><fnm>HN</fnm></au><au><snm>Wright</snm><fnm>GD</fnm></au></aug><source>Nature</source><pubdate>2011</pubdate><volume>477</volume><fpage>457</fpage><lpage>461</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature10388</pubid><pubid idtype="pmpid" link="fulltext">21881561</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Antibiotics and antibiotic resistance genes in natural environments</p></title><aug><au><snm>Mart&#237;nez</snm><fnm>JL</fnm></au></aug><source>Science (New York, 
N.Y.)</source><pubdate>2008</pubdate><volume>321</volume><fpage>365</fpage><lpage>367</lpage><xrefbib><pubid idtype="doi">10.1126/science.1159483</pubid></xrefbib></bibl><bibl id="B13"><title><p>Call of the wild: antibiotic resistance genes in natural environments</p></title><aug><au><snm>Allen</snm><fnm>HK</fnm></au><au><snm>Donato</snm><fnm>J</fnm></au><au><snm>Wang</snm><fnm>HH</fnm></au><au><snm>Cloud-Hansen</snm><fnm>KA</fnm></au><au><snm>Davies</snm><fnm>J</fnm></au><au><snm>Handelsman</snm><fnm>J</fnm></au></aug><source>Nat Rev Microbiol</source><pubdate>2010</pubdate><volume>8</volume><fpage>251</fpage><lpage>259</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrmicro2312</pubid><pubid idtype="pmpid" link="fulltext">20190823</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Fluoroquinolone resistance in Mycobacterium tuberculosis and mutations in gyrA and gyrB</p></title><aug><au><snm>Von Groll</snm><fnm>A</fnm></au><au><snm>Martin</snm><fnm>A</fnm></au><au><snm>Jureen</snm><fnm>P</fnm></au><au><snm>Hoffner</snm><fnm>S</fnm></au><au><snm>Vandamme</snm><fnm>P</fnm></au><au><snm>Portaels</snm><fnm>F</fnm></au><au><snm>Palomino</snm><fnm>JC</fnm></au><au><snm>da Silva</snm><fnm>PA</fnm></au></aug><source>Antimicrob Agents Chemother</source><pubdate>2009</pubdate><volume>53</volume><fpage>4498</fpage><lpage>4500</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/AAC.00287-09</pubid><pubid idtype="pmcid">2764174</pubid><pubid idtype="pmpid">19687244</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>DNA Gyrase and Topoisomerase IV Mutations in Quinolone-Resistant Flavobacterium psychrophilum Isolated from Diseased Salmonids in Norway</p></title><aug><au><snm>Shah</snm><fnm>SQA</fnm></au><au><snm>Nilsen</snm><fnm>H</fnm></au><au><snm>Bottolfsen</snm><fnm>K</fnm></au><au><snm>Colquhoun</snm><fnm>DJ</fnm></au><au><snm>S&#248;rum</snm><fnm>H</fnm></au></aug><source>Microb Drug Resist (Larchmont, 
N.Y.)</source><pubdate>2012</pubdate><fpage>207</fpage><lpage>214</lpage></bibl><bibl id="B16"><title><p>Quinolone resistance from a transferable plasmid</p></title><aug><au><snm>Mart&#237;nez-Mart&#237;nez</snm><fnm>L</fnm></au><au><snm>Pascual</snm><fnm>A</fnm></au><au><snm>Jacoby</snm><fnm>GA</fnm></au></aug><source>Lancet</source><pubdate>1998</pubdate><volume>351</volume><fpage>797</fpage><lpage>799</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0140-6736(97)07322-4</pubid><pubid idtype="pmpid" link="fulltext">9519952</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>The worldwide emergence of plasmid-mediated quinolone resistance</p></title><aug><au><snm>Robicsek</snm><fnm>A</fnm></au><au><snm>Jacoby</snm><fnm>GA</fnm></au><au><snm>Hooper</snm><fnm>DC</fnm></au></aug><source>Lancet Infect Dis</source><pubdate>2006</pubdate><volume>6</volume><fpage>629</fpage><lpage>640</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1473-3099(06)70599-0</pubid><pubid idtype="pmpid" link="fulltext">17008172</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Plasmid-mediated quinolone resistance: a multifaceted threat</p></title><aug><au><snm>Strahilevitz</snm><fnm>J</fnm></au><au><snm>Jacoby</snm><fnm>GA</fnm></au><au><snm>Hooper</snm><fnm>DC</fnm></au><au><snm>Robicsek</snm><fnm>A</fnm></au></aug><source>Clin Microbiol Rev</source><pubdate>2009</pubdate><volume>22</volume><fpage>664</fpage><lpage>689</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/CMR.00016-09</pubid><pubid idtype="pmcid">2772364</pubid><pubid idtype="pmpid" link="fulltext">19822894</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Pentapeptide repeat 
proteins</p></title><aug><au><snm>Vetting</snm><fnm>MW</fnm></au><au><snm>Hegde</snm><fnm>SS</fnm></au><au><snm>Fajardo</snm><fnm>JE</fnm></au><au><snm>Fiser</snm><fnm>A</fnm></au><au><snm>Roderick</snm><fnm>SL</fnm></au><au><snm>Takiff</snm><fnm>HE</fnm></au><au><snm>Blanchard</snm><fnm>JS</fnm></au></aug><source>Biochemistry</source><pubdate>2006</pubdate><volume>45</volume><fpage>1</fpage><lpage>10</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/bi052130w</pubid><pubid idtype="pmcid">2566302</pubid><pubid idtype="pmpid" link="fulltext">16388575</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>Structure and distribution of pentapeptide repeats in bacteria</p></title><aug><au><snm>Bateman</snm><fnm>A</fnm></au><au><snm>Murzin</snm><fnm>AG</fnm></au><au><snm>Teichmann</snm><fnm>SA</fnm></au></aug><source>Protein Sci</source><pubdate>1998</pubdate><volume>7</volume><fpage>1477</fpage><lpage>1480</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/pro.5560070625</pubid><pubid idtype="pmcid">2144021</pubid><pubid idtype="pmpid" link="fulltext">9655353</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Structural insights into quinolone antibiotic resistance mediated by pentapeptide repeat proteins: conserved surface loops direct the activity of a Qnr protein from a gram-negative bacterium</p></title><aug><au><snm>Xiong</snm><fnm>X</fnm></au><au><snm>Bromley</snm><fnm>EHC</fnm></au><au><snm>Oelschlaeger</snm><fnm>P</fnm></au><au><snm>Woolfson</snm><fnm>DN</fnm></au><au><snm>Spencer</snm><fnm>J</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2011</pubdate><volume>39</volume><fpage>3917</fpage><lpage>3927</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkq1296</pubid><pubid idtype="pmcid">3089455</pubid><pubid idtype="pmpid" link="fulltext">21227918</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Structure of QnrB1, a plasmid-mediated fluoroquinolone resistance 
factor</p></title><aug><au><snm>Vetting</snm><fnm>MW</fnm></au><au><snm>Hegde</snm><fnm>SS</fnm></au><au><snm>Wang</snm><fnm>M</fnm></au><au><snm>Jacoby</snm><fnm>GA</fnm></au><au><snm>Hooper</snm><fnm>DC</fnm></au><au><snm>Blanchard</snm><fnm>JS</fnm></au></aug><source>J Biol Chem</source><pubdate>2011</pubdate><volume>286</volume><fpage>25265</fpage><lpage>25273</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1074/jbc.M111.226936</pubid><pubid idtype="pmcid">3137097</pubid><pubid idtype="pmpid" link="fulltext">21597116</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Metagenomics: application of genomics to uncultured microorganisms</p></title><aug><au><snm>Handelsman</snm><fnm>J</fnm></au></aug><source>Microbiol Mol Biol Rev</source><pubdate>2004</pubdate><volume>68</volume><fpage>669</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/MMBR.68.4.669-685.2004</pubid><pubid idtype="pmcid">539003</pubid><pubid idtype="pmpid" link="fulltext">15590779</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>ShotgunFunctionalizeR: an R-package for functional comparison of metagenomes</p></title><aug><au><snm>Kristiansson</snm><fnm>E</fnm></au><au><snm>Hugenholtz</snm><fnm>P</fnm></au><au><snm>Dalevi</snm><fnm>D</fnm></au></aug><source>Bioinformatics (Oxford, England)</source><pubdate>2009</pubdate><volume>25</volume><fpage>2737</fpage><lpage>2738</lpage><xrefbib><pubid idtype="doi">10.1093/bioinformatics/btp508</pubid></xrefbib></bibl><bibl id="B25"><title><p>Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity</p></title><aug><au><snm>Hugenholtz</snm><fnm>P</fnm></au><au><snm>Goebel</snm><fnm>BM</fnm></au><au><snm>Pace</snm><fnm>NR</fnm></au></aug><source>J Bacteriol</source><pubdate>1998</pubdate><volume>180</volume><fpage>4765</fpage><lpage>4774</lpage><xrefbib><pubidlist><pubid idtype="pmcid">107498</pubid><pubid idtype="pmpid" link="fulltext">9733676</pubid></pubidlist></xrefbib></bibl><bibl 
id="B26"><title><p>Metagenomics&#8211;the key to the uncultured microbes</p></title><aug><au><snm>Streit</snm><fnm>WR</fnm></au><au><snm>Schmitz</snm><fnm>RA</fnm></au></aug><source>Curr Opin Microbiol</source><pubdate>2004</pubdate><volume>7</volume><fpage>492</fpage><lpage>498</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.mib.2004.08.002</pubid><pubid idtype="pmpid" link="fulltext">15451504</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Metagenomics: read length matters</p></title><aug><au><snm>Wommack</snm><fnm>KE</fnm></au><au><snm>Bhavsar</snm><fnm>J</fnm></au><au><snm>Ravel</snm><fnm>J</fnm></au></aug><source>Appl Environ Microbiol</source><pubdate>2008</pubdate><volume>74</volume><fpage>1453</fpage><lpage>1463</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/AEM.02181-07</pubid><pubid idtype="pmcid">2258652</pubid><pubid idtype="pmpid" link="fulltext">18192407</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>New qnr Gene Cassettes Associated with Superintegron Repeats in Vibrio cholerae O1</p></title><aug><au><snm>Fonseca</snm><fnm>&#201;L</fnm></au><au><snm>Santos Freitas</snm><fnm>F</fnm></au><au><snm>Vicente</snm><fnm>ACP</fnm></au></aug><source>Emerg Infect Dis</source><pubdate>2008</pubdate><volume>14</volume><fpage>1129</fpage><lpage>1131</lpage><xrefbib><pubidlist><pubid idtype="doi">10.3201/eid1407.080132</pubid><pubid idtype="pmcid">2600354</pubid><pubid idtype="pmpid" link="fulltext">18598639</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Predictive analysis of transmissible quinolone resistance indicates Stenotrophomonas maltophilia as a potential source of a novel family of Qnr determinants</p></title><aug><au><snm>S&#225;nchez</snm><fnm>MB</fnm></au><au><snm>Hern&#225;ndez</snm><fnm>A</fnm></au><au><snm>Rodr&#237;guez-Mart&#237;nez</snm><fnm>JM</fnm></au><au><snm>Mart&#237;nez-Mart&#237;nez</snm><fnm>L</fnm></au><au><snm>Mart&#237;nez</snm><fnm>JL</fnm></au></aug><source>BMC 
Microbiol</source><pubdate>2008</pubdate><volume>8</volume><fpage>148</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2180-8-148</pubid><pubid idtype="pmcid">2556341</pubid><pubid idtype="pmpid" link="fulltext">18793450</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>Qnr-like pentapeptide repeat proteins in gram-positive bacteria</p></title><aug><au><snm>Rodr&#237;guez-Mart&#237;nez</snm><fnm>JM</fnm></au><au><snm>Velasco</snm><fnm>C</fnm></au><au><snm>Briales</snm><fnm>A</fnm></au><au><snm>Garc&#237;a</snm><fnm>I</fnm></au><au><snm>Conejo</snm><fnm>MC</fnm></au><au><snm>Pascual</snm><fnm>A</fnm></au></aug><source>J Antimicrob Chemother</source><pubdate>2008</pubdate><volume>61</volume><fpage>1240</fpage><lpage>1243</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/jac/dkn115</pubid><pubid idtype="pmpid" link="fulltext">18343805</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><source>qnr Numbering and Sequence</source><note>
   <url>http://www.lahey.org/qnrStudies</url>
</note></bibl><bibl id="B32"><title><p>GenBank</p></title><aug><au><snm>Benson</snm><fnm>DA</fnm></au><au><snm>Boguski</snm><fnm>MS</fnm></au><au><snm>Lipman</snm><fnm>DJ</fnm></au><au><snm>Ostell</snm><fnm>J</fnm></au><au><snm>Ouellette</snm><fnm>BF</fnm></au><au><snm>Rapp</snm><fnm>BA</fnm></au><au><snm>Wheeler</snm><fnm>DL</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1999</pubdate><volume>27</volume><fpage>12</fpage><lpage>17</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/27.1.12</pubid><pubid idtype="pmcid">148087</pubid><pubid idtype="pmpid" link="fulltext">9847132</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>CAMERA: a community resource for metagenomics</p></title><aug><au><snm>Seshadri</snm><fnm>R</fnm></au><au><snm>Kravitz</snm><fnm>SA</fnm></au><au><snm>Smarr</snm><fnm>L</fnm></au><au><snm>Gilna</snm><fnm>P</fnm></au><au><snm>Frazier</snm><fnm>M</fnm></au></aug><source>PLoS Biol</source><pubdate>2007</pubdate><volume>5</volume><fpage>e75</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pbio.0050075</pubid><pubid idtype="pmcid">1821059</pubid><pubid idtype="pmpid" link="fulltext">17355175</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes</p></title><aug><au><snm>Meyer</snm><fnm>F</fnm></au><au><snm>Paarmann</snm><fnm>D</fnm></au><au><snm>D&#8217;Souza</snm><fnm>M</fnm></au><au><snm>Olson</snm><fnm>R</fnm></au><au><snm>Glass</snm><fnm>EM</fnm></au><au><snm>Kubal</snm><fnm>M</fnm></au><au><snm>Paczian</snm><fnm>T</fnm></au><au><snm>Rodriguez</snm><fnm>A</fnm></au><au><snm>Stevens</snm><fnm>R</fnm></au><au><snm>Wilke</snm><fnm>A</fnm></au><au><snm>Wilkening</snm><fnm>J</fnm></au><au><snm>Edwards</snm><fnm>RA</fnm></au></aug><source>BMC Bioinforma</source><pubdate>2008</pubdate><volume>9</volume><fpage>386</fpage><xrefbib><pubid 
idtype="doi">10.1186/1471-2105-9-386</pubid></xrefbib></bibl><bibl id="B35"><title><p>A human gut microbial gene catalogue established by metagenomic sequencing</p></title><aug><au><snm>Qin</snm><fnm>J</fnm></au><au><snm>Li</snm><fnm>R</fnm></au><au><snm>Raes</snm><fnm>J</fnm></au><au><snm>Arumugam</snm><fnm>M</fnm></au><au><snm>Burgdorf</snm><fnm>KS</fnm></au><au><snm>Manichanh</snm><fnm>C</fnm></au><au><snm>Nielsen</snm><fnm>T</fnm></au><au><snm>Pons</snm><fnm>N</fnm></au><au><snm>Levenez</snm><fnm>F</fnm></au><au><snm>Yamada</snm><fnm>T</fnm></au><au><snm>Mende</snm><fnm>DR</fnm></au><au><snm>Li</snm><fnm>J</fnm></au><au><snm>Xu</snm><fnm>J</fnm></au><au><snm>Li</snm><fnm>SSS</fnm></au><au><snm>Li</snm><fnm>D</fnm></au><au><snm>Cao</snm><fnm>J</fnm></au><au><snm>Wang</snm><fnm>B</fnm></au><au><snm>Liang</snm><fnm>H</fnm></au><au><snm>Zheng</snm><fnm>H</fnm></au><au><snm>Xie</snm><fnm>Y</fnm></au><au><snm>Tap</snm><fnm>J</fnm></au><au><snm>Lepage</snm><fnm>P</fnm></au><au><snm>Bertalan</snm><fnm>M</fnm></au><au><snm>Batto</snm><fnm>J</fnm></au><au><snm>Hansen</snm><fnm>T</fnm></au><au><snm>Le</snm><fnm>D</fnm></au><au><snm>Linneberg</snm><fnm>A</fnm></au><au><snm>Nielsen</snm><fnm>HB</fnm></au><au><snm>Pelletier</snm><fnm>E</fnm></au><au><snm>Renault</snm><fnm>P</fnm></au><au><snm>Sicheritz-Ponten</snm><fnm>T</fnm></au><au><snm>Turner</snm><fnm>K</fnm></au><au><snm>Zhu</snm><fnm>H</fnm></au><au><snm>Yu</snm><fnm>C</fnm></au><au><snm>Jian</snm><fnm>M</fnm></au><au><snm>Zhou</snm><fnm>Y</fnm></au><au><snm>Li</snm><fnm>Y</fnm></au><au><snm>Zhang</snm><fnm>X</fnm></au><au><snm>Guarner</snm><fnm>F</fnm></au><au><snm>Qin</snm><fnm>N</fnm></au><au><snm>Yang</snm><fnm>H</fnm></au><au><snm>Wang</snm><fnm>JJ</fnm></au><au><snm>Brunak</snm><fnm>S</fnm></au><au><snm>Dore</snm><fnm>J</fnm></au><au><snm>Le 
Paslier</snm><fnm>D</fnm></au><au><snm>Dor&#233;</snm><fnm>J</fnm></au><au><snm>Kristiansen</snm><fnm>K</fnm></au><au><snm>Pedersen</snm><fnm>O</fnm></au><au><snm>Parkhill</snm><fnm>J</fnm></au><au><snm>Weissenbach</snm><fnm>J</fnm></au><au><snm>Bork</snm><fnm>P</fnm></au><au><snm>Ehrlich</snm><fnm>SD</fnm></au></aug><source>Nature</source><pubdate>2010</pubdate><volume>464</volume><fpage>59</fpage><lpage>65</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature08821</pubid><pubid idtype="pmpid" link="fulltext">20203603</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>The sequence read archive</p></title><aug><au><snm>Leinonen</snm><fnm>R</fnm></au><au><snm>Sugawara</snm><fnm>H</fnm></au><au><snm>Shumway</snm><fnm>M</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2011</pubdate><volume>39</volume><fpage>D19</fpage><lpage>D21</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkq1019</pubid><pubid idtype="pmcid">3013647</pubid><pubid idtype="pmpid" link="fulltext">21062823</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>Pyrosequencing of antibiotic-contaminated river sediments reveals high levels of resistance and gene transfer elements</p></title><aug><au><snm>Kristiansson</snm><fnm>E</fnm></au><au><snm>Fick</snm><fnm>J</fnm></au><au><snm>Janzon</snm><fnm>A</fnm></au><au><snm>Grabic</snm><fnm>R</fnm></au><au><snm>Rutgersson</snm><fnm>C</fnm></au><au><snm>Weijdeg&#229;rd</snm><fnm>B</fnm></au><au><snm>S&#246;derstr&#246;m</snm><fnm>H</fnm></au><au><snm>Larsson</snm><fnm>DGJ</fnm></au></aug><source>PLoS One</source><pubdate>2011</pubdate><volume>6</volume><fpage>e17038</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0017038</pubid><pubid idtype="pmcid">3040208</pubid><pubid idtype="pmpid" link="fulltext">21359229</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>Evolution and recombination of the plasmidic qnr 
alleles</p></title><aug><au><snm>Baquirin</snm><fnm>MHC</fnm></au><au><snm>Barlow</snm><fnm>M</fnm></au></aug><source>J Mol Evol</source><pubdate>2008</pubdate><volume>67</volume><fpage>103</fpage><lpage>110</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s00239-008-9131-3</pubid><pubid idtype="pmpid" link="fulltext">18592295</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>Citrobacter spp. as a source of qnrB Alleles</p></title><aug><au><snm>Jacoby</snm><fnm>GA</fnm></au><au><snm>Griffin</snm><fnm>CM</fnm></au><au><snm>Hooper</snm><fnm>DC</fnm></au></aug><source>Antimicrob Agents Chemother</source><pubdate>2011</pubdate><volume>55</volume><fpage>4979</fpage><lpage>4984</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/AAC.05187-11</pubid><pubid idtype="pmcid">3195048</pubid><pubid idtype="pmpid">21844311</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>New plasmid-mediated quinolone resistance gene, qnrC, found in a clinical isolate of Proteus mirabilis</p></title><aug><au><snm>Wang</snm><fnm>MM</fnm></au><au><snm>Guo</snm><fnm>Q</fnm></au><au><snm>Xu</snm><fnm>X</fnm></au><au><snm>Wang</snm><fnm>X</fnm></au><au><snm>Ye</snm><fnm>X</fnm></au><au><snm>Wu</snm><fnm>S</fnm></au><au><snm>Hooper</snm><fnm>DC</fnm></au></aug><source>Antimicrob Agents Chemother</source><pubdate>2009</pubdate><volume>53</volume><fpage>1892</fpage><lpage>1897</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/AAC.01400-08</pubid><pubid idtype="pmcid">2681562</pubid><pubid idtype="pmpid">19258263</pubid></pubidlist></xrefbib></bibl><bibl id="B41"><title><p>Smaqnr, a new chromosome-encoded quinolone resistance determinant in Serratia marcescens</p></title><aug><au><snm>Velasco</snm><fnm>C</fnm></au><au><snm>Rodr&#237;guez-Mart&#237;nez</snm><fnm>JM</fnm></au><au><snm>Briales</snm><fnm>A</fnm></au><au><snm>de D&#237;az Alba</snm><fnm>P</fnm></au><au><snm>Calvo</snm><fnm>A</fnm></au><au><snm>Pascual</snm><fnm>A</fnm></au></aug><source>J Antimicrob 
Chemother</source><pubdate>2010</pubdate><volume>65</volume><fpage>239</fpage><lpage>242</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/jac/dkp424</pubid><pubid idtype="pmpid" link="fulltext">19942618</pubid></pubidlist></xrefbib></bibl><bibl id="B42"><title><p>The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes</p></title><aug><au><snm>Overbeek</snm><fnm>R</fnm></au><au><snm>Begley</snm><fnm>T</fnm></au><au><snm>Butler</snm><fnm>RM</fnm></au><au><snm>Choudhuri</snm><fnm>JV</fnm></au><au><snm>Chuang</snm><fnm>H-Y</fnm></au><au><snm>Cohoon</snm><fnm>M</fnm></au><au><snm>de Cr&#233;cy-Lagard</snm><fnm>V</fnm></au><au><snm>Diaz</snm><fnm>N</fnm></au><au><snm>Disz</snm><fnm>T</fnm></au><au><snm>Edwards</snm><fnm>R</fnm></au><au><snm>Fonstein</snm><fnm>M</fnm></au><au><snm>Frank</snm><fnm>ED</fnm></au><au><snm>Gerdes</snm><fnm>S</fnm></au><au><snm>Glass</snm><fnm>EM</fnm></au><au><snm>Goesmann</snm><fnm>A</fnm></au><au><snm>Hanson</snm><fnm>A</fnm></au><au><snm>Iwata-Reuyl</snm><fnm>D</fnm></au><au><snm>Jensen</snm><fnm>R</fnm></au><au><snm>Jamshidi</snm><fnm>N</fnm></au><au><snm>Krause</snm><fnm>L</fnm></au><au><snm>Kubal</snm><fnm>M</fnm></au><au><snm>Larsen</snm><fnm>N</fnm></au><au><snm>Linke</snm><fnm>B</fnm></au><au><snm>McHardy</snm><fnm>AC</fnm></au><au><snm>Meyer</snm><fnm>F</fnm></au><au><snm>Neuweger</snm><fnm>H</fnm></au><au><snm>Olsen</snm><fnm>G</fnm></au><au><snm>Olson</snm><fnm>R</fnm></au><au><snm>Osterman</snm><fnm>A</fnm></au><au><snm>Portnoy</snm><fnm>V</fnm></au><etal/></aug><source>Nucleic Acids Res</source><pubdate>2005</pubdate><volume>33</volume><fpage>5691</fpage><lpage>5702</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gki866</pubid><pubid idtype="pmcid">1251668</pubid><pubid idtype="pmpid" link="fulltext">16214803</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>Strain-resolved community genomic analysis of gut microbial colonization in a premature 
infant</p></title><aug><au><snm>Morowitz</snm><fnm>MJ</fnm></au><au><snm>Denef</snm><fnm>VJ</fnm></au><au><snm>Costello</snm><fnm>EK</fnm></au><au><snm>Thomas</snm><fnm>BC</fnm></au><au><snm>Poroyko</snm><fnm>V</fnm></au></aug><source>Proc Natl Acad Sci</source><pubdate>2010</pubdate><volume>108</volume><fpage>1128</fpage><lpage>1133</lpage><xrefbib><pubidlist><pubid idtype="pmcid">3024690</pubid><pubid idtype="pmpid" link="fulltext">21191099</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families</p></title><aug><au><snm>Yooseph</snm><fnm>S</fnm></au><au><snm>Sutton</snm><fnm>G</fnm></au><au><snm>Rusch</snm><fnm>DB</fnm></au><au><snm>Halpern</snm><fnm>AL</fnm></au><au><snm>Williamson</snm><fnm>SJ</fnm></au><au><snm>Remington</snm><fnm>K</fnm></au><au><snm>Eisen</snm><fnm>JA</fnm></au><au><snm>Heidelberg</snm><fnm>KB</fnm></au><au><snm>Manning</snm><fnm>G</fnm></au><au><snm>Li</snm><fnm>W</fnm></au><au><snm>Jaroszewski</snm><fnm>L</fnm></au><au><snm>Cieplak</snm><fnm>P</fnm></au><au><snm>Miller</snm><fnm>CS</fnm></au><au><snm>Li</snm><fnm>H</fnm></au><au><snm>Mashiyama</snm><fnm>ST</fnm></au><au><snm>Joachimiak</snm><fnm>MP</fnm></au><au><snm>van Belle</snm><fnm>C</fnm></au><au><snm>Chandonia</snm><fnm>J-M</fnm></au><au><snm>Soergel</snm><fnm>DA</fnm></au><au><snm>Zhai</snm><fnm>Y</fnm></au><au><snm>Natarajan</snm><fnm>K</fnm></au><au><snm>Lee</snm><fnm>S</fnm></au><au><snm>Raphael</snm><fnm>BJ</fnm></au><au><snm>Bafna</snm><fnm>V</fnm></au><au><snm>Friedman</snm><fnm>R</fnm></au><au><snm>Brenner</snm><fnm>SE</fnm></au><au><snm>Godzik</snm><fnm>A</fnm></au><au><snm>Eisenberg</snm><fnm>D</fnm></au><au><snm>Dixon</snm><fnm>JE</fnm></au><au><snm>Taylor</snm><fnm>SS</fnm></au><au><snm>Strausberg</snm><fnm>RL</fnm></au><au><snm>Frazier</snm><fnm>M</fnm></au><au><snm>Venter</snm><fnm>JC</fnm></au></aug><source>PLoS 
Biol</source><pubdate>2007</pubdate><volume>5</volume><fpage>e16</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pbio.0050016</pubid><pubid idtype="pmcid">1821046</pubid><pubid idtype="pmpid" link="fulltext">17355171</pubid></pubidlist></xrefbib></bibl><bibl id="B45"><title><p>Estimating DNA coverage and abundance in metagenomes using a gamma approximation</p></title><aug><au><snm>Hooper</snm><fnm>SD</fnm></au><au><snm>Dalevi</snm><fnm>D</fnm></au><au><snm>Pati</snm><fnm>A</fnm></au><au><snm>Mavromatis</snm><fnm>K</fnm></au><au><snm>Ivanova</snm><fnm>NN</fnm></au><au><snm>Kyrpides</snm><fnm>NC</fnm></au></aug><source>Bioinformatics (Oxford, England)</source><pubdate>2010</pubdate><volume>26</volume><fpage>295</fpage><lpage>301</lpage><xrefbib><pubid idtype="doi">10.1093/bioinformatics/btp687</pubid></xrefbib></bibl><bibl id="B46"><title><p>Microbial Metagenomics: Beyond the Genome</p></title><aug><au><snm>Gilbert</snm><fnm>JA</fnm></au><au><snm>Dupont</snm><fnm>CL</fnm></au></aug><source>Ann Rev Mar Sci</source><pubdate>2011</pubdate><volume>3</volume><fpage>347</fpage><lpage>371</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1146/annurev-marine-120709-142811</pubid><pubid idtype="pmpid">21329209</pubid></pubidlist></xrefbib></bibl><bibl id="B47"><aug><au><snm>Gilbert</snm><fnm>JA</fnm></au><au><snm>Meyer</snm><fnm>F</fnm></au><au><snm>Jansson</snm><fnm>J</fnm></au><au><snm>Gordon</snm><fnm>J</fnm></au><au><snm>Pace</snm><fnm>N</fnm></au><au><snm>Ley</snm><fnm>R</fnm></au><au><snm>Fierer</snm><fnm>N</fnm></au><au><snm>Field</snm><fnm>D</fnm></au><au><snm>Kyrpides</snm><fnm>N</fnm></au><au><snm>Gl&#246;ckner</snm><fnm>F</fnm></au></aug><source>The Earth Microbiome Project: Meeting report of the &#8220;1st EMP meeting on sample selection and acquisition&#8221; at Argonne National Laboratory October 6th 2010</source><pubdate>2010</pubdate><fpage>249</fpage><lpage>253</lpage><xrefbib><pubidlist><pubid idtype="pmcid">3035312</pubid><pubid 
idtype="pmpid">21304728</pubid></pubidlist></xrefbib></bibl><bibl id="B48"><title><p>The Earth Microbiome Project: The Meeting Report for the 1<sup>st</sup> International Earth Microbiome Project Conference, Shenzhen, China, June 13<sup>th</sup>-15<sup>th</sup> 2011</p></title><aug><au><snm>Gilbert</snm><fnm>JA</fnm></au><au><snm>Bailey</snm><fnm>M</fnm></au><au><snm>Field</snm><fnm>D</fnm></au><au><snm>Fierer</snm><fnm>N</fnm></au><au><snm>Fuhrman</snm><fnm>JA</fnm></au><au><snm>Hu</snm><fnm>B</fnm></au><au><snm>Jansson</snm><fnm>J</fnm></au><au><snm>Knight</snm><fnm>R</fnm></au><au><snm>Kowalchuk</snm><fnm>GA</fnm></au><au><snm>Kyrpides</snm><fnm>NC</fnm></au><au><snm>Meyer</snm><fnm>F</fnm></au><au><snm>Stevens</snm><fnm>R</fnm></au></aug><source>Stand Genomic Sci</source><pubdate>2011</pubdate><volume>5</volume><fpage>243</fpage><lpage>247</lpage><xrefbib><pubid idtype="doi">10.4056/sigs.2134923</pubid></xrefbib></bibl><bibl id="B49"><title><p>Accelerated Profile HMM Searches</p></title><aug><au><snm>Eddy</snm><fnm>SR</fnm></au></aug><source>PLoS Comput Biol</source><pubdate>2011</pubdate><volume>7</volume><fpage>e1002195</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pcbi.1002195</pubid><pubid idtype="pmcid">3197634</pubid><pubid idtype="pmpid" link="fulltext">22039361</pubid></pubidlist></xrefbib></bibl><bibl id="B50"><title><p>Sequencing technologies - the next generation</p></title><aug><au><snm>Metzker</snm><fnm>ML</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2010</pubdate><volume>11</volume><fpage>31</fpage><lpage>46</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrg2626</pubid><pubid idtype="pmpid" link="fulltext">19997069</pubid></pubidlist></xrefbib></bibl><bibl id="B51"><title><p>MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier 
transform</p></title><aug><au><snm>Katoh</snm><fnm>K</fnm></au><au><snm>Misawa</snm><fnm>K</fnm></au><au><snm>Kuma</snm><fnm>K</fnm></au><au><snm>Miyata</snm><fnm>T</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2002</pubdate><volume>30</volume><fpage>3059</fpage><lpage>3066</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkf436</pubid><pubid idtype="pmcid">135756</pubid><pubid idtype="pmpid" link="fulltext">12136088</pubid></pubidlist></xrefbib></bibl><bibl id="B52"><title><p>EMBOSS: the European Molecular Biology Open Software Suite</p></title><aug><au><snm>Rice</snm><fnm>P</fnm></au><au><snm>Longden</snm><fnm>I</fnm></au><au><snm>Bleasby</snm><fnm>A</fnm></au></aug><source>Trends Genet</source><pubdate>2000</pubdate><volume>16</volume><fpage>276</fpage><lpage>277</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0168-9525(00)02024-2</pubid><pubid idtype="pmpid" link="fulltext">10827456</pubid></pubidlist></xrefbib></bibl><bibl id="B53"><title><p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p></title><aug><au><snm>Altschul</snm><fnm>SF</fnm></au><au><snm>Madden</snm><fnm>TL</fnm></au><au><snm>Sch&#228;ffer</snm><fnm>AA</fnm></au><au><snm>Zhang</snm><fnm>J</fnm></au><au><snm>Zhang</snm><fnm>Z</fnm></au><au><snm>Miller</snm><fnm>W</fnm></au><au><snm>Lipman</snm><fnm>DJ</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1997</pubdate><volume>25</volume><fpage>3389</fpage><lpage>3402</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/25.17.3389</pubid><pubid idtype="pmcid">146917</pubid><pubid idtype="pmpid" link="fulltext">9254694</pubid></pubidlist></xrefbib></bibl></refgrp>
	</bm>
</art>