<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2164-12-403</ui><ji>1471-2164</ji><fm>
<dochead>Research article</dochead>
<bibl>
<title>
<p>The mosaicism of plasmids revealed by atypical genes detection and analysis</p>
</title>
<aug>
<au id="A1"><snm>Bosi</snm><fnm>Emanuele</fnm><insr iid="I1"/><email>bozzo_87@hotmail.it</email></au>
<au id="A2"><snm>Fani</snm><fnm>Renato</fnm><insr iid="I1"/><email>renato.fani@unifi.it</email></au>
<au ca="yes" id="A3"><snm>Fondi</snm><fnm>Marco</fnm><insr iid="I1"/><email>marco.fondi@unifi.it</email></au>
</aug>
<insg>
<ins id="I1"><p>Lab. of Microbial and Molecular Evolution, Dept. of Evolutionary Biology, Via Romana 17-19, University of Florence, Italy</p></ins>
</insg>
<source>BMC Genomics</source>
<issn>1471-2164</issn>
<pubdate>2011</pubdate>
<volume>12</volume>
<issue>1</issue>
<fpage>403</fpage>
<url>http://www.biomedcentral.com/1471-2164/12/403</url>
<xrefbib><pubidlist><pubid idtype="pmpid">21824433</pubid><pubid idtype="doi">10.1186/1471-2164-12-403</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>17</day><month>3</month><year>2011</year></date></rec><acc><date><day>8</day><month>8</month><year>2011</year></date></acc><pub><date><day>8</day><month>8</month><year>2011</year></date></pub></history>
<cpyrt><year>2011</year><collab>Bosi et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>From an evolutionary viewpoint, prokaryotic genomes are extremely plastic and dynamic, since large amounts of genetic material are continuously added and/or lost through promiscuous gene exchange. In this picture, plasmids play a key role, since they can be transferred between different cells and, through genetic rearrangement(s), acquire new gene(s), leading, in turn, to the appearance of important metabolic innovations that might be relevant for cell life. Despite their central position in bacterial evolution, a massive analysis of newly acquired functional blocks [likely the result of horizontal gene transfer (HGT) events] residing on plasmids is still missing.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>We have developed a computational, composition-based pipeline to scan almost 2000 plasmids for genes that differ significantly from their hosting molecule. Plasmid atypical genes (PAGs) were about 6% of the total plasmid ORFs and, on average, each plasmid possessed 4.4 atypical genes. Nevertheless, conjugative plasmids were shown to possess a higher amount of atypical genes than that found in non-mobilizable plasmids, providing strong support for the central role suggested for conjugative plasmids in the context of HGT. Part of the retrieved PAGs are organized into (mainly short) clusters and are involved in important biological processes (detoxification, antibiotic resistance, virulence), revealing the importance of HGT in the spreading of metabolic pathways within the whole microbial community. Lastly, our analysis revealed that PAGs mainly derive from other plasmids (rather than from phages and/or chromosomes), suggesting that plasmid-plasmid DNA exchange might be the primary source of metabolic innovations in this class of mobile genetic elements.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>In this work we have performed the first large-scale analysis to date of atypical genes that reside on plasmid molecules. Our findings on the function, organization, distribution and spreading of PAGs reveal the importance of plasmid-mediated HGT within the complex bacterial evolutionary network and in the dissemination of important biological traits.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>Comparative whole-genome analyses have demonstrated that horizontal gene transfer (HGT) provides a significant contribution to prokaryotic genome evolution/innovation. In fact, it is very likely that a significant proportion of the genetic diversity exhibited by extant bacteria might be the result of the acquisition of sequences from more or less distantly related organisms <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>. Indeed, HGT gives a venue for bacterial diversification by the reassortment of existing capabilities <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp> and this formidable sexual promiscuity has given bacteria a great advantage, providing an awesome mechanism for ongoing adaptive evolution, a sort of permanently and rapidly evolving communal genome <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>.</p>
<p>During evolution, HGT and recombination have shaped bacterial genomes, which today appear as complex mosaics of genes from different lineages, species, and genera <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>.</p>
<p>In this picture, plasmids (collections of functional genetic modules that are organized into a stable, self-replicating entity or 'replicon'), might have played (and might still play) a major role because they can be transferred between microorganisms, thus representing natural vectors for the transfer of genes and the functions they code for <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. Moreover, it can be suggested that, during their evolutionary history, plasmids can undergo genetic rearrangements with other plasmids and/or chromosome(s) residing in the same cytoplasm and/or with phages infecting the same cell. As a consequence, newly acquired genes can be integrated on plasmids and (eventually) be maintained.</p>
<p>In general, three major processes can mediate HGT among bacteria: transformation (the uptake of free DNA), transduction (DNA transfer mediated by bacteriophages) and conjugation (DNA transfer by means of plasmids or integrative conjugative elements) <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp>. However, regardless of the transfer mechanism, once DNA has entered the recipient cell it can undergo homologous recombination or homology-facilitated illegitimate recombination and can be successfully integrated into the genome of the new host. Lastly, if the newly acquired DNA confers a selective advantage to the host, it can be maintained and, possibly, spread again through the bacterial population. Importantly, it can be surmised that, at least in the first stages following the integration event (before the amelioration process can start), exogenous sequences maintain their own peculiar compositional features [e.g. GC% and dinucleotide relative abundance difference (&#948;*)] that usually differ from the rest of the "new" hosting molecule; for this reason these sequences are often defined as "atypical".</p>
<p>Atypical (and, possibly, horizontally transferred) genes detection can be pursued by composition-based methods <abbrgrp>
<abbr bid="B5">5</abbr>
<abbr bid="B6">6</abbr>
</abbrgrp> that involve alignment-free features, such as GC% content and/or &#948;* <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp>. Compositionally oriented methods rely on the observation that some genome features (including GC% content and &#948;*) are typical for a given bacterial genome and similar between closely related genomes. Accordingly, recently acquired genes are likely to display anomalous composition, especially when they originated in distantly related species; moreover, a different composition will also be observed in those cases in which the amelioration process has been delayed. Interestingly, it has been proposed that the genome signature (a compositional parameter reflecting the dinucleotide relative abundance values between two different DNA strands) of plasmids does not resemble that of their host genome, probably indicating either absence of amelioration or a less stable relationship between plasmids and their host <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>.</p>
<p>Based on composition-oriented strategies, recent analyses on large sets of bacterial and archaeal chromosomes have revealed their mosaic structure, since considerable proportions of most of them consist of horizontally acquired genes <abbrgrp>
<abbr bid="B8">8</abbr>
<abbr bid="B9">9</abbr>
<abbr bid="B10">10</abbr>
</abbrgrp>. For example, applying a Bayesian method to 116 complete prokaryotic genomes, Nakamura et al. (2004) found that the average proportion of horizontally transferred genes <it>per </it>genome was about 12% of all ORFs, ranging from 0.5% to 25%. Similarly, Cortez et al. (2009), analysing a set of 119 bacterial and archaeal chromosomes (351111 ORFs), found that a large fraction of them was populated by atypical genes (defined as clusters of atypical genes, CAGs) (58487, 16% of all genes). Hence, this strongly indicates that archaeal and bacterial chromosomes contain an impressive proportion of recently acquired foreign genes (including ORFans, that is, open reading frames without matches in current sequence databases) coming from still largely unexplored reservoirs <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp>. Finally, the same authors found that among the identified CAGs, a large number were likely of plasmid origin <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp>. These lines of evidence suggest that genetic mobility should not be merely interpreted in terms of transportation of genes bypassing the cell barriers of prokaryotes, but rather as a perpetual flow between discrete reproductive units <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp>, i.e. chromosomes and/or mobile genetic elements (MGEs), including plasmids. In fact, an emerging view suggests that plasmids (and MGEs in general) should be considered as mosaics of functional blocks (modules) of genes <abbrgrp>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
</abbrgrp>. Remarkably, in a few cases, the mosaic structure of plasmids (according to compositional criteria) has been demonstrated <abbrgrp>
<abbr bid="B9">9</abbr>
<abbr bid="B14">14</abbr>
<abbr bid="B15">15</abbr>
</abbrgrp>, revealing interesting insights into the dissemination of key biological traits such as antibiotic resistance, virulence and heavy metal detoxification. Accordingly, it is reasonable that the identification and the analysis of plasmid atypical genes (hereinafter PAGs) might reveal interesting insights into the (probably complex) network of intra- and inter-cellular gene transfer(s) that plasmids can face during their evolution. Indeed, PAGs might be the outcome either of (one or more) HGT(s) or of internal recombination events with chromosome(s) residing in the same cytoplasm but possessing different compositional signatures.</p>
<p>However, to the best of our knowledge, a massive analysis of alien modules that may reside on plasmid molecules has not been undertaken up to now. Therefore, the aim of this work was to develop a statistically validated computational strategy that integrates two distinct compositional measures [GC% content and &#948;*] to scan nearly 2000 archaeal and bacterial plasmids for the presence of PAGs. Finally, the retrieved PAG datasets were analyzed, revealing interesting trends in the overall plasmid gene exchange network.</p>
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<sec>
<st>
<p>Analyzed genomes</p>
</st>
<p>All the available complete plasmid, phage and chromosome sequences were downloaded from the NCBI FTP site (<url>http://www.ncbi.nlm.nih.gov/Ftp/</url>, as of February 1<sup>st </sup>2010). Concerning plasmids, we focused our attention only on those longer than 3 kb and harboring at least 2 ORFs, in order to be able to compute &#948;* and differences in GC% content among all the genes. This allowed us to assemble a dataset of 1853 plasmids for a total of 128,569 ORFs. The complete list of plasmids analyzed in this work (together with other information such as their size, their accession codes etc.) is available as Additional file <supplr sid="S1">1</supplr>.</p>
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>Analyzed plasmids dataset</b>. Table with the information concerning the 1853 analyzed plasmids, including their name, their accession number and PAGs content at different CI thresholds.</p>
</text>
<file name="1471-2164-12-403-S1.XLSX">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Atypical genes detection</p>
</st>
<p>Classically, atypical gene detection has been pursued either by i) phylogenetic methods (based on sequence alignment) and/or by ii) composition-based methods that involve alignment-free features such as GC% content, synonymous codon usage or the frequencies of overlapping short oligomers <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>. Several <it>in silico </it>based methods have been conceived in the past few years to identify foreign genes that were recently acquired by chromosomes <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
<abbr bid="B21">21</abbr>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
<abbr bid="B24">24</abbr>
<abbr bid="B25">25</abbr>
</abbrgrp>. However, it is not possible to discriminate between atypical and native plasmid genes using phylogeny-oriented methods, since a set of homologous and universally shared sequences (necessary to build a reference phylogeny) is often unavailable. Furthermore, the absence of a universally shared "core" of genes (i.e. likely not subject to extensive HGT) when analysing plasmid sequences does not allow the use of classical statistical methods (such as Markov model-based approaches), which require the presence of a set of native genes in order to identify those ORFs that have a significantly different composition. Hence, in this work, we have developed a computational strategy (Figure <figr fid="F1">1</figr>) that combines two compositional measures (GC content difference and &#948;*) in order to identify putative PAGs within a dataset of 1853 plasmids. Briefly, GC content is calculated as (G+C)/(A+T+G+C), where G, C, A and T are the numbers of guanines, cytosines, adenines and thymines, respectively. In turn, &#948;* between two sequences a and b (from different organisms or from different regions of the same genome) is calculated as &#948;<sup>&#8727;</sup>(a,b) = (1/16)&#8721;<sub>XY</sub>|&#961;*<sub>XY</sub>(a)&#8722;&#961;*<sub>XY</sub>(b)|, where the sum extends over all 16 possible XY dinucleotides and where &#961;*<sub>XY </sub>= f<sub>XY </sub>/(f<sub>X</sub>f<sub>Y</sub>), with f<sub>XY </sub>representing the frequency of the dinucleotide XY in the sequence and f<sub>X </sub>and f<sub>Y </sub>the frequencies of the nucleotides X and Y.</p>
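<p>The two compositional measures defined above can be illustrated with a minimal Python sketch. It is not the Perl implementation used in this work: the function names are hypothetical, ambiguous bases are simply dropped, sequences are assumed to contain at least two unambiguous bases, and relative abundances are computed on the given strand only (no strand symmetrization is applied).</p>

```python
from itertools import product

BASES = "ACGT"

def gc_content(seq):
    """GC content as (G+C)/(A+T+G+C), counting unambiguous bases only."""
    seq = seq.upper()
    counts = {b: seq.count(b) for b in BASES}
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0

def rho(seq):
    """Dinucleotide relative abundances rho_XY = f_XY / (f_X * f_Y)."""
    # Assumption: ambiguous symbols are removed before counting.
    seq = "".join(b for b in seq.upper() if b in BASES)
    n_mono = len(seq)
    n_di = n_mono - 1
    mono = {b: seq.count(b) / n_mono for b in BASES}
    di = {x + y: 0 for x, y in product(BASES, repeat=2)}
    for i in range(n_di):
        di[seq[i:i + 2]] += 1
    out = {}
    for xy, count in di.items():
        denom = mono[xy[0]] * mono[xy[1]]
        # A dinucleotide whose bases are absent gets abundance 0 by convention.
        out[xy] = (count / n_di) / denom if denom else 0.0
    return out

def delta_star(a, b):
    """Mean absolute difference of the 16 relative abundances of a and b."""
    ra, rb = rho(a), rho(b)
    return sum(abs(ra[xy] - rb[xy]) for xy in ra) / 16
```

<p>In this sketch, the GC content difference of an ORF would be obtained as the absolute difference between gc_content(orf) and gc_content(plasmid), and its dinucleotide distance as delta_star(orf, plasmid).</p>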
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Applied strategy</p></caption><text>
   <p><b>Applied strategy</b>. The overall strategy adopted in this work for the identification of plasmids atypical genes (PAGs).</p>
</text><graphic file="1471-2164-12-403-1" hint_layout="single"/></fig>
<p>Hence, for each of the 128,569 ORFs of the 1853 plasmids (Figure <figr fid="F1">1a</figr>), we estimated both the &#948;* and the GC content difference (&#916;GC) with respect to the corresponding source plasmid (Figure <figr fid="F1">1b</figr> and <figr fid="F1">1c</figr>). Since these values did not follow a normal distribution (according to a Kolmogorov-Smirnov test with a p-value threshold of 0.05), we used a distribution-independent procedure to evaluate the probability of each point of these distributions, performing a bootstrap sampling (Figure <figr fid="F1">1d</figr> and <figr fid="F1">1e</figr>) of all the obtained values. In other words, a probability was assigned to each of the values [e.g. P(A)] of these two distributions, computing it as P(A) = n(A)/N, where n(A) is the number of times in which the observed value (of &#916;GC and &#948;*, respectively) was greater than the other (128,569) values after N samplings. By doing so, two distinct p-values (Figure <figr fid="F1">1f</figr> and <figr fid="F1">1g</figr>) were associated with each sequence of the dataset: the first accounting for the probability of a gene being atypical in terms of &#948;<sup>&#8727; </sup>and the other accounting for the probability of a gene being atypical in terms of &#916;GC. These two distinct p-values were then integrated into a single one according to the Fisher method <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp>. In its basic form, the Fisher method is used to combine the results from two (or more) tests bearing upon the same overall null hypothesis. In other words, Fisher's method combines p-values into one test statistic (X<sup>2</sup>) using the formula: X<sup>2 </sup>= -2&#8721;log<sub>e</sub>(p<sub>i</sub>), where p<sub>i </sub>is the p-value for the <it>i</it><sup>th </sup>hypothesis test; under the null hypothesis, and for k independent tests, X<sup>2 </sup>follows a chi-square distribution with 2k degrees of freedom. Accordingly (Figure <figr fid="F1">1h</figr>), only those sequences identified at a confidence interval (CI) greater than 95% were considered PAGs. Moreover, in order to explore different CI thresholds, we also collected gene sets that were identified as atypical with lower confidence values (i.e. 70%, 80% and 90%). The whole pipeline has been implemented in Perl and is available upon request. As might be expected, lower CI thresholds allowed the assembly of larger PAG datasets, ranging from 14731 (with a CI of 90%) to almost 40000 (with a CI of 70%) (<it>see </it>Additional file <supplr sid="S1">1</supplr>).</p>
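<p>The resampling and combination steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the pipeline itself: the function names are hypothetical, the empirical p-value is taken here as the upper-tail fraction of resampled values (so that small values flag atypical genes), an add-one correction is included to avoid p-values of exactly zero, and the closed form used for the combined p-value holds only for two tests treated as independent.</p>

```python
import math
import random

def empirical_p(value, population, n_samples=1000, rng=None):
    """Fraction of n_samples random draws from population that are >= value."""
    rng = rng or random.Random(0)
    hits = sum(1 for _ in range(n_samples) if rng.choice(population) >= value)
    # Assumption: add-one correction, not described in the text.
    return (hits + 1) / (n_samples + 1)

def fisher_combine(p1, p2):
    """Fisher's method for two p-values: X2 = -2 * (ln p1 + ln p2)."""
    x2 = -2.0 * (math.log(p1) + math.log(p2))
    # Survival function of a chi-square with 4 degrees of freedom.
    return math.exp(-x2 / 2.0) * (1.0 + x2 / 2.0)

def is_atypical(p_delta, p_gc, confidence=0.95):
    """Flag a gene when the combined p-value falls below 1 - confidence."""
    return (1.0 - confidence) > fisher_combine(p_delta, p_gc)
```

<p>With these definitions, lowering the confidence argument to 0.90, 0.80 or 0.70 can only enlarge the set of flagged genes, which matches the trend reported for the lower CI thresholds.</p>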
</sec>
<sec>
<st>
<p>Identification of PAGs source molecules</p>
</st>
<p>In order to identify the most likely source molecule of the identified PAGs, we developed a similarity-oriented computational pipeline in which each identified PAG was used as a query for a BLAST <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp> search against three different databases, each containing 30000 sequences retrieved from the NCBI plasmid, phage and chromosome databases (see Analyzed genomes), respectively. For each of the BLAST searches, only the best BLAST hit was considered, in order to reduce any possible bias due to the presence of closely related sequences in the database that would falsely increase the number of homologs for a given ORF. This strategy was repeated 1000 times for each PAG and, for each of the 1000 runs, new plasmid, chromosome and phage databases were assembled by randomly sampling 30000 sequences from the NCBI databases. Finally, the putative source molecule was identified according to the database (plasmid, phage or chromosome) that produced the highest number of best hits after 1000 BLAST probings.</p>
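<p>The voting scheme described above can be outlined with the following Python sketch. It reflects one possible reading of the procedure, namely that in each run the database whose best hit scores highest receives one vote. The function assign_source and the callable best_hit_score are hypothetical: the latter stands in for an actual BLAST search and is expected to return the score of the best hit, or None when no hit is found.</p>

```python
import random
from collections import Counter

def assign_source(query, pools, best_hit_score, runs=1000,
                  sample_size=30000, seed=0):
    """Return the pool label that wins the most runs, or None without hits.

    pools maps a label ("plasmid", "phage", "chromosome") to a list of
    sequences; best_hit_score(query, database) is a user-supplied stand-in
    for a similarity search.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(runs):
        scores = {}
        for label, seqs in pools.items():
            # A fresh random database is drawn for every run.
            database = rng.sample(seqs, min(sample_size, len(seqs)))
            score = best_hit_score(query, database)
            if score is not None:
                scores[label] = score
        if scores:
            votes[max(scores, key=scores.get)] += 1
    return votes.most_common(1)[0][0] if votes else None
```

<p>Resampling the databases at every run is what limits the influence of over-represented, closely related sequences in any single pool.</p>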
</sec>
<sec>
<st>
<p>Statistics</p>
</st>
<p>All statistical tests were performed with the R package <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>. All other statistical analyses were performed using <it>in-house </it>developed Perl scripts.</p>
</sec>
</sec>
<sec>
<st>
<p>Results and Discussion</p>
</st>
<sec>
<st>
<p>PAGs general features</p>
</st>
<sec>
<st>
<p>PAGs distribution</p>
</st>
<p>We applied the computational pipeline described in Methods (Figure <figr fid="F1">1</figr>) to 1853 archaeal (51) and bacterial (1802) plasmids (128,569 ORFs in total) in order to explore their mosaic structure, using a confidence value of 95% (see Methods for details). In this way we were able to collect 8065 compositionally atypical ORFs, denominated PAGs (that is 6.2% of all the 128,569 encoded proteins, Table <tblr tid="T1">1</tblr>), distributed across 1354 plasmid molecules (73% of the entire dataset). The remaining plasmids (519, 27% of the dataset) do not possess any atypical gene at all. The PAG sequences in fasta format (including those retrieved at a CI of 70%, 80% and 90%) are available as Additional file <supplr sid="S2">2</supplr>. The analysis of the distribution of PAGs (that is, the number of PAGs <it>per </it>plasmid) within the assembled plasmid dataset revealed that they are not evenly distributed (Figure <figr fid="F2">2</figr>), ranging from 59 (found in <it>Methylobacterium extorquens </it>AM1) to 1 PAG (342 plasmids). Overall, the distribution of PAGs across the microbial dataset showed that a high number of plasmids possess a small number of PAGs (or do not possess any PAGs at all), whereas only a few of them possess a higher number of atypical ORFs (Figure <figr fid="F2">2</figr>). Importantly, the same trend was observed with the PAG datasets retrieved at lower CI, i.e. 70%, 80% and 90% (Additional file <supplr sid="S3">3</supplr>).</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>PAGs general features</p></caption><tblbdy cols="2">
      <r>
         <c ca="left">
            <p>N. of analyzed plasmids</p>
         </c>
         <c ca="left">
            <p>1853</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>N. of analyzed sequences</p>
         </c>
         <c ca="left">
            <p>128,569</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>N. of retrieved PAGs</p>
         </c>
         <c ca="left">
            <p>8065</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Percentage of PAGs</p>
         </c>
         <c ca="left">
            <p>6.2%</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Average PAGs <it>per </it>plasmid</p>
         </c>
         <c ca="left">
            <p>4.3</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>PAGs in clusters (&#8805; 2 genes)</p>
         </c>
         <c ca="left">
            <p>1653</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Clusters of PAGs (&#8805; 2 genes)</p>
         </c>
         <c ca="left">
            <p>677</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>PAGs <it>per </it>cluster (on average)</p>
         </c>
         <c ca="left">
            <p>2.4</p>
         </c>
      </r>
   </tblbdy></tbl>
<suppl id="S2">
<title>
<p>Additional file 2</p>
</title>
<text>
<p>
<b>all PAGs sequences</b>. Fasta files embedding all the sequences identified as PAGs at the different CI values.</p>
</text>
<file name="1471-2164-12-403-S2.ZIP">
   <p>Click here for file</p>
</file>
</suppl>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>PAGs distribution</p></caption><text>
   <p><b>PAGs distribution</b>. PAGs distribution across the plasmids composing the dataset.</p>
</text><graphic file="1471-2164-12-403-2" hint_layout="single"/></fig>
<suppl id="S3">
<title>
<p>Additional file 3</p>
</title>
<text>
<p>
<b>Lower confidence PAGs distribution</b>. Distribution of PAGs (retrieved at 70%, 80% and 90% CIs) across the plasmids composing the dataset.</p>
</text>
<file name="1471-2164-12-403-S3.PPT">
   <p>Click here for file</p>
</file>
</suppl>
<p>To find possible correlations between PAGs distribution and the taxonomy of their hosting cells, we evaluated the percentage of PAGs for each of the genera included in our dataset. For clarity, in Figure <figr fid="F3">3</figr> we report only results concerning those genera for which it was possible to retrieve at least 7 plasmids during dataset assembly (<it>see </it>Methods); complete results are provided in Additional file <supplr sid="S4">4</supplr>. Overall, we found that the number of PAGs varies greatly among the different genera (Figure <figr fid="F3">3</figr>). Besides, since the taxonomical distribution of PAGs might be strongly influenced by the overall number of sequences sampled from each genus, we evaluated the statistical significance of this distribution by comparing it with 10000 randomly assembled ones, obtained by re-shuffling the 8065 PAGs within the entire plasmid dataset and counting the fraction of times each genus possessed a number of PAGs greater or lower than the observed one. This gives a <it>p</it>-value accounting for the statistical significance of the number of PAGs retrieved within each genus with respect to our null model, i.e. that PAGs were randomly distributed within the plasmids.</p>
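<p>The re-shuffling procedure described above amounts to a permutation test, which can be sketched in Python as follows. The function name and the input layout (one genus label per ORF) are assumptions made for illustration; each shuffle places the PAGs on randomly chosen ORFs and the per-genus counts are compared with the observed ones.</p>

```python
import random
from collections import Counter

def genus_permutation_p(orf_genus, n_pags, observed, n_perm=10000, seed=0):
    """Permutation p-values for PAG enrichment and depletion per genus.

    orf_genus: list holding the genus label of every ORF in the dataset.
    n_pags: total number of PAGs to place at random on the ORFs.
    observed: dict mapping each genus to its observed number of PAGs.
    Returns genus -> (fraction of shuffles with count >= observed,
                      fraction of shuffles with count not above observed).
    """
    rng = random.Random(seed)
    at_least = Counter()
    at_most = Counter()
    for _ in range(n_perm):
        # Drawing n_pags ORFs without replacement mimics a random placement.
        counts = Counter(rng.sample(orf_genus, n_pags))
        for genus, obs in observed.items():
            count = counts.get(genus, 0)
            if count >= obs:
                at_least[genus] += 1
            if obs >= count:
                at_most[genus] += 1
    return {g: (at_least[g] / n_perm, at_most[g] / n_perm) for g in observed}
```

<p>A small first value points to a PAG-enriched genus and a small second value to a PAG-depleted one. Because ORFs rather than plasmids are shuffled, genera contributing many sequences are not favoured by construction.</p>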
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Taxonomical distribution of PAGs</p></caption><text>
   <p><b>Taxonomical distribution of PAGs</b>. Relative PAGs abundance and standard deviations across the most represented genera of the dataset. Red and green bars represent genera that are significantly depleted and enriched in PAGs, respectively. Blue bars represent genera for which the content of PAGs was not statistically different from that expected by chance.</p>
</text><graphic file="1471-2164-12-403-3" hint_layout="single"/></fig>
<suppl id="S4">
<title>
<p>Additional file 4</p>
</title>
<text>
<p>
<b>PAGs and taxonomy</b>. Full taxonomical distribution of PAGs.</p>
</text>
<file name="1471-2164-12-403-S4.PPT">
   <p>Click here for file</p>
</file>
</suppl>
<p>Overall we found that 71 (out of 106) genera possessed an amount of PAGs that was higher (55 PAGs enriched genera) or lower (16 PAGs depleted genera) than that expected to occur by chance (p-value &lt; 10<sup>-4</sup>).</p>
<p>The two best-scoring genera in terms of PAGs content were <it>Acaryochloris </it>and <it>Shigella </it>(22.5% and 17.9% of all the plasmid-encoded proteins, respectively); in both cases PAGs enrichment was shown to be statistically significant. Interestingly, plasmids from <it>A. marina </it>were already shown to possess metabolic capabilities that were probably acquired through HGT <abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp>, thus confirming their mosaic structure assessed by PAGs analysis. Similarly, the mosaicism of <it>Shigella </it>plasmids is a well known issue <abbrgrp>
<abbr bid="B30">30</abbr>
<abbr bid="B31">31</abbr>
<abbr bid="B32">32</abbr>
</abbrgrp> and has been demonstrated to be biologically relevant since it has probably allowed these strains to acquire pathogenic adaptation <abbrgrp>
<abbr bid="B31">31</abbr>
<abbr bid="B33">33</abbr>
</abbrgrp>.</p>
<p>Both genera were also PAGs enriched when lower CI thresholds were applied (respectively 48.5, 74.3 and 103.1 PAGs/plasmid at 90%, 80% and 70% threshold for <it>Acaryochloris </it>and 31.1, 52.4 and 66.8 in the case of <it>Shigella </it>representatives), although lowering CI values below 80% resulted in statistical inconsistency, likely due to the inclusion of too many false positives in the dataset.</p>
<p>At all CI thresholds analyzed, PAGs-depleted plasmids also span a large taxonomic range, comprising Actinobacteria, Firmicutes, Proteobacteria and Cyanobacteria. Interestingly, we found that &#945;-proteobacterial plasmids (mainly from <it>Sinorhizobium</it>, <it>Agrobacterium </it>and <it>Rhizobium</it>) are highly represented within PAGs-depleted plasmids, suggesting that plasmids hosted by representatives of this taxonomic unit might undergo recombination/HGT events less frequently than the others. The finding that these bacteria harbor a lower number of PAGs than that expected by chance might be accounted for by the fact that these are mainly soil-inhabiting microorganisms. Indeed, it has been proposed that bacteria inhabiting this ecological niche might represent a less connected component of the overall plasmid-mediated HGT network <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp>. Accordingly, this might partially explain their lower number of PAGs. Alternatively, since it has been suggested that there is considerable gene flow between replicons in the Rhizobiaceae <abbrgrp>
<abbr bid="B35">35</abbr>
<abbr bid="B36">36</abbr>
</abbrgrp>, it can be surmised that their plasmids frequently undergo recombination with the chromosomes of their hosting cells. However, it is noteworthy that, in most cases, the compositional features of Rhizobiaceae replicons are quite similar (for example, the GC% content is around 60% along all the replicons inhabiting the same cell in <it>Sinorhizobium </it>and <it>Rhizobium </it>representatives). Thus, it is entirely possible that a fraction of these internal recombination events remains undetected by the composition-oriented pipeline developed in this work. Interestingly, concerning the identified PAGs (for whose identification another compositional measure, i.e. &#948;*, was added to GC% content), we found that in alpha-proteobacteria PAGs of chromosomal origin are more represented than in the whole dataset (10% and 5%, respectively, see below), suggesting that plasmids from these microorganisms might indeed have more genes of chromosomal origin than what is seen in other species.</p>
</sec>
<sec>
<st>
<p>PAGs and plasmids size</p>
</st>
<p>In principle, the acquisition and the maintenance of exogenous DNA by a plasmid molecule might lead to the expansion of its coding capabilities and, in parallel, to an increase of its size. Moreover, it might be expected that larger plasmids possess a higher number of "entry points" for exogenous DNA than smaller ones, thus somehow promoting the acquisition of novel genetic material. To test these hypotheses, we compared the number of PAGs possessed by each plasmid with its size (Figure <figr fid="F4">4</figr>). We found only a weak positive correlation (R<sup>2 </sup>= 0.21) between the length (in bp) of each plasmid and its PAGs content. Indeed, within our dataset we retrieved plasmids completely riddled with PAGs as, for example, the Far04_lp28-1 plasmid from <it>Borrelia garinii </it>Far04 (27689 nt), where we found almost 80% of atypical genes. Conversely, in a plasmid of similar size (e.g. pXAG81 from <it>Xanthomonas axonopodis pv. Glycines</it>, 26721 bp) we were not able to detect any PAG. This finding might suggest that plasmids can undergo recombination with other informative molecules and/or HGT regardless of their size. Moreover, this might also indicate that the acquisition of foreign DNA might not be the only force driving the growth and the expansion of plasmid coding capabilities and that other molecular mechanisms (e.g. gene duplication) might play a certain role. Similar trends (with R<sup>2 </sup>values ranging from 0.24 to 0.48) were also observed when the PAG datasets retrieved with lower CI thresholds were analyzed (Additional file <supplr sid="S5">5</supplr>).</p>
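<p>For reference, the coefficient of determination discussed above can be computed as in the following Python sketch, which assumes that R<sup>2 </sup>denotes the squared Pearson correlation of a simple linear fit between plasmid length and PAG count (the fitting procedure is not specified in the text, and the function name is hypothetical).</p>

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # No variance in one variable: report no linear association.
        return 0.0
    return (sxy * sxy) / (sxx * syy)
```

<p>Under this reading, a value of 0.21 means that plasmid length accounts for roughly a fifth of the variance in PAG number, leaving most of it unexplained.</p>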
<fig id="F4"><title><p>Figure 4</p></title><caption><p>PAGs and plasmids size</p></caption><text>
   <p><b>PAGs and plasmids size</b>. Scatterplot illustrating the correlation existing between plasmids size and their PAGs content.</p>
</text><graphic file="1471-2164-12-403-4" hint_layout="single"/></fig>
<suppl id="S5">
<title>
<p>Additional file 5</p>
</title>
<text>
<p>
<b>PAGs and plasmids size for lower CI values</b>. Scatterplots illustrating the low positive correlation existing between plasmids size and their PAGs content, with three different PAGs datasets retrieved at a) 70%, b) 80% and c) 90%.</p>
</text>
<file name="1471-2164-12-403-S5.PPT">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>PAGs functions</p>
</st>
<p>We aimed at identifying the most represented functions performed by PAGs. To this end, we adopted the computational pipeline implemented in Blast2GO (B2GO) <abbrgrp>
<abbr bid="B37">37</abbr>
</abbrgrp> and performed an automated functional annotation. For most PAGs it was not possible to identify any associated function. This was somewhat expected, since it has already been shown <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp> that the function of most plasmid-encoded proteins is still unknown. Specifically, out of a total of 8065 sequences used as input for B2GO, 803 PAGs did not retrieve any BLAST hit and 4795 retrieved a hypothetical ortholog in the database (in almost all cases corresponding to the query itself) but did not produce any mapping to a known biological function. Hence, overall, an important fraction of PAGs (5598 sequences, almost 68% of the total) could not be linked to any hit in functional databases and, as a consequence, to any known biological function. For the remaining 2467, it was possible to retrieve a putative biological process. For clarity, we show only those biological processes that possessed at least 70 representatives within the PAGs dataset (Figure <figr fid="F5">5</figr>; complete results are shown in Additional file <supplr sid="S6">6</supplr>).</p>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>PAGs functions</p></caption><text>
   <p><b>PAGs functions</b>. Histogram showing the distribution of the main biological processes in which PAGs are involved. Red bars refer to processes generally related to DNA mobilization.</p>
</text><graphic file="1471-2164-12-403-5" hint_layout="single"/></fig>
<suppl id="S6">
<title>
<p>Additional file 6</p>
</title>
<text>
<p>
<b>lower confidence PAGs functional annotation</b>. a) Full COG <url>http://www.ncbi.nih.gov/COG</url> functional annotation of PAGs retrieved at 70%, 80% and 90% CIs. In particular, the function of each of the sequences embedded in these datasets was inferred from the one assigned to its best BLAST hit in the COG database. b) Full Blast2GO functional annotation of PAGs retrieved at 95% CI.</p>
</text>
<file name="1471-2164-12-403-S6.DOC">
   <p>Click here for file</p>
</file>
</suppl>
<p>As shown in Figure <figr fid="F5">5</figr>, the largest share of the PAG-encoded proteins (around 22% of all the annotated ones) is associated with molecular functions that catalyze the movement of DNA among and within informative molecules, that is DNA integration, transposition and conjugation (7.9%, 7.3% and 7%, respectively; see red bars in Figure <figr fid="F5">5</figr>). Notably, this result partially validates the approach applied for PAGs detection, since genes that are able to move across different molecules are also expected to differ compositionally from the corresponding hosting molecule. Moreover, we found that another important fraction (303 sequences, corresponding to 10.9% of all the PAGs for which a putative function was retrieved) is represented by proteins involved in transcription regulation (DNA-mediated). A further investigation revealed that these proteins are mainly involved in important biological processes that are usually associated with plasmids and/or transposons and that have a (more or less) long documented history of HGT, such as mercury detoxification (e.g. MerD transcriptional regulator from plasmid pEC-IMP, GI: 226807665), tetracycline resistance (e.g. TetR transcriptional regulator of <it>Salmonella typhimurium </it>R64 plasmid, GI: 32470145) and virulence (e.g. VirF transcriptional regulator in plasmid pSS_046 of <it>Shigella sonnei </it>Ss046, GI: 74314878). The presence of such proteins within the assembled PAGs dataset is intriguing. Indeed, it might be expected that the introgression of proteins capable of interfering with the overall (complex) regulatory network of the cell would be (quite) "dangerous" (from a biological viewpoint) and prone to be counterselected by the novel hosting cell. 
However, further analyses (see below) revealed that a considerable number of PAGs are embedded in more or less compact clusters involved in processes that are known to be often spread by HGT (including virulence, antibiotic resistance and heavy metal detoxification). Accordingly, atypical transcriptional regulators might be part of these gene clusters and, consequently, might be involved in the regulation of the flanking regions. Alternatively, it can be surmised that, after the introgression of the atypical transcriptional regulator, some modifications (i.e. mutations) might have occurred, rendering the newly acquired sequence compatible with the overall "new" regulatory network, that is more easily recognizable by the transcriptional apparatus of the host cell, as experimentally demonstrated <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>. After DNA mobilization and transcription regulation, we found that pathogenesis- and antibiotic resistance-related sequences are the most abundant among annotated PAGs (6.8% and 5.2%, respectively). Remarkably, the finding that these two biological processes are highly represented among atypical genes underlines the key role that HGT plays in the spreading of these two important biological features within the microbial world.</p>
<p>Finally, the high percentage of PAGs that did not retrieve any BLAST hit during the B2GO searches might point to the presence of an important fraction of pseudogenes within our PAGs dataset. Accordingly, these genes might represent aberrant sequences that, in the absence of strong selective pressure, have accumulated a great number of mutations and have evolved beyond recognition. The fact that we identified them as PAGs might be explained by the observation that pseudogenes often originate from failed HGT events between two different source molecules <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>. Interestingly, premature stop codons (and, consequently, shorter proteins) have also been indicated as a typical characteristic of pseudogenes that, in turn, are known to originate (at least in part) from failed HGT <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>. Indeed, PAG-encoded proteins are, on average, shorter (136.5 amino acids) than non-atypical ones (274.2 amino acids, Figure <figr fid="F6">6</figr>), suggesting that part of the PAGs retrieved might indeed be pseudogenes, probably originating from unsuccessful integration in the host plasmids. Notably, PAGs datasets retrieved at different thresholds followed the same overall trend (Additional file <supplr sid="S7">7</supplr>).</p>
<fig id="F6"><title><p>Figure 6</p></title><caption><p>PAGs length</p></caption><text>
   <p><b>PAGs length</b>. The length of PAG-encoded proteins (red line) compared to the length of all the other plasmid-encoded proteins (blue line) present in the dataset.</p>
</text><graphic file="1471-2164-12-403-6" hint_layout="single"/></fig>
<suppl id="S7">
<title>
<p>Additional file 7</p>
</title>
<text>
<p>
<b>lower confidence PAGs length</b>. The length of lower confidence PAGs encoded proteins.</p>
</text>
<file name="1471-2164-12-403-S7.PPT">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
</sec>
<sec>
<st>
<p>PAGs source molecules</p>
</st>
<p>As previously pointed out, PAGs may derive from homologous recombination or homology-facilitated illegitimate recombination between a given plasmid and other informative molecules, that is phages, chromosomes or other plasmids. Hence, we aimed at identifying the putative source molecule of PAGs. For this purpose, we developed a computational approach (see Methods) that, on the basis of the number of orthologs present in different (and randomly assembled) plasmid, chromosome and phage databases, assigns the most likely source molecule to a set of sequences (in our case the 8065 PAGs). It must be clearly stated that this kind of strategy has only an explorative purpose and might be strongly influenced by the present content of public databases that, undoubtedly, represents just a glimpse of the real biodiversity present in nature. For this reason, the reliability of the developed approach was first assessed on a set of 1000 likely native chromosomal sequences retrieved by Cortez et al. <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp> from a set of 119 bacterial and archaeal chromosomes. The results of this preliminary screening showed that almost 90% of the probed sequences were correctly annotated, i.e. were assigned a putative chromosomal origin. Furthermore, we sought to test the implemented strategy on a set of sequences of plasmid and phage origin. However, as already pointed out, in these cases a set of "core" sequences is very difficult (if not impossible) to retrieve. Hence, the previously described pipeline was applied to two distinct randomly assembled datasets of viral and plasmid sequences, embedding 1000 sequences each. Results similar to those obtained with (likely) native chromosomal sequences were obtained (84% and 82% of "correct" identifications in the case of plasmids and phages, respectively), thus suggesting that, in most cases, the implemented strategy is able to detect the correct source molecule of a given sequence.</p>
<p>Applying the described strategy to our PAGs dataset revealed that, as shown in Figure <figr fid="F7">7</figr>, most of the identified PAGs (almost 75%) are of likely plasmid origin. In contrast, PAGs of chromosome and phage origin appear to be much rarer, accounting for 5.7% and 4.8% of PAGs, respectively. Remarkably, almost 13% of all the probed PAGs (1087) did not retrieve a clear corresponding match in the plasmid, chromosome and phage databases and, for this reason, their origin remained "undetermined" (Figure <figr fid="F7">7</figr>). Finally, less than 1% of the PAGs did not retrieve any match in any of the databases during our BLAST probings and were labelled as "Not Found" in Figure <figr fid="F7">7</figr>. Sequences retrieved with lower CI thresholds showed the same hypothetical origin, with sequences of likely plasmid origin being largely over-represented (73.8%, 72.9% and 73.5% for 70%, 80% and 90% CI thresholds, respectively). Taken together, these results indicate that most of the PAGs are likely to appear only in plasmids rather than being shared among different types of informative molecules. We speculate here that most of the retrieved PAGs likely derive from plasmid-plasmid gene exchanges and that, <it>vice versa</it>, integrations following virus-plasmid and/or chromosome-plasmid gene exchange are less frequent. Remarkably, this finding fits with previous analyses <abbrgrp>
<abbr bid="B40">40</abbr>
</abbrgrp> of the structure of the overall DNA-exchange network, which suggested that DNA families are mostly exchanged among the same type of DNA carriers (i.e. plasmids, phages or chromosomes). Our analyses indicate that this might hold true (at least) in the case of plasmid molecules.</p>
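<p>The assignment logic can be pictured with a short Python sketch. The actual decision rule is the one described in Methods and is not reproduced here; the rule below (the database with the most orthologs wins, ties are "Undetermined", no hit at all is "Not Found") is an assumption made only for illustration, and the function name assign_source and the example counts are hypothetical.</p>

```python
def assign_source(hits):
    """Assign a likely source molecule from per-database ortholog counts.

    hits maps a database label ("plasmid", "chromosome", "phage") to the
    number of orthologs retrieved there. The decision rule is an assumption
    made for illustration, not the rule described in Methods.
    """
    total = sum(hits.values())
    if total == 0:
        return "Not Found"  # no match in any database
    best = max(hits.values())
    winners = [label for label, count in hits.items() if count == best]
    if len(winners) > 1:
        return "Undetermined"  # tie: no clear source molecule
    return winners[0]


# Hypothetical ortholog counts for a single PAG.
print(assign_source({"plasmid": 12, "chromosome": 3, "phage": 0}))
```

<p>Under this simplified rule, a PAG with orthologs mostly in the plasmid database is labelled as being of plasmid origin, whereas evenly split counts leave the origin undetermined.</p>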
<fig id="F7"><title><p>Figure 7</p></title><caption><p>PAGs origin</p></caption><text>
   <p><b>PAGs origin</b>. Pie chart showing the most likely source molecule of retrieved PAGs. Und. stands for "Undetermined" (<it>see </it>text for details).</p>
</text><graphic file="1471-2164-12-403-7" hint_layout="single"/></fig>
</sec>
<sec>
<st>
<p>Clusters of PAGs</p>
</st>
<p>Chromosomal atypical genes are often organized in (more or less) tight clusters. These chromosomal regions, named genomic islands <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp>, embed, for example, genes involved in pilus and fimbriae formation, lipopolysaccharide biosynthesis or virulence and have been shown to have played a crucial role in microbial evolution <abbrgrp>
<abbr bid="B41">41</abbr>
</abbrgrp>. Similarly, PAGs might either stand alone or be embedded in (more or less compact) clusters on their source molecule representing, for example, successfully integrated transposons and/or integrons. Hence, we evaluated whether the retrieved PAGs lay in clusters or were scattered throughout the corresponding plasmid. We estimated gene clustering at three different gene distance thresholds, that is 100, 200 and 300 bp. Results shown below refer to genes that are not separated by more than 200 bp (results for 100 and 300 bp are provided as Additional file <supplr sid="S8">8</supplr>, together with the cluster distributions retrieved at different CI thresholds). Overall we found that 1653 PAGs (out of a total of 8065) are embedded in 677 clusters of different size (Table <tblr tid="T1">1</tblr> and Figure <figr fid="F8">8</figr>). The complete list of all identified PAGs clusters, together with their corresponding organisms and GI codes, is provided as additional material (<it>see </it>Additional file <supplr sid="S9">9</supplr>). Most of them (503) are bi-cistronic clusters, while another important fraction (119) is represented by arrays of three genes. Longer clusters were quite rare and we found only 26, 21 and 5 clusters embedding 4, 5 and 6 genes, respectively. This trend is in partial agreement with previous findings on chromosomal clusters of atypical genes <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp> and suggests that PAGs clusters are quickly fragmented and/or eroded following integration. Alternatively, this could suggest that the transfer and integration of shorter clusters is somehow favoured over that of longer ones. Finally, as shown in Additional file <supplr sid="S8">8</supplr>, short gene arrays seem to be more frequent also when PAGs were retrieved at lower CIs.</p>
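<p>The distance-based grouping described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: genes are given as (start, end) coordinates on a linear map of one plasmid, strand and circular topology are ignored, and the function name cluster_pags and the toy coordinates are hypothetical.</p>

```python
def cluster_pags(pags, max_gap=200):
    """Group PAGs whose neighbours are at most max_gap bp apart.

    pags is a list of (start, end) coordinates on one plasmid. Only groups
    of two or more genes are returned as clusters.
    """
    clusters = []
    current = []
    last_end = None
    for start, end in sorted(pags):
        # Close the current group when the gap to this gene is too large.
        if current and start - last_end > max_gap:
            if len(current) > 1:
                clusters.append(current)
            current = []
        # Track the right-most end seen so far in the current group.
        if current:
            last_end = max(last_end, end)
        else:
            last_end = end
        current.append((start, end))
    if len(current) > 1:
        clusters.append(current)
    return clusters


# Toy coordinates (hypothetical), in bp.
toy_pags = [(100, 400), (450, 900), (1500, 2000),
            (2150, 2600), (2700, 3000), (9000, 9500)]
print([len(c) for c in cluster_pags(toy_pags)])
```

<p>Changing max_gap to 100 or 300 mimics the alternative thresholds mentioned in the text; isolated PAGs are simply not reported as clusters.</p>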
<suppl id="S8">
<title>
<p>Additional file 8</p>
</title>
<text>
<p>
<b>clusters of lower confidence PAGs</b>. Results of gene clusters analysis a) for 100, 200 and 300 bp gene distance thresholds and b) for PAGs retrieved at 70%, 80% and 90% CIs.</p>
</text>
<file name="1471-2164-12-403-S8.PPT">
   <p>Click here for file</p>
</file>
</suppl>
<fig id="F8"><title><p>Figure 8</p></title><caption><p>PAGs clusters</p></caption><text>
   <p><b>PAGs clusters</b>. PAGs clusters size distribution. Clusters longer than 8 genes are not shown here.</p>
</text><graphic file="1471-2164-12-403-8" hint_layout="single"/></fig>
<suppl id="S9">
<title>
<p>Additional file 9</p>
</title>
<text>
<p>
<b>complete PAGs clusters accession codes</b>. The complete list of all identified PAGs (retrieved at 95% CI threshold) clusters (at 200 bp threshold), together with their corresponding organisms and GI codes.</p>
</text>
<file name="1471-2164-12-403-S9.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S10">
<title>
<p>Additional file 10</p>
</title>
<text>
<p>
<b>the atypical <it>mer, tet, maxi </it>and <it>cbi </it>clusters</b>. Schematic representation of the atypical <it>mer, tet, maxi </it>and <it>cbi </it>clusters found in different plasmids of different microorganisms.</p>
</text>
<file name="1471-2164-12-403-S10.PPT">
   <p>Click here for file</p>
</file>
</suppl>
<p>Among the identified PAGs clusters, some possess a partially documented evolutionary history, mainly driven by HGT/recombination events, as in the case of the mercury resistance gene cluster(s). In fact, mercury resistance genes (<it>mer</it>) have usually been found embedded in a single compact operon <abbrgrp>
<abbr bid="B42">42</abbr>
</abbrgrp> that, in turn, has been suggested to represent an aberrant mercury resistance transposon (namely TndPKHLK2) that, in some cases, has lost the genes responsible for its transposition <abbrgrp>
<abbr bid="B43">43</abbr>
</abbrgrp>. The analysis of PAGs clusters allowed the identification of (at least) 10 different <it>mer </it>clusters (see Additional file <supplr sid="S10">10</supplr>) that showed a different composition with respect to the source molecule, thus revealing the pivotal role of HGT in the spreading of this metabolic ability across bacteria belonging to (sometimes) very different taxonomic units and inhabiting separate ecological niches. Moreover, other gene clusters (or parts thereof) coding for important biological traits (i.e. antibiotic resistance, host invasion and cobalamin biosynthesis) were retrieved. For example, invasion-associated genes encoding proteins involved in the invasion of mammalian cells were found in atypical clusters on plasmids retrieved from different species of the <it>Shigella </it>genus (see Additional file <supplr sid="S10">10</supplr>), providing further support to the idea that one (or more) HGT event(s) played a role in spreading this feature within representatives of this genus <abbrgrp>
<abbr bid="B32">32</abbr>
<abbr bid="B44">44</abbr>
</abbrgrp>. Similarly, plasmid-mediated HGT seems to have contributed to the spreading of other key metabolic traits in microbial representatives, as, for example, cobalamin biosynthesis <abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp> and tetracycline resistance, for which very similar (atypical) gene clusters were retrieved from very distantly related microorganisms, including <it>Geobacter</it>, <it>Halorubrum</it>, <it>Methylibium</it>, <it>Methylobacterium </it>and <it>Deinococcus </it>representatives in the case of cobalamin biosynthesis and <it>Escherichia</it>, <it>Klebsiella</it>, <it>Aeromonas</it>, <it>Serratia</it>, <it>Yersinia </it>and <it>Enterobacter </it>in the case of <it>tet </it>genes (see Additional file <supplr sid="S10">10</supplr>).</p>
</sec>
<sec>
<st>
<p>PAGs and conjugative plasmids</p>
</st>
<p>Among plasmids, conjugative ones have been defined as "vessels" of the communal gene pool <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp>. Indeed, this class of plasmids possesses the ability to "visit" different cells and, in principle, undergo genetic rearrangements (such as homologous recombination) with other plasmids and/or other informative molecules (phages and chromosomes). For this reason, conjugative plasmids might be expected to possess a higher number of atypical genes than plasmids that are not (or are less) mobilizable. To test this hypothesis, we performed a comparative analysis of the number of PAGs belonging to conjugative and non-conjugative plasmids. The conjugative/mobilizable plasmids (hereinafter referred to as CMPs) embedded in our dataset were identified by searching for their propagation-related genes, adopting the following strategy: 1) a dataset of genes involved in the propagation of conjugative plasmids (<it>tra </it>and <it>mob </it>like genes) was assembled by searching the ACLAME database <url>http://aclame.ulb.ac.be/</url>; 2) a BLAST search with each of the sequences of our dataset against the previously assembled <it>tra</it>/<it>mob</it>-like database (setting an e-value threshold of 1e-20) was carried out; and 3) the number of <it>tra</it>/<it>mob</it>-like genes was determined for each plasmid. In this way we were able to divide our plasmid dataset into 693 conjugative/mobilizable plasmids (CMPs, possessing at least one gene involved in conjugation and/or mobilization) and 1160 not conjugative/mobilizable plasmids (NCMPs, i.e. plasmids that do not possess any <it>tra </it>or <it>mob </it>related genes). The data obtained revealed that, on average, CMPs harbor a higher number of PAGs than NCMPs and that the distributions of PAGs in these two classes of plasmids were significantly different (Mann-Whitney test, p-value &lt; 0.001). Indeed, CMPs possess, on average, 5.7 PAGs, whereas NCMPs possess only 2.9 atypical ORFs per plasmid. 
These data are summarized in Figure <figr fid="F9">9</figr>, which shows the different trends for CMPs and NCMPs. This holds true also when the PAGs datasets retrieved at lower thresholds (i.e. 70%, 80% and 90%) were analyzed (Additional file <supplr sid="S11">11</supplr>).</p>
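<p>The classification step of this strategy can be sketched in Python as follows. The sketch assumes that the BLAST results have already been parsed into (plasmid identifier, e-value) pairs; this input layout, the function name classify_plasmids and the example identifiers are hypothetical and do not reproduce the Perl code used in the study.</p>

```python
def classify_plasmids(plasmid_ids, hits, max_evalue=1e-20):
    """Split plasmids into CMPs and NCMPs from tra/mob-like BLAST hits.

    hits is an iterable of (plasmid_id, evalue) pairs, one per hit of a
    plasmid ORF against the tra/mob-like database. A plasmid with at least
    one hit passing the e-value threshold is treated as a CMP.
    """
    counts = {pid: 0 for pid in plasmid_ids}
    for pid, evalue in hits:
        if evalue > max_evalue:
            continue  # hit too weak to count
        counts[pid] = counts.get(pid, 0) + 1
    cmps = {pid for pid, n in counts.items() if n >= 1}
    ncmps = set(counts) - cmps
    return cmps, ncmps


# Hypothetical plasmids and hits.
plasmids = ["pA", "pB", "pC"]
blast_hits = [("pA", 1e-50), ("pA", 1e-5), ("pB", 1e-10)]
cmps, ncmps = classify_plasmids(plasmids, blast_hits)
print(sorted(cmps), sorted(ncmps))
```

<p>The PAG counts of the two resulting groups could then be compared with a rank-based test such as the Mann-Whitney test mentioned above.</p>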
<fig id="F9"><title><p>Figure 9</p></title><caption><p>PAGs in conjugative plasmids</p></caption><text>
   <p><b>PAGs in conjugative plasmids</b>. Abundance of PAGs in conjugative/mobilizable plasmids (CMPs) and not conjugative/mobilizable plasmids (NCMPs).</p>
</text><graphic file="1471-2164-12-403-9" hint_layout="single"/></fig>
<suppl id="S11">
<title>
<p>Additional file 11</p>
</title>
<text>
<p>
<b>lower confidence PAGs in CMPs and NCMPs</b>. Distribution of lower confidence PAGs among CMPs and NCMPs.</p>
</text>
<file name="1471-2164-12-403-S11.PPT">
   <p>Click here for file</p>
</file>
</suppl>
<p>The high number of PAGs retrieved in CMPs provides strong support to the idea that plasmids have played (and are still playing) a central role in microbial evolution <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp>. In fact, our data suggest that, by visiting different cells, CMPs can undergo recombination event(s) with the host's DNA molecules more frequently than NCMPs; consequently, they can probably acquire pieces of exogenous DNA that, in turn, can be further spread within whole microbial communities. This idea is also partially supported by the finding that CMPs, on average, possess more and longer PAGs clusters than NCMPs, as reported in Additional file <supplr sid="S12">12</supplr>.</p>
<suppl id="S12">
<title>
<p>Additional file 12</p>
</title>
<text>
<p>
<b>PAGs clusters in CMPs and NCMPs</b>. Distribution and length of PAGs clusters in CMPs and NCMPs.</p>
</text>
<file name="1471-2164-12-403-S12.PPT">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>Within the complex evolutionary network of plasmids, new functional blocks are added and exchanged. However, to date, information on plasmids atypical regions is available only for a very limited number of plasmids and/or microorganisms.</p>
<p>In this work we have developed a computational pipeline to detect compositionally atypical ORFs that reside on plasmids and performed a large-scale analysis of them. Applying our strategy to a dataset of nearly 2000 plasmids, we identified 8065 PAGs, almost 6% of all the analyzed ORFs. These PAGs are likely the outcome of (one or more) HGT event(s), although the hypothesis must be mentioned that at least part of them may derive from events of internal recombination with chromosome(s) inhabiting the same cytoplasm which, in some cases, may possess different compositional features with respect to the corresponding plasmids (as suggested in <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>).</p>
<p>It is worth noting that the total fraction of retrieved PAGs is, on average, lower than that estimated for chromosomes (10-15%) <abbrgrp>
<abbr bid="B10">10</abbr>
<abbr bid="B20">20</abbr>
</abbrgrp>. This might be due to the high confidence interval (CI 95%) applied in our PAGs retrieval pipeline (Figure <figr fid="F1">1</figr>), which might have led to a partial underestimation of the actual number of atypical ORFs that are integrated on plasmids. Indeed, applying the same pipeline for PAGs retrieval with lower CI thresholds allowed us to assemble larger PAGs datasets (embedding 14731, 27201 and 38950 sequences at 70%, 80% and 90% CIs, respectively). Nevertheless, analyses of these (lower confidence) datasets revealed overlapping trends, suggesting that the possible exclusion of some false negatives did not influence the general conclusions that can be drawn on PAGs and, more generally, on plasmid-mediated HGT. Overall, we found that PAGs are not uniformly distributed among the sampled plasmids. Indeed, most of the plasmids harbor a few atypical genes or do not possess any atypical gene at all, whereas PAG-enriched plasmids are progressively rarer. This finding is in partial agreement with previous findings on the horizontal flow of plasmid genes <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp> and suggests that not all plasmids contribute equally to the overall horizontal flow of genes but, instead, some of them may occupy more central positions in the overall network of HGT events. In particular, this role might be played by conjugative plasmids, which have been shown to possess, on average, a higher number of atypical regions than non-mobilizable ones.</p>
<p>Interestingly, plasmid size does not appreciably correlate with PAGs content. This result may provide important evolutionary insights, suggesting that the acquisition of exogenous DNA (i.e. HGT) might not be the only force driving plasmid assembly and the expansion of their coding capabilities. Indeed, other molecular mechanisms might play a role in this process, such as gene duplication (possibly followed by evolutionary divergence) (Maida et al. <it>unpublished data</it>). Moreover, the fact that for a fraction of the identified PAGs it was not possible to retrieve an associated function, together with the observation that PAG-encoded proteins are, on average, shorter than non-atypical ones, points towards the presence of a fraction of pseudogenes within PAGs. These might have originated from unsuccessful HGT events, one of the most likely sources of pseudogenes within prokaryotic genomes <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>.</p>
<p>The automated functional annotation we performed revealed that, among the annotated PAGs, the largest group is involved in the overall process of DNA mobilization, although other biologically relevant functions have been identified, such as transcription, pathogenesis and antibiotic resistance. The fact that the genes encoding these functions are associated with atypical DNA regions has important biological implications, underlining the important role of HGT in the bacterial sharing of these key traits. Importantly, PAGs are often found in multi-cistronic clusters embedding two or more genes. However, the fact that shorter clusters (embedding 2 or 3 genes) are much more frequent than longer ones probably indicates that PAGs clusters are fragmented following their integration or that, alternatively, the transfer of shorter clusters is favoured over that of longer gene arrays.</p>
<p>Finally, our analysis revealed that most of the PAGs might be of plasmid origin, suggesting that plasmid-plasmid gene exchange might be favoured over phage-plasmid and chromosome-plasmid exchange. This is in partial agreement with previous findings <abbrgrp>
<abbr bid="B40">40</abbr>
</abbrgrp> and suggests that a sort of preferential gene flow between vehicles of the same type (in our case plasmids) might exist.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>MF conceived of the study and wrote the Perl codes for the analyses. MF and EB performed the analyses. MF and RF interpreted the results and wrote the manuscript. All authors read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements and Funding</p>
</st>
<p>MF is supported by a postdoctoral fellowship from "Fondazione Adriano Buzzati-Traverso". We are grateful to three anonymous reviewers for their useful comments and suggestions that greatly improved the manuscript.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Lateral gene transfer and the nature of bacterial innovation</p></title><aug><au><snm>Ochman</snm><fnm>H</fnm></au><au><snm>Lawrence</snm><fnm>JG</fnm></au><au><snm>Groisman</snm><fnm>EA</fnm></au></aug><source>Nature</source><pubdate>2000</pubdate><volume>405</volume><issue>6784</issue><fpage>299</fpage><lpage>304</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/35012500</pubid><pubid idtype="pmpid" link="fulltext">10830951</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Bacterial sex: playing voyeurs 50 years later</p></title><aug><au><snm>Kohiyama</snm><fnm>M</fnm></au><au><snm>Hiraga</snm><fnm>S</fnm></au><au><snm>Matic</snm><fnm>I</fnm></au><au><snm>Radman</snm><fnm>M</fnm></au></aug><source>Science</source><pubdate>2003</pubdate><volume>301</volume><issue>5634</issue><fpage>802</fpage><lpage>803</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1085154</pubid><pubid idtype="pmpid" link="fulltext">12907791</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Analysis of plasmid genes by phylogenetic profiling and visualization of homology relationships using Blast2Network</p></title><aug><au><snm>Brilli</snm><fnm>M</fnm></au><au><snm>Mengoni</snm><fnm>A</fnm></au><au><snm>Fondi</snm><fnm>M</fnm></au><au><snm>Bazzicalupo</snm><fnm>M</fnm></au><au><snm>Lio</snm><fnm>P</fnm></au><au><snm>Fani</snm><fnm>R</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2008</pubdate><volume>9</volume><fpage>551</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-9-551</pubid><pubid idtype="pmcid">2640388</pubid><pubid idtype="pmpid" link="fulltext">19099604</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Horizontal gene transfer between bacteria</p></title><aug><au><snm>Heuer</snm><fnm>H</fnm></au><au><snm>Smalla</snm><fnm>K</fnm></au></aug><source>Environ Biosafety 
Res</source><pubdate>2007</pubdate><volume>6</volume><issue>1-2</issue><fpage>3</fpage><lpage>13</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1051/ebr:2007034</pubid><pubid idtype="pmpid" link="fulltext">17961477</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Gene cluster analysis method identifies horizontally transferred genes with high reliability and indicates that they provide the main mechanism of operon gain in 8 species of gamma-Proteobacteria</p></title><aug><au><snm>Homma</snm><fnm>K</fnm></au><au><snm>Fukuchi</snm><fnm>S</fnm></au><au><snm>Nakamura</snm><fnm>Y</fnm></au><au><snm>Gojobori</snm><fnm>T</fnm></au><au><snm>Nishikawa</snm><fnm>K</fnm></au></aug><source>Mol Biol Evol</source><pubdate>2007</pubdate><volume>24</volume><issue>3</issue><fpage>805</fpage><lpage>813</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">17185745</pubid></xrefbib></bibl><bibl id="B6"><title><p>Dinucleotide relative abundance extremes: a genomic signature</p></title><aug><au><snm>Karlin</snm><fnm>S</fnm></au><au><snm>Burge</snm><fnm>C</fnm></au></aug><source>Trends Genet</source><pubdate>1995</pubdate><volume>11</volume><issue>7</issue><fpage>283</fpage><lpage>290</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0168-9525(00)89076-9</pubid><pubid idtype="pmpid" link="fulltext">7482779</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Compositional discordance between prokaryotic plasmids and host chromosomes</p></title><aug><au><snm>van Passel</snm><fnm>MW</fnm></au><au><snm>Bart</snm><fnm>A</fnm></au><au><snm>Luyf</snm><fnm>AC</fnm></au><au><snm>van Kampen</snm><fnm>AH</fnm></au><au><snm>van der Ende</snm><fnm>A</fnm></au></aug><source>BMC Genomics</source><pubdate>2006</pubdate><volume>7</volume><fpage>26</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-7-26</pubid><pubid idtype="pmcid">1382213</pubid><pubid idtype="pmpid" link="fulltext">16480495</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Evolution 
of microbial genomes: sequence acquisition and loss</p></title><aug><au><snm>Berg</snm><fnm>OG</fnm></au><au><snm>Kurland</snm><fnm>CG</fnm></au></aug><source>Mol Biol Evol</source><pubdate>2002</pubdate><volume>19</volume><issue>12</issue><fpage>2265</fpage><lpage>2276</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">12446817</pubid></xrefbib></bibl><bibl id="B9"><title><p>Nucleotide sequence of pOLA52: a conjugative IncX1 plasmid from Escherichia coli which enables biofilm formation and multidrug efflux</p></title><aug><au><snm>Norman</snm><fnm>A</fnm></au><au><snm>Hansen</snm><fnm>LH</fnm></au><au><snm>She</snm><fnm>Q</fnm></au><au><snm>Sorensen</snm><fnm>SJ</fnm></au></aug><source>Plasmid</source><pubdate>2008</pubdate><volume>60</volume><issue>1</issue><fpage>59</fpage><lpage>74</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.plasmid.2008.03.003</pubid><pubid idtype="pmpid" link="fulltext">18440636</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>A hidden reservoir of integrative elements is the major source of recently acquired foreign genes and ORFans in archaeal and bacterial genomes</p></title><aug><au><snm>Cortez</snm><fnm>D</fnm></au><au><snm>Forterre</snm><fnm>P</fnm></au><au><snm>Gribaldo</snm><fnm>S</fnm></au></aug><source>Genome Biol</source><pubdate>2009</pubdate><volume>10</volume><issue>6</issue><fpage>R65</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2009-10-6-r65</pubid><pubid idtype="pmcid">2718499</pubid><pubid idtype="pmpid" link="fulltext">19531232</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Conjugative plasmids: vessels of the communal gene pool</p></title><aug><au><snm>Norman</snm><fnm>A</fnm></au><au><snm>Hansen</snm><fnm>LH</fnm></au><au><snm>Sorensen</snm><fnm>SJ</fnm></au></aug><source>Philos Trans R Soc Lond B Biol Sci</source><pubdate>2009</pubdate><volume>364</volume><issue>1527</issue><fpage>2275</fpage><lpage>2289</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1098/rstb.2009.0037</pubid><pubid idtype="pmcid">2873005</pubid><pubid idtype="pmpid" link="fulltext">19571247</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>When phage, plasmids, and transposons collide: genomic islands, and conjugative- and mobilizable-transposons as a mosaic continuum</p></title><aug><au><snm>Osborn</snm><fnm>AM</fnm></au><au><snm>Boltner</snm><fnm>D</fnm></au></aug><source>Plasmid</source><pubdate>2002</pubdate><volume>48</volume><issue>3</issue><fpage>202</fpage><lpage>212</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0147-619X(02)00117-8</pubid><pubid idtype="pmpid" link="fulltext">12460536</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Mobile elements as a combination of functional modules</p></title><aug><au><snm>Toussaint</snm><fnm>A</fnm></au><au><snm>Merlin</snm><fnm>C</fnm></au></aug><source>Plasmid</source><pubdate>2002</pubdate><volume>47</volume><issue>1</issue><fpage>26</fpage><lpage>35</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1006/plas.2001.1552</pubid><pubid idtype="pmpid" link="fulltext">11798283</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Mosaic structure of plasmids from natural populations of Escherichia coli</p></title><aug><au><snm>Boyd</snm><fnm>EF</fnm></au><au><snm>Hill</snm><fnm>CW</fnm></au><au><snm>Rich</snm><fnm>SM</fnm></au><au><snm>Hartl</snm><fnm>DL</fnm></au></aug><source>Genetics</source><pubdate>1996</pubdate><volume>143</volume><issue>3</issue><fpage>1091</fpage><lpage>1100</lpage><xrefbib><pubidlist><pubid idtype="pmcid">1207381</pubid><pubid idtype="pmpid" link="fulltext">8807284</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Mosaic plasmids and mosaic replicons: evolutionary lessons from the analysis of genetic diversity in IncFII-related replicons</p></title><aug><au><snm>Osborn</snm><fnm>AM</fnm></au><au><snm>da Silva 
Tatley</snm><fnm>FM</fnm></au><au><snm>Steyn</snm><fnm>LM</fnm></au><au><snm>Pickup</snm><fnm>RW</fnm></au><au><snm>Saunders</snm><fnm>JR</fnm></au></aug><source>Microbiology</source><pubdate>2000</pubdate><volume>146</volume><issue>Pt 9</issue><fpage>2267</fpage><lpage>2275</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">10974114</pubid></xrefbib></bibl><bibl id="B16"><title><p>Detecting horizontally transferred and essential genes based on dinucleotide relative abundance</p></title><aug><au><snm>Baran</snm><fnm>RH</fnm></au><au><snm>Ko</snm><fnm>H</fnm></au></aug><source>DNA Res</source><pubdate>2008</pubdate><volume>15</volume><issue>5</issue><fpage>267</fpage><lpage>276</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/dnares/dsn021</pubid><pubid idtype="pmcid">2575891</pubid><pubid idtype="pmpid" link="fulltext">18799480</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Analysis of an Actinobacillus pleuropneumoniae multi-resistance plasmid, pHB0503</p></title><aug><au><snm>Kang</snm><fnm>M</fnm></au><au><snm>Zhou</snm><fnm>R</fnm></au><au><snm>Liu</snm><fnm>L</fnm></au><au><snm>Langford</snm><fnm>PR</fnm></au><au><snm>Chen</snm><fnm>H</fnm></au></aug><source>Plasmid</source><pubdate>2009</pubdate><volume>61</volume><issue>2</issue><fpage>135</fpage><lpage>139</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.plasmid.2008.11.001</pubid><pubid idtype="pmpid" link="fulltext">19041669</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Evidence of a large novel gene pool associated with prokaryotic genomic islands</p></title><aug><au><snm>Hsiao</snm><fnm>WW</fnm></au><au><snm>Ung</snm><fnm>K</fnm></au><au><snm>Aeschliman</snm><fnm>D</fnm></au><au><snm>Bryan</snm><fnm>J</fnm></au><au><snm>Finlay</snm><fnm>BB</fnm></au><au><snm>Brinkman</snm><fnm>FS</fnm></au></aug><source>PLoS Genet</source><pubdate>2005</pubdate><volume>1</volume><issue>5</issue><fpage>e62</fpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1371/journal.pgen.0010062</pubid><pubid idtype="pmcid">1285063</pubid><pubid idtype="pmpid" link="fulltext">16299586</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Comparative analysis of methodologies for the detection of horizontally transferred genes: a reassessment of first-order Markov models</p></title><aug><au><snm>Cortez</snm><fnm>DQ</fnm></au><au><snm>Lazcano</snm><fnm>A</fnm></au><au><snm>Becerra</snm><fnm>A</fnm></au></aug><source>In Silico Biol</source><pubdate>2005</pubdate><volume>5</volume><issue>5-6</issue><fpage>581</fpage><lpage>592</lpage><xrefbib><pubid idtype="pmpid">16610135</pubid></xrefbib></bibl><bibl id="B20"><title><p>Biased biological functions of horizontally transferred genes in prokaryotic genomes</p></title><aug><au><snm>Nakamura</snm><fnm>Y</fnm></au><au><snm>Itoh</snm><fnm>T</fnm></au><au><snm>Matsuda</snm><fnm>H</fnm></au><au><snm>Gojobori</snm><fnm>T</fnm></au></aug><source>Nat Genet</source><pubdate>2004</pubdate><volume>36</volume><issue>7</issue><fpage>760</fpage><lpage>766</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1381</pubid><pubid idtype="pmpid" link="fulltext">15208628</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Horizontal gene transfer in bacterial and archaeal complete genomes</p></title><aug><au><snm>Garcia-Vallve</snm><fnm>S</fnm></au><au><snm>Romeu</snm><fnm>A</fnm></au><au><snm>Palau</snm><fnm>J</fnm></au></aug><source>Genome Res</source><pubdate>2000</pubdate><volume>10</volume><issue>11</issue><fpage>1719</fpage><lpage>1725</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.130000</pubid><pubid idtype="pmcid">310969</pubid><pubid idtype="pmpid" link="fulltext">11076857</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>A novel strategy for the identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites in closely related 
bacteria</p></title><aug><au><snm>Ou</snm><fnm>HY</fnm></au><au><snm>Chen</snm><fnm>LL</fnm></au><au><snm>Lonnen</snm><fnm>J</fnm></au><au><snm>Chaudhuri</snm><fnm>RR</fnm></au><au><snm>Thani</snm><fnm>AB</fnm></au><au><snm>Smith</snm><fnm>R</fnm></au><au><snm>Garton</snm><fnm>NJ</fnm></au><au><snm>Hinton</snm><fnm>J</fnm></au><au><snm>Pallen</snm><fnm>M</fnm></au><au><snm>Barer</snm><fnm>MR</fnm></au><etal/></aug><source>Nucleic Acids Res</source><pubdate>2006</pubdate><volume>34</volume><issue>1</issue><fpage>e3</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gnj005</pubid><pubid idtype="pmcid">1326021</pubid><pubid idtype="pmpid" link="fulltext">16414954</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Amelioration of bacterial genomes: rates of change and exchange</p></title><aug><au><snm>Lawrence</snm><fnm>JG</fnm></au><au><snm>Ochman</snm><fnm>H</fnm></au></aug><source>J Mol Evol</source><pubdate>1997</pubdate><volume>44</volume><issue>4</issue><fpage>383</fpage><lpage>397</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/PL00006158</pubid><pubid idtype="pmpid" link="fulltext">9089078</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>How to interpret an anonymous bacterial genome: machine learning approach to gene identification</p></title><aug><au><snm>Hayes</snm><fnm>WS</fnm></au><au><snm>Borodovsky</snm><fnm>M</fnm></au></aug><source>Genome Res</source><pubdate>1998</pubdate><volume>8</volume><issue>11</issue><fpage>1154</fpage><lpage>1171</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">9847079</pubid></xrefbib></bibl><bibl id="B25"><title><p>An acquisition account of genomic islands based on genome signature comparisons</p></title><aug><au><snm>van Passel</snm><fnm>MW</fnm></au><au><snm>Bart</snm><fnm>A</fnm></au><au><snm>Thygesen</snm><fnm>HH</fnm></au><au><snm>Luyf</snm><fnm>AC</fnm></au><au><snm>van Kampen</snm><fnm>AH</fnm></au><au><snm>van der Ende</snm><fnm>A</fnm></au></aug><source>BMC 
Genomics</source><pubdate>2005</pubdate><volume>6</volume><fpage>163</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-6-163</pubid><pubid idtype="pmcid">1310630</pubid><pubid idtype="pmpid" link="fulltext">16297239</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><aug><au><snm>Fisher</snm><fnm>RA</fnm></au></aug><source>Statistical Methods for Research Workers</source><publisher>Edinburgh: Oliver and Boyd</publisher><pubdate>1925</pubdate></bibl><bibl id="B27"><title><p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p></title><aug><au><snm>Altschul</snm><fnm>SF</fnm></au><au><snm>Madden</snm><fnm>TL</fnm></au><au><snm>Schaffer</snm><fnm>AA</fnm></au><au><snm>Zhang</snm><fnm>J</fnm></au><au><snm>Zhang</snm><fnm>Z</fnm></au><au><snm>Miller</snm><fnm>W</fnm></au><au><snm>Lipman</snm><fnm>DJ</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1997</pubdate><volume>25</volume><issue>17</issue><fpage>3389</fpage><lpage>3402</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/25.17.3389</pubid><pubid idtype="pmcid">146917</pubid><pubid idtype="pmpid" link="fulltext">9254694</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>R: A Language and Environment for Statistical Computing</p></title><aug><au><cnm>R-Development-Core-Team</cnm></au></aug><source>Vienna, Austria</source><pubdate>2011</pubdate></bibl><bibl id="B29"><title><p>Niche adaptation and genome expansion in the chlorophyll d-producing cyanobacterium Acaryochloris marina</p></title><aug><au><snm>Swingley</snm><fnm>WD</fnm></au><au><snm>Chen</snm><fnm>M</fnm></au><au><snm>Cheung</snm><fnm>PC</fnm></au><au><snm>Conrad</snm><fnm>AL</fnm></au><au><snm>Dejesa</snm><fnm>LC</fnm></au><au><snm>Hao</snm><fnm>J</fnm></au><au><snm>Honchak</snm><fnm>BM</fnm></au><au><snm>Karbach</snm><fnm>LE</fnm></au><au><snm>Kurdoglu</snm><fnm>A</fnm></au><au><snm>Lahiri</snm><fnm>S</fnm></au><etal/></aug><source>Proc Natl Acad Sci 
USA</source><pubdate>2008</pubdate><volume>105</volume><issue>6</issue><fpage>2005</fpage><lpage>2010</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0709772105</pubid><pubid idtype="pmcid">2538872</pubid><pubid idtype="pmpid" link="fulltext">18252824</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>The virulence plasmid pWR100 and the repertoire of proteins secreted by the type III secretion apparatus of Shigella flexneri</p></title><aug><au><snm>Buchrieser</snm><fnm>C</fnm></au><au><snm>Glaser</snm><fnm>P</fnm></au><au><snm>Rusniok</snm><fnm>C</fnm></au><au><snm>Nedjari</snm><fnm>H</fnm></au><au><snm>D&apos;Hauteville</snm><fnm>H</fnm></au><au><snm>Kunst</snm><fnm>F</fnm></au><au><snm>Sansonetti</snm><fnm>P</fnm></au><au><snm>Parsot</snm><fnm>C</fnm></au></aug><source>Mol Microbiol</source><pubdate>2000</pubdate><volume>38</volume><issue>4</issue><fpage>760</fpage><lpage>771</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1046/j.1365-2958.2000.02179.x</pubid><pubid idtype="pmpid" link="fulltext">11115111</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>The complete sequence and analysis of the large virulence plasmid pSS of Shigella sonnei</p></title><aug><au><snm>Jiang</snm><fnm>Y</fnm></au><au><snm>Yang</snm><fnm>F</fnm></au><au><snm>Zhang</snm><fnm>X</fnm></au><au><snm>Yang</snm><fnm>J</fnm></au><au><snm>Chen</snm><fnm>L</fnm></au><au><snm>Yan</snm><fnm>Y</fnm></au><au><snm>Nie</snm><fnm>H</fnm></au><au><snm>Xiong</snm><fnm>Z</fnm></au><au><snm>Wang</snm><fnm>J</fnm></au><au><snm>Dong</snm><fnm>J</fnm></au><etal/></aug><source>Plasmid</source><pubdate>2005</pubdate><volume>54</volume><issue>2</issue><fpage>149</fpage><lpage>159</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.plasmid.2005.03.002</pubid><pubid idtype="pmpid" link="fulltext">16122562</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>Complete DNA sequence and analysis of the large virulence plasmid of Shigella 
flexneri</p></title><aug><au><snm>Venkatesan</snm><fnm>MM</fnm></au><au><snm>Goldberg</snm><fnm>MB</fnm></au><au><snm>Rose</snm><fnm>DJ</fnm></au><au><snm>Grotbeck</snm><fnm>EJ</fnm></au><au><snm>Burland</snm><fnm>V</fnm></au><au><snm>Blattner</snm><fnm>FR</fnm></au></aug><source>Infect Immun</source><pubdate>2001</pubdate><volume>69</volume><issue>5</issue><fpage>3271</fpage><lpage>3285</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/IAI.69.5.3271-3285.2001</pubid><pubid idtype="pmcid">98286</pubid><pubid idtype="pmpid" link="fulltext">11292750</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>Large plasmids associated with virulence in Shigella species have a common function necessary for epithelial cell penetration</p></title><aug><au><snm>Watanabe</snm><fnm>H</fnm></au><au><snm>Nakamura</snm><fnm>A</fnm></au></aug><source>Infect Immun</source><pubdate>1985</pubdate><volume>48</volume><issue>1</issue><fpage>260</fpage><lpage>262</lpage><xrefbib><pubidlist><pubid idtype="pmcid">261946</pubid><pubid idtype="pmpid" link="fulltext">3980088</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>The horizontal flow of the plasmid resistome: clues from inter-generic similarity networks</p></title><aug><au><snm>Fondi</snm><fnm>M</fnm></au><au><snm>Fani</snm><fnm>R</fnm></au></aug><source>Environ Microbiol</source><pubdate>2010</pubdate></bibl><bibl id="B35"><title><p>Population mixing of Rhizobium leguminosarum bv. 
viciae nodulating Vicia faba: the role of recombination and lateral gene transfer</p></title><aug><au><snm>Tian</snm><fnm>CF</fnm></au><au><snm>Young</snm><fnm>JP</fnm></au><au><snm>Wang</snm><fnm>ET</fnm></au><au><snm>Tamimi</snm><fnm>SM</fnm></au><au><snm>Chen</snm><fnm>WX</fnm></au></aug><source>FEMS Microbiol Ecol</source><pubdate>2010</pubdate><volume>73</volume><issue>3</issue><fpage>563</fpage><lpage>576</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">20533948</pubid></xrefbib></bibl><bibl id="B36"><title><p>Introducing the bacterial 'chromid': not a chromosome, not a plasmid</p></title><aug><au><snm>Harrison</snm><fnm>PW</fnm></au><au><snm>Lower</snm><fnm>RP</fnm></au><au><snm>Kim</snm><fnm>NK</fnm></au><au><snm>Young</snm><fnm>JP</fnm></au></aug><source>Trends Microbiol</source><pubdate>2010</pubdate><volume>18</volume><issue>4</issue><fpage>141</fpage><lpage>148</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.tim.2009.12.010</pubid><pubid idtype="pmpid" link="fulltext">20080407</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research</p></title><aug><au><snm>Conesa</snm><fnm>A</fnm></au><au><snm>Gotz</snm><fnm>S</fnm></au><au><snm>Garcia-Gomez</snm><fnm>JM</fnm></au><au><snm>Terol</snm><fnm>J</fnm></au><au><snm>Talon</snm><fnm>M</fnm></au><au><snm>Robles</snm><fnm>M</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><issue>18</issue><fpage>3674</fpage><lpage>3676</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti610</pubid><pubid idtype="pmpid" link="fulltext">16081474</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>Expression of horizontally transferred gene clusters: activation by promoter-generating mutations</p></title><aug><au><snm>Dabizzi</snm><fnm>S</fnm></au><au><snm>Ammannato</snm><fnm>S</fnm></au><au><snm>Fani</snm><fnm>R</fnm></au></aug><source>Res 
Microbiol</source><pubdate>2001</pubdate><volume>152</volume><issue>6</issue><fpage>539</fpage><lpage>549</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0923-2508(01)01228-1</pubid><pubid idtype="pmpid" link="fulltext">11501672</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes</p></title><aug><au><snm>Liu</snm><fnm>Y</fnm></au><au><snm>Harrison</snm><fnm>PM</fnm></au><au><snm>Kunin</snm><fnm>V</fnm></au><au><snm>Gerstein</snm><fnm>M</fnm></au></aug><source>Genome Biol</source><pubdate>2004</pubdate><volume>5</volume><issue>9</issue><fpage>R64</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2004-5-9-r64</pubid><pubid idtype="pmcid">522871</pubid><pubid idtype="pmpid" link="fulltext">15345048</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>Network analyses structure genetic diversity in independent genetic worlds</p></title><aug><au><snm>Halary</snm><fnm>S</fnm></au><au><snm>Leigh</snm><fnm>JW</fnm></au><au><snm>Cheaib</snm><fnm>B</fnm></au><au><snm>Lopez</snm><fnm>P</fnm></au><au><snm>Bapteste</snm><fnm>E</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2010</pubdate><volume>107</volume><issue>1</issue><fpage>127</fpage><lpage>132</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0908978107</pubid><pubid idtype="pmcid">2806761</pubid><pubid idtype="pmpid" link="fulltext">20007769</pubid></pubidlist></xrefbib></bibl><bibl id="B41"><title><p>Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes</p></title><aug><au><snm>Karlin</snm><fnm>S</fnm></au></aug><source>Trends Microbiol</source><pubdate>2001</pubdate><volume>9</volume><issue>7</issue><fpage>335</fpage><lpage>343</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0966-842X(01)02079-0</pubid><pubid idtype="pmpid" 
link="fulltext">11435108</pubid></pubidlist></xrefbib></bibl><bibl id="B42"><title><p>Exploring the evolutionary dynamics of plasmids: the Acinetobacter pan-plasmidome</p></title><aug><au><snm>Fondi</snm><fnm>M</fnm></au><au><snm>Bacci</snm><fnm>G</fnm></au><au><snm>Brilli</snm><fnm>M</fnm></au><au><snm>Papaleo</snm><fnm>CM</fnm></au><au><snm>Mengoni</snm><fnm>A</fnm></au><au><snm>Vaneechoutte</snm><fnm>M</fnm></au><au><snm>Dijkshoorn</snm><fnm>L</fnm></au><au><snm>Fani</snm><fnm>R</fnm></au></aug><source>BMC Evol Biol</source><pubdate>2010</pubdate><volume>10</volume><issue>1</issue><fpage>59</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2148-10-59</pubid><pubid idtype="pmcid">2848654</pubid><pubid idtype="pmpid" link="fulltext">20181243</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>Translocation of transposition-deficient (TndPKLH2-like) transposons in the natural environment: mechanistic insights from the study of adjacent DNA sequences</p></title><aug><au><snm>Kholodii</snm><fnm>G</fnm></au><au><snm>Mindlin</snm><fnm>S</fnm></au><au><snm>Gorlenko</snm><fnm>Z</fnm></au><au><snm>Petrova</snm><fnm>M</fnm></au><au><snm>Hobman</snm><fnm>J</fnm></au><au><snm>Nikiforov</snm><fnm>V</fnm></au></aug><source>Microbiology</source><pubdate>2004</pubdate><volume>150</volume><issue>Pt 4</issue><fpage>979</fpage><lpage>992</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">15073307</pubid></xrefbib></bibl><bibl id="B44"><title><p>Identification of a pathogenicity island, which contains genes for virulence and avirulence, on a large native plasmid in the bean pathogen Pseudomonas syringae pathovar 
phaseolicola</p></title><aug><au><snm>Jackson</snm><fnm>RW</fnm></au><au><snm>Athanassopoulos</snm><fnm>E</fnm></au><au><snm>Tsiamis</snm><fnm>G</fnm></au><au><snm>Mansfield</snm><fnm>JW</fnm></au><au><snm>Sesma</snm><fnm>A</fnm></au><au><snm>Arnold</snm><fnm>DL</fnm></au><au><snm>Gibbon</snm><fnm>MJ</fnm></au><au><snm>Murillo</snm><fnm>J</fnm></au><au><snm>Taylor</snm><fnm>JD</fnm></au><au><snm>Vivian</snm><fnm>A</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>1999</pubdate><volume>96</volume><issue>19</issue><fpage>10875</fpage><lpage>10880</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.96.19.10875</pubid><pubid idtype="pmcid">17976</pubid><pubid idtype="pmpid" link="fulltext">10485919</pubid></pubidlist></xrefbib></bibl><bibl id="B45"><title><p>Comparative genome analysis of Lactobacillus reuteri and Lactobacillus fermentum reveal a genomic island for reuterin and cobalamin production</p></title><aug><au><snm>Morita</snm><fnm>H</fnm></au><au><snm>Toh</snm><fnm>H</fnm></au><au><snm>Fukuda</snm><fnm>S</fnm></au><au><snm>Horikawa</snm><fnm>H</fnm></au><au><snm>Oshima</snm><fnm>K</fnm></au><au><snm>Suzuki</snm><fnm>T</fnm></au><au><snm>Murakami</snm><fnm>M</fnm></au><au><snm>Hisamatsu</snm><fnm>S</fnm></au><au><snm>Kato</snm><fnm>Y</fnm></au><au><snm>Takizawa</snm><fnm>T</fnm></au><etal/></aug><source>DNA Res</source><pubdate>2008</pubdate><volume>15</volume><issue>3</issue><fpage>151</fpage><lpage>161</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/dnares/dsn009</pubid><pubid idtype="pmcid">2650639</pubid><pubid idtype="pmpid" link="fulltext">18487258</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>