Estimating the extent of horizontal gene transfer in metagenomic sequences

1 Instituto Cavanilles de Biodiversidad y Biología Evolutiva, Universidad de Valencia, Polígono La Coma s/n, 46980 Paterna (Valencia), Spain
2 CIBER en Epidemiología y Salud Pública (CIBER-ESP), Spain
BMC Genomics 2008, 9:136. doi:10.1186/1471-2164-9-136
Additional files

Additional file 1: Metagenomic data. This figure shows the contig length and number of ORFs per contig in different metagenomes. Format: PPT. Size: 60 KB.

Additional file 2: Supplementary tables. Supplementary table 1: Species and taxonomic classes for the contigs in the random dataset. Supplementary table 2: Results of the phylogenetic assignment of 100 sequences from rare taxa. Supplementary table 3: Number of assignments (class taxonomic rank) for the contigs of different metagenomes. Supplementary table 4: Length of sequence analyzed, length of sequence with homologues, and percentage of sequence for which no homologues could be found. Supplementary table 5: Taxa involved in possible HGT events (phylogenetic method). Supplementary table 6: Number of contigs proposed to contain a compositional transition (compositional method). Format: DOC. Size: 484 KB.

Additional file 3: ORF determination. Blastx run of a metagenomic contig, indicated by the red bar at the top. Eleven hits (homologous proteins) were found, shown in black. Grey segments in the hits indicate the part of the homologous protein that was not found. In the merging step, protein hits 1–4 are considered homologues because they lie in the same position and frame and contain the full length of the protein hit. They are therefore merged into a single hit. Hits 7–10 are also in the same position and frame, but the alignment covers less than 50% of the protein hit because it is truncated by the end of the contig. In this case, since the other extreme of the protein has been found, the hit is considered valid, and the homologues are merged as before. Protein hit 5 is in the same frame as hits 1–4, but is much shorter. It therefore does not merge with hits 1–4 and is removed in the filtering step, in which all short hits that overlap with others in position and frame are removed. Note that protein hit 6 overlaps slightly with hits 1–5, but it is considered a different ORF because they overlap by less than 50% of the length of both proteins. Protein hit 11 is removed in the filtering step because it covers less than 50% of the protein hit and is not truncated by the ends of the contig. Format: PPT. Size: 27 KB.

Additional file 4: Example of the phylogenetic method. Three examples of the procedure for taxonomic assignment by the phylogenetic method. Phylogenetic trees were built from the homologues found for three different metagenomic ORFs. Homologues are coloured according to their taxonomic affiliation. The position of the query metagenomic ORF is marked by the black arrows in the trees. The tables below the trees show the sorted list of homologues and their distances to the query ORF. A) The query ORF is monophyletic with the epsilon-proteobacteria taxon. The lowest distance scores are therefore those of the homologues belonging to that taxon, and the query ORF is automatically assigned to it. B) The ORF is closely related to alpha-proteobacteria, but some homologues belonging to that taxon are distantly related. Nevertheless, the average distance score for alpha-proteobacteria is more than 15% lower than the average distance for any other taxon, which allows the ORF to be assigned to it. C) In this case, the ORF is not monophyletic with any taxon and the average distance scores do not allow an assignment, so the query ORF remains unassigned. Format: PPT. Size: 77 KB.

Additional file 5: Example of the compositional method. Compositional analysis of several contigs. The upper left panel corresponds to the usual pattern for a compositionally homogeneous contig. The other three panels show compositional transitions. Format: PPT. Size: 168 KB.

Additional file 6: Influence of database size. Effect of deleting a variable percentage of the database on the performance of taxonomic assignments. The ordinate axis shows the percentage of entries deleted from the database, while the abscissa axis shows the percentage of precision (TP/(TP+FP)) and sensitivity (TP/(TP+FN)) for the assignment of the ORFs. Format: PPT. Size: 37 KB.

Additional file 7: Full analysis of a single contig. Analysis of contig AAFY01000115 from the whale fall metagenome. Homology searches indicate that the contig contains three ORFs. A: The compositional method identifies a transition in the contig. The first two ORFs show a similar composition, but the third differs. In addition, the second ORF is a transposase, which supports the idea of a probable HGT. B: Taxonomic assignment provides a result for the first two ORFs (alpha-proteobacteria), but not for the third. The third ORF has only one distant homologue (27% identity), from a gamma-proteobacterium (Acinetobacter sp.), and therefore an assignment cannot be made. As a result, this contig is recognised as a probable HGT only by the compositional method. Format: PPT. Size: 171 KB.
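The average-distance rule described for Additional file 4 (cases B and C) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `assign_taxon`, the input format (a list of taxon and distance pairs) and the reading of "more than 15% lower" as `best < 0.85 * runner_up` are assumptions made here for illustration. The monophyly check of case A is not modelled, and returning no assignment when only one taxon is present is likewise an assumption.

```python
from collections import defaultdict


def assign_taxon(homologues, margin=0.15):
    """Assign a query ORF to a taxon from (taxon, distance) pairs.

    Hypothetical helper: returns the taxon whose average distance to the
    query is more than `margin` (15% by default) lower than that of every
    other taxon, or None if no taxon satisfies this.
    """
    # Group the distances of the homologues by taxon.
    by_taxon = defaultdict(list)
    for taxon, distance in homologues:
        by_taxon[taxon].append(distance)

    # Assumption: with fewer than two taxa there is nothing to compare
    # against, so the ORF is left unassigned.
    if len(by_taxon) < 2:
        return None

    # Sort taxa by their average distance to the query ORF.
    averages = sorted(
        (sum(dists) / len(dists), taxon) for taxon, dists in by_taxon.items()
    )
    best_avg, best_taxon = averages[0]
    runner_up_avg = averages[1][0]

    # Beating the runner-up by the margin implies beating all other taxa.
    if best_avg < (1.0 - margin) * runner_up_avg:
        return best_taxon
    return None


if __name__ == "__main__":
    # Made-up distances: one alpha-proteobacterial homologue is distant,
    # as in case B of Additional file 4.
    example = [("alpha", 0.2), ("alpha", 0.9), ("gamma", 0.8), ("beta", 0.9)]
    print(assign_taxon(example))
```

The distances in the usage example are invented values, not data from the study. Comparing only against the runner-up is sufficient because the taxa are sorted by average distance.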


