Genome-level homology and phylogeny of Shewanella (Gammaproteobacteria: lteromonadales: Shewanellaceae)
Committee on Evolutionary Biology, The University of Chicago, Chicago, IL, USA
Division of Fishes, The Field Museum of Natural History, Chicago, IL, USA
BMC Genomics 2011, 12:237 doi:10.1186/1471-2164-12-237Published: 12 May 2011
The explosion in availability of whole genome data provides the opportunity to build phylogenetic hypotheses based on these data as well as the ability to learn more about the genomes themselves. The biological history of genes and genomes can be investigated based on the taxomonic history provided by the phylogeny. A phylogenetic hypothesis based on complete genome data is presented for the genus Shewanella (Gammaproteobacteria: Alteromonadales: Shewanellaceae). Nineteen taxa from Shewanella (16 species and 3 additional strains of one species) as well as three outgroup species representing the genera Aeromonas (Gammaproteobacteria: Aeromonadales: Aeromonadaceae), Alteromonas (Gammaproteobacteria: Alteromonadales: Alteromonadaceae) and Colwellia (Gammaproteobacteria: Alteromonadales: Colwelliaceae) are included for a total of 22 taxa.
Putatively homologous regions were found across unannotated genomes and tested with a phylogenetic analysis. Two genome-wide data-sets are considered, one including only those genomic regions for which all taxa are represented, which included 3,361,015 aligned nucleotide base-pairs (bp) and a second that additionally includes those regions present in only subsets of taxa, which totaled 12,456,624 aligned bp. Alignment columns in these large data-sets were then randomly sampled to create smaller data-sets. After the phylogenetic hypothesis was generated, genome annotations were projected onto the DNA sequence alignment to compare the historical hypothesis generated by the phylogeny with the functional hypothesis posited by annotation.
Individual phylogenetic analyses of the 243 locally co-linear genome regions all failed to recover the genome topology, but the smaller data-sets that were random samplings of the large concatenated alignments all produced the genome topology. It is shown that there is not a single orthologous copy of 16S rRNA across the taxon sampling included in this study and that the relationships among the multiple copies are consistent with 16S rRNA undergoing concerted evolution. Unannotated whole genome data can provide excellent raw material for generating hypotheses of historical homology, which can be tested with phylogenetic analysis and compared with hypotheses of gene function.