Detection of horizontal transfer of individual genes by anomalous oligomer frequencies
1 Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284, USA
2 Institute of Bioinformatics, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, Zhejiang 310058, China
3 Division of Biological Sciences, University of California San Diego, La Jolla, San Diego, CA, 92093, USA
BMC Genomics 2012, 13:245 doi:10.1186/1471-2164-13-245
Published: 15 June 2012
Understanding the history of life requires that we understand the transfer of genetic material across phylogenetic boundaries. Detecting genes that were acquired by means other than vertical descent is a basic step in that process. Detection by discordant phylogenies is computationally expensive and not always definitive. Many studies have therefore used easily computed compositional features as a surrogate. However, different compositional methods produce different predictions, and the effectiveness of any one method is not well established.
The ability of octamer frequency comparisons to detect genes artificially seeded in cyanobacterial genomes was markedly increased by using as a training set those genes that are highly conserved over all bacteria. Using a subset of octamer frequencies in such tests also increased effectiveness, but this depended on the specific target genome and the source of the contaminating genes. The presence of high frequency octamers and the GC content of the contaminating genes were important considerations. A method comprising best practices from these tests was devised, the Core Gene Similarity (CGS) method, and it performed better than simple octamer frequency analysis, codon bias, or GC contrasts in detecting seeded genes or naturally occurring transposons. From a comparison of predictions with phylogenetic trees, it appears that the effectiveness of the method is confined to horizontal transfer events that have occurred recently in evolutionary time.
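The general idea of comparing a gene's oligomer frequencies against a training set can be illustrated with a minimal sketch. This is not the CGS method itself, whose scoring details are not given here; it only assumes that a reference profile is built from a set of trusted genes (for example, highly conserved core genes) and that each candidate gene is scored by how typical its k-mers are under that profile. The function names, the pseudocount, and the mean log-frequency score are all hypothetical choices made for illustration.

```python
from collections import Counter
import math


def kmer_counts(seq, k=8):
    """Count overlapping k-mers in a sequence, skipping any with non-ACGT symbols."""
    seq = seq.upper()
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )


def reference_profile(genes, k=8):
    """Pool k-mer counts over a training set of genes (assumed to be native)."""
    total = Counter()
    for gene in genes:
        total.update(kmer_counts(gene, k))
    return total


def gene_score(gene, profile, k=8, pseudocount=1.0):
    """Mean log relative frequency of the gene's k-mers under the profile.

    Lower (more negative) scores mean the gene's k-mers are rarer in the
    training set, i.e. the gene looks compositionally atypical.
    The pseudocount is an illustrative assumption, not from the source.
    """
    counts = kmer_counts(gene, k)
    n = sum(counts.values())
    if n == 0:
        return float("nan")  # gene shorter than k or no valid k-mers
    denom = sum(profile.values()) + pseudocount * 4 ** k
    log_sum = sum(
        c * math.log((profile[kmer] + pseudocount) / denom)
        for kmer, c in counts.items()
    )
    return log_sum / n


if __name__ == "__main__":
    # Toy data with k=3 so the example stays small; real use would take k=8.
    training = ["ATG" * 20]
    profile = reference_profile(training, k=3)
    print(gene_score("ATG" * 4, profile, k=3))   # composition like the training set
    print(gene_score("CCGG" * 4, profile, k=3))  # composition unlike the training set
```

In such a scheme, genes with the lowest scores would be the candidates for foreign origin. The choice of training set matters because the profile defines what counts as typical, which is consistent with the improvement reported when conserved genes were used for training.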
The CGS method may be an improvement over existing surrogate methods to detect genes of foreign origin.