This article is part of the supplement: Selected articles from the Tenth Asia Pacific Bioinformatics Conference (APBC 2012)
A mixture framework for inferring ancestral gene orders
1 Center for Computational Biology, Beijing Forestry University, Beijing 100083, China
2 Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
BMC Genomics 2012, 13(Suppl 1):S7 doi:10.1186/1471-2164-13-S1-S7 Published: 17 January 2012
Inferring the gene orders of ancestral genomes has the potential to provide detailed information about the recent evolution of the species descended from them. Currently popular tools for inferring ancestral genome data (such as GRAPPA and MGR) are all parsimony-based direct optimization methods that aim to minimize the number of evolutionary events. Recently, a new method based on maximum likelihood has been proposed. Current implementations of the direct optimization methods are all based on solving median problems, and they achieve more accurate results than the maximum likelihood method. However, both GRAPPA and MGR are extremely time-consuming under high rearrangement rates. The maximum likelihood method, in contrast, runs much faster but gives less accurate results.
We propose a mixture method to optimize the inference of ancestral gene orders. This method first uses the maximum likelihood approach to identify gene adjacencies that are likely to be present in the ancestral genomes; these adjacencies are then fixed in the branch-and-bound search of the median calculations. This hybrid approach not only greatly speeds up the direct optimization methods, but also retains high accuracy even when the genomes are evolutionarily very distant.
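The idea of fixing likely ancestral adjacencies before the median search can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes circular genomes written as signed integer lists, and it swaps in a simple frequency count across the input genomes for the maximum likelihood estimate. The function names (`extremities`, `adjacencies`, `fixed_adjacencies`) and the `min_support` threshold are hypothetical, and the branch-and-bound median solver itself is not shown.

```python
from collections import Counter


def extremities(gene):
    """Return the (left, right) extremities of a signed gene as it is read.

    A gene +g is read tail-to-head, a gene -g head-to-tail.
    """
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacencies(order, circular=True):
    """Return the set of adjacencies in a signed gene order.

    Each adjacency is an unordered pair of gene extremities, so a genome
    and its reverse complement yield the same set.
    """
    pairs = list(zip(order, order[1:]))
    if circular and len(order) > 1:
        pairs.append((order[-1], order[0]))  # close the circle
    return {
        frozenset((extremities(a)[1], extremities(b)[0]))
        for a, b in pairs
    }


def fixed_adjacencies(genomes, min_support=1.0):
    """Select adjacencies to fix before a median search.

    ASSUMPTION: support is the fraction of input genomes containing the
    adjacency. This is a stand-in for the maximum likelihood estimate
    described in the text, not that estimate itself.
    """
    counts = Counter()
    for order in genomes:
        counts.update(adjacencies(order))
    n = len(genomes)
    return {adj for adj, c in counts.items() if c / n >= min_support}


genomes = [
    [1, 2, 3, 4],
    [1, 2, -3, 4],
    [1, 2, 4, 3],
]
shared_by_all = fixed_adjacencies(genomes)           # strict threshold
shared_by_most = fixed_adjacencies(genomes, 0.6)     # looser threshold
```

Each fixed adjacency removes choices from the search, so a looser threshold prunes the search space more aggressively at the risk of fixing an adjacency that was not present in the ancestor.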
Our mixture method produces more accurate ancestral genomes than the maximum likelihood method, while its computation time is far less than that of the parsimony-based direct optimization methods. It can effectively handle genome data with relatively high rearrangement rates, which are hard for the direct optimization methods to solve in a reasonable amount of time, thus extending the range of data that can be analyzed by the existing methods.