This article is part of the supplement: Proceedings of the Tenth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics
Medians seek the corners, and other conjectures
Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa, Canada K1N 6N5
BMC Bioinformatics 2012, 13(Suppl 19):S5 doi:10.1186/1471-2105-13-S19-S5
Published: 19 December 2012
Median construction is at the heart of several approaches to gene-order phylogeny. It has been observed that the solution to a median problem is generally not unique, and that alternate solutions may be quite different. Another concern has to do with a tendency for medians to fall on or near one of the three input orders, and hence to contain no information about the other two.
We conjecture that as gene orders become more random with respect to each other, and as the number of genes increases, the breakpoint median for circular unichromosomal genomes, in both the unsigned and signed cases, tends to approach one of the input genomes (the "corners"), where closeness is measured by the distance normalized by the number of genes. Moreover, there are alternate solutions that approach each of the other inputs, so that the average distance between solutions is very large. We confirm these claims through simulations, and extend the results to medians of more than three genomes.
This effect also introduces serious biases into the medians of less scrambled genomes. It prompts a reconsideration of the role of the median in gene-order phylogeny. Fortunately, for triples of finite-length genomes, a small proportion of the median solutions escape the tendency towards the corners, and these are relatively close to each other. This suggests that a focused search for these solutions, though they represent a decreasing minority as genome length increases, is a way out of the pathological tendency we have described.
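To make the notion of a breakpoint median concrete, the following is a minimal Python sketch for the unsigned circular case. It assumes that each genome is a tuple over the same set of at least three genes, that the breakpoint distance is the number of genes minus the number of shared adjacencies, and that a median minimizes the sum of distances to the inputs. The function names are hypothetical, and the exhaustive search is an illustration that is feasible only for a handful of genes; it is not the method used in the simulations.

```python
from itertools import permutations


def adjacencies(genome):
    """Unordered adjacencies of an unsigned circular genome (n >= 3 genes)."""
    n = len(genome)
    return {frozenset((genome[i], genome[(i + 1) % n])) for i in range(n)}


def breakpoint_distance(a, b):
    """Number of genes minus the number of adjacencies shared by a and b."""
    return len(a) - len(adjacencies(a) & adjacencies(b))


def median_score(candidate, inputs):
    """Sum of breakpoint distances from the candidate to every input genome."""
    return sum(breakpoint_distance(candidate, g) for g in inputs)


def normalized_distances(candidate, inputs):
    """Distance from the candidate to each input, divided by the gene count."""
    n = len(candidate)
    return [breakpoint_distance(candidate, g) / n for g in inputs]


def brute_force_medians(inputs):
    """Exhaustively find all minimum-score genomes (assumption: small n only).

    The first gene is fixed to remove rotations; reflections are removed by
    tracking the adjacency sets already seen.
    """
    genes = sorted(inputs[0])
    first, rest = genes[0], genes[1:]
    best, solutions, seen = None, [], set()
    for perm in permutations(rest):
        candidate = (first,) + perm
        key = frozenset(adjacencies(candidate))
        if key in seen:
            continue
        seen.add(key)
        score = median_score(candidate, inputs)
        if best is None or score < best:
            best, solutions = score, [candidate]
        elif score == best:
            solutions.append(candidate)
    return best, solutions


if __name__ == "__main__":
    # Hypothetical toy inputs on six genes.
    a = (0, 1, 2, 3, 4, 5)
    b = (0, 2, 1, 3, 4, 5)
    c = (0, 1, 3, 2, 4, 5)
    score, medians = brute_force_medians([a, b, c])
    for m in medians:
        print(m, score, normalized_distances(m, [a, b, c]))
```

Listing every optimal solution together with its normalized distance to each input is one simple way to see whether the solutions cluster near an input genome or lie between the inputs.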