This article is part of the supplement: Proceedings of the Ninth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics
Genomic distance under gene substitutions
1 Instituto Nacional de Metrologia, Qualidade e Tecnologia, Duque de Caxias, 25250-020, Brazil
2 AG Genominformatik, Technische Fakultät, Universität Bielefeld, Bielefeld, 33594, Germany
BMC Bioinformatics 2011, 12(Suppl 9):S8 doi:10.1186/1471-2105-12-S9-S8
Published: 5 October 2011
The distance between two genomes is often computed by comparing only the markers they have in common. Some approaches can also deal with non-common markers by allowing such markers to be inserted or deleted. In these models, a deletion and a subsequent insertion that occur at the same position of the genome count as two sorting steps.
Here we propose a new model that sorts non-common markers with substitutions, which are more powerful operations that subsume insertions and deletions. A deletion and an insertion that occur at the same position of the genome can be modeled as a substitution, counting as a single sorting step.
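The difference between the two ways of counting can be illustrated with a small sketch. It is a toy model under strong assumptions and not the algorithm of this article: genomes are plain lists of unsigned markers without duplicates, the common markers already appear in the same order in both genomes (so no rearrangement is needed), and a contiguous block of non-common markers is inserted or deleted in one step. The function names `gap_blocks` and `count_steps` are hypothetical.

```python
def gap_blocks(genome, common):
    """Return the runs of non-common markers before, between and after
    the common markers of a genome (one run per gap, possibly empty)."""
    blocks, current = [], []
    for marker in genome:
        if marker in common:
            blocks.append(current)
            current = []
        else:
            current.append(marker)
    blocks.append(current)  # run after the last common marker
    return blocks


def count_steps(genome_a, genome_b):
    """Count sorting steps for the non-common markers in the toy model.

    Returns (indel_steps, substitution_steps). Assumes the common
    markers are already in the same order in both genomes.
    """
    common = set(genome_a) & set(genome_b)
    order_a = [m for m in genome_a if m in common]
    order_b = [m for m in genome_b if m in common]
    if order_a != order_b:
        raise ValueError("toy model needs common markers in the same order")

    gaps_a = gap_blocks(genome_a, common)
    gaps_b = gap_blocks(genome_b, common)

    # Indel-only model: one deletion per non-empty gap of A,
    # plus one insertion per non-empty gap of B.
    indel_steps = sum(bool(a) + bool(b) for a, b in zip(gaps_a, gaps_b))
    # Substitution model: a deletion and an insertion at the same
    # position merge into a single step.
    substitution_steps = sum(bool(a or b) for a, b in zip(gaps_a, gaps_b))
    return indel_steps, substitution_steps


genome_a = ["a", "x", "b", "c", "y"]
genome_b = ["a", "u", "v", "b", "c"]
indel_steps, substitution_steps = count_steps(genome_a, genome_b)
```

In this example the block `x` of the first genome and the block `u v` of the second occupy the same position between `a` and `b`, so the indel-only count charges a deletion and an insertion there, whereas the substitution count charges one step.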
For genomes with unequal content but without duplicated markers, we give a linear-time algorithm to compute the genomic distance under substitutions and double-cut-and-join (DCJ) operations. This model provides a parsimonious genomic distance for genomes free of duplicated markers, which is in practice a lower bound on the real genomic distance. The method could also be used to refine orthology assignments, since in some cases a substitution could actually correspond to an unannotated orthology.