This article is part of the supplement: Proceedings of the Tenth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics

Open Access Proceedings

Approximating the edit distance for genomes with duplicate genes under DCJ, insertion and deletion

Mingfu Shao and Yu Lin

Author Affiliations

Laboratory for Computational Biology and Bioinformatics, EPFL, Lausanne, Switzerland

BMC Bioinformatics 2012, 13(Suppl 19):S13  doi:10.1186/1471-2105-13-S19-S13


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/13/S19/S13


Published: 19 December 2012

© 2012 Shao and Lin; licensee BioMed Central Ltd.

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Computing the edit distance between two genomes under certain operations is a basic problem in the study of genome evolution. The double-cut-and-join (DCJ) model has formed the basis for most algorithmic research on rearrangements over the last few years. The edit distance under the DCJ model can be easily computed for genomes without duplicate genes. In this paper, we study the edit distance for genomes with duplicate genes under a model that includes DCJ operations, insertions and deletions. We prove that computing the edit distance is equivalent to finding the optimal cycle decomposition of the corresponding adjacency graph, and give an approximation algorithm with an approximation ratio of 1.5 + ε.

Introduction

The combinatorics and algorithmics of genomic rearrangements have been the subject of much research since the problem was formulated in the 1990s [1]. The advent of whole-genome sequencing has provided us with masses of data on which to study genomic rearrangements and has motivated further work. Genomic rearrangements include inversions, transpositions, block exchanges, circularizations, and linearizations, all of which act on a single chromosome, and translocations, fusions, and fissions, which act on two chromosomes. These operations are all implemented in terms of the single double-cut-and-join (DCJ) operation [2,3], which has formed the basis for much algorithmic research on rearrangements over the last few years [4-7]. A DCJ operation makes two cuts in the genome, either in the same chromosome or in two different chromosomes, producing four cut ends, then rejoins the four cut ends.

A basic problem in genome rearrangements is to compute the edit distance, i.e., the minimum number of operations needed to transform one genome into another. For unichromosomal genomes, Hannenhalli and Pevzner gave the first polynomial-time algorithm to compute the edit distance under signed inversions [8], which was later improved to linear time [9]. For multichromosomal genomes, the edit distance under the Hannenhalli-Pevzner model (signed inversions and translocations) has been studied through a series of papers [8,10-12], culminating in a fairly complex linear-time algorithm [4]; under DCJ operations, the edit distance can be computed in linear time in a simple and elegant way [2].

All of the above algorithms for computing edit distances assume equal gene content and no duplicate genes. El-Mabrouk [13] first extended the results of Hannenhalli and Pevzner to compute the edit distance for inversions and deletions. Chen et al. [14] studied the problem of computing the inversion distance for genomes with equal gene content in the presence of duplicate genes--a problem that comes up in determining orthologies, where greedy heuristics were used. Yancopoulos et al. [7] proposed some rules on how to incorporate insertions and deletions into the DCJ model, but no specific algorithms are given. Braga et al. [15] presented a linear-time algorithm to compute the edit distance for DCJ operations, insertions and deletions, but still without duplications. Sébastien Angibaud et al. [16,17] studied several model-free measures between genomes with duplicate genes; they first established a one-to-one correspondence between genes of both genomes, and then computed the measure between two genomes without duplicate genes.

In this paper, we focus on the problem of computing the edit distance between two genomes in the presence of duplications. We define the edit distance at the adjacency set level on a unit-cost model including DCJ operations, insertions and deletions (duplications are a special case of insertions). We reduce the problem of computing such an edit distance to finding the maximum number of certain cycles in the adjacency graph. Finally, we give a (1.5 + ε)-approximation algorithm.

Edit distance

We represent the genomes using the notations introduced by Bergeron et al. [2]. Denote each gene g with its two extremities, the head as gh and the tail as gt. Two consecutive genes a and b can be connected by one adjacency, which is represented by a pair of extremities; thus adjacencies come in four types: atbt, ahbt, atbh, and ahbh (there is no order for these two extremities, i.e., ahbt = btah). If gene g lies at one end of a linear chromosome, then this end can be represented by a single extremity, gt or gh, called a telomere. The adjacencies and telomeres of a genome form a multiset, called the adjacency set.
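The representation above can be sketched in a few lines of Python. This is an illustration only, not code from the paper: the input encoding (a list of `(genes, circular)` pairs with signed genes) and the names `extremities` and `adjacency_set` are assumptions, and extremities are written as strings such as `"ah"` and `"at"`, which assumes gene names are unambiguous when suffixed this way.

```python
from collections import Counter

def extremities(gene, sign):
    """Return the (left, right) extremities of a gene read left to right."""
    tail, head = gene + "t", gene + "h"
    return (tail, head) if sign > 0 else (head, tail)

def adjacency_set(genome):
    """Build the adjacency multiset of a genome.

    genome: list of (genes, circular) pairs, where genes is a list of
    (name, sign) tuples with sign +1 or -1 (hypothetical encoding).
    Adjacencies are sorted 2-tuples, telomeres are 1-tuples.
    """
    adj = Counter()
    for genes, circular in genome:
        if not genes:
            continue
        ends = [extremities(g, s) for g, s in genes]
        # Right extremity of each gene meets left extremity of the next one.
        for (_, right), (left, _) in zip(ends, ends[1:]):
            adj[tuple(sorted((right, left)))] += 1
        if circular:
            adj[tuple(sorted((ends[-1][1], ends[0][0])))] += 1
        else:
            adj[(ends[0][0],)] += 1    # left telomere
            adj[(ends[-1][1],)] += 1   # right telomere
    return adj

# Linear chromosome a, -b, a (gene a is duplicated).
example = adjacency_set([([("a", 1), ("b", -1), ("a", 1)], False)])
```

Storing the result in a `Counter` keeps multiplicities, which matters once duplicate genes make the same adjacency occur more than once.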

We define three operations on an adjacency set. The corresponding operations on the structure of the genome (relative positions and orientations of genes on chromosomes) are illustrated on Figure 1.

Figure 1. The effect of DCJ operations, insertions and deletions on the genomic structure. (a), (b) and (c) represent DCJ operations; (d), (e), (f) and (g) represent insertions and deletions. In each subfigure, the central part represents operations, and the left part and right part represent the genomic structures.

1. DCJ (double-cut-and-join) [2], which acts on one or two elements (adjacencies or telomeres) in one of the following ways: {pq, rs} → {pr, qs} or {ps, qr} (see Figure 1(a)); {pq, r} → {pr, q} or {p, qr} (see Figure 1(b)); {p, q} → {pq} or {pq} → {p, q} (see Figure 1(c)).

2. Insertion, which inserts a single gene (a pair of extremities) ghgt in one of the following ways: {pq} → {pgt, ghq} or {pgh, gtq} (see the upper arrow in Figure 1(d)); {p} → {pgt, gh} or {pgh, gt} (see the upper arrow in Figure 1(e)); ∅ → {gtgh} (see the upper arrow in Figure 1(f)); ∅ → {gt, gh} (see the upper arrow in Figure 1(g)).

3. Deletion, which deletes a single gene ghgt in one of the following ways: {pgt, ghq} → {pq} (see the lower arrow in Figure 1(d)); {pgt, gh} → {p} (see the lower arrow in Figure 1(e)); {gtgh} → ∅ (see the lower arrow in Figure 1(f)); {gt, gh} → ∅ (see the lower arrow in Figure 1(g)).
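The three operations can be viewed as rewrite rules on the adjacency multiset. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: the helper names (`norm`, `classify`, `apply_op`) are hypothetical, extremities are strings ending in `h` or `t`, and `classify` only checks the extremity content and the number of elements of a rule, not every shape listed above.

```python
from collections import Counter

def norm(elements):
    """Multiset of elements, each stored as a sorted tuple of extremities."""
    return Counter(tuple(sorted(e)) for e in elements)

def extremity_count(elements):
    """Multiset of all extremities appearing in the given elements."""
    return Counter(x for e in elements for x in e)

def is_one_gene(extremity_multiset):
    """True if the multiset is exactly {g^h, g^t} for a single gene g."""
    xs = sorted(extremity_multiset.elements())
    return (len(xs) == 2 and xs[0][:-1] == xs[1][:-1]
            and xs[0][-1] == "h" and xs[1][-1] == "t")

def classify(lhs, rhs):
    """Label the rule lhs -> rhs as 'DCJ', 'INS' or 'DEL'."""
    before, after = extremity_count(lhs), extremity_count(rhs)
    if before == after and 1 <= len(lhs) <= 2 and 1 <= len(rhs) <= 2:
        return "DCJ"   # same extremities, regrouped
    if not (before - after) and is_one_gene(after - before):
        return "INS"   # exactly one gene gained
    if not (after - before) and is_one_gene(before - after):
        return "DEL"   # exactly one gene lost
    raise ValueError("not a DCJ, insertion or deletion")

def apply_op(adj, lhs, rhs):
    """Return a new adjacency multiset with lhs replaced by rhs."""
    classify(lhs, rhs)
    result = adj.copy()
    result.subtract(norm(lhs))
    if any(count < 0 for count in result.values()):
        raise ValueError("left-hand side is not contained in the adjacency set")
    result.update(norm(rhs))
    return +result     # unary plus drops zero counts

s = Counter({("ah", "bt"): 1, ("bh", "ct"): 1})
inserted = apply_op(s, [("ah", "bt")], [("ah", "gt"), ("gh", "bt")])
```

Applying the reverse rule to `inserted` gives back `s`, mirroring the fact that each deletion in the list above undoes the corresponding insertion.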

The edit distance between two adjacency sets S1 and S2, denoted as d(S1, S2), is the minimum number of operations (including DCJ operations, insertions and deletions) that transform S1 into S2. Here we use a unit-cost model, in which all operations have the same cost.

Note that the edit distance is defined at the adjacency set level. For genomes without duplicate genes, an adjacency set denotes a unique genomic structure. However, for genomes with duplicate genes, two genomes with different structures may share the same adjacency set, as illustrated in Figure 2. Thus, d(S1, S2) defined above is a lower bound for the edit distance between the two genomic structures. Given two adjacency sets S1 and S2 from two genomes, let Ei be the multiset of extremities collected from all elements in Si, i = 1, 2. We pair extremities in E1\E2 into ghost adjacencies (named for the similar ghost genes of [7]) to yield the adjacency set T1; similarly, we produce T2 from E2\E1. Clearly, to transform S1 into S2, at least |T1| deletions and |T2| insertions are needed. The following theorem shows that these insertions and deletions are both necessary and sufficient.
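A small Python sketch can make the construction of T1 and T2 concrete. It is illustrative only: the function names are hypothetical, adjacency sets are `Counter` objects of extremity tuples, and the sketch assumes that each ghost adjacency pairs the head and the tail of one surplus gene copy, which is consistent with counting |T1| deletions and |T2| insertions but is not spelled out in the text.

```python
from collections import Counter

def extremity_multiset(adj):
    """Multiset E_i of extremities collected from all elements of S_i."""
    return Counter(x for e in adj.elements() for x in e)

def pair_into_ghosts(surplus):
    """Pair surplus heads and tails gene by gene (assumed pairing rule)."""
    ghosts = Counter()
    for x, count in surplus.items():
        if x.endswith("h"):
            ghosts[(x, x[:-1] + "t")] += count
    return ghosts

def ghost_adjacencies(s1, s2):
    """Return (T1, T2), built from E1 \\ E2 and E2 \\ E1 respectively."""
    e1, e2 = extremity_multiset(s1), extremity_multiset(s2)
    return pair_into_ghosts(e1 - e2), pair_into_ghosts(e2 - e1)

# Genome 1: linear a b a; genome 2: linear a b c (all genes forward).
s1 = Counter({("at",): 1, ("ah", "bt"): 1, ("at", "bh"): 1, ("ah",): 1})
s2 = Counter({("at",): 1, ("ah", "bt"): 1, ("bh", "ct"): 1, ("ch",): 1})
t1, t2 = ghost_adjacencies(s1, s2)
```

In this example the extra copy of gene a yields one ghost adjacency in T1 and gene c yields one in T2, so at least one deletion and one insertion are needed.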

Figure 2. Two genomes with different structures share the same adjacency set. Each edge in this figure represents a gene, each node represents an adjacency.

Theorem 1. Given two adjacency sets S1 and S2, there exists an optimal series of operations with exactly |T1| deletions, exactly |T2| insertions and some DCJ operations that transforms S1 into S2.

Proof. We prove this theorem by contradiction. Suppose that every optimal series of operations contains more than $|T_1|$ deletions and more than $|T_2|$ insertions. Assume that $O_1 O_2 \cdots O_m$ is an optimal series of operations that contains a minimum number of insertions and deletions. Let $S^0 S^1 \cdots S^m$ be the trace of $S_1$ in the process of transformation, where $S^0 = S_1$ and $S^m = S_2$. Note that for any insertion (or deletion) beyond the $|T_1|$ deletions and $|T_2|$ insertions, there must be a matching deletion (or insertion) to preserve gene content. Thus every optimal series of operations has at least one pair of insertion and deletion on the same gene. Without loss of generality, assume $O_i$ inserts a pair of extremities $g^h g^t$ and $O_j$ deletes $g^h g^t$ ($i < j$), and operations between $O_i$ and $O_j$ do not contain insertion or deletion on $g^h g^t$. Now we will build a new series of operations $O'_i O'_{i+1} \cdots O'_j$ without the pair of insertion and deletion on $g^h g^t$ to replace $O_i \cdots O_j$, which produces the trace $S'^{i} S'^{i+1} \cdots S'^{j}$ and satisfies $S'^{j} = S^j$. This process is shown in Figure 3. Denote the two extremities inserted in $O_i$ as $\hat{g}^h$ and $\hat{g}^t$ to distinguish them from other $g^h$ and $g^t$. For $k = i, \ldots, j-1$, we will keep the invariant $S'^{k-1} = (S^k \setminus \{p_k \hat{g}^h, q_k \hat{g}^t\}) \cup \{p_k q_k\}$, where $p_k$ ($q_k$) is the extremity that shares an adjacency with $\hat{g}^h$ ($\hat{g}^t$) in $S^k$. Note that $p_k$ or $q_k$ might be empty if $\hat{g}^h$ or $\hat{g}^t$ forms a telomere, or $\hat{g}^h \hat{g}^t$ forms an adjacency in $S^k$. Clearly this holds for $k = i$, since we have both $S'^{i-1} = S^{i-1}$ and $S^i = (S^{i-1} \setminus \{p_i q_i\}) \cup \{p_i \hat{g}^h, q_i \hat{g}^t\}$. To make this invariant hold for $k = i+1, \ldots, j-1$, our new operation $O'_{k-1}$ will mimic operation $O_k$ as follows: if $O_k$ does not affect the adjacencies or telomeres containing $\hat{g}^h$ or $\hat{g}^t$, then set $O'_{k-1} = O_k$, and the invariant holds; if $O_k$ acts on at least one of $\hat{g}^h$ or $\hat{g}^t$, we will build $O'_{k-1}$ from $O_k$ by replacing $\hat{g}^t$ ($\hat{g}^h$) with $p_k$ ($q_k$) in $O_k$. For example, if $O_k$ is the DCJ operation given by $\{p_{k-1}\hat{g}^h, cd\} \to \{p_{k-1}c, \hat{g}^h d\}$, then $O'_{k-1}$ would be $\{p_{k-1}q_{k-1}, cd\} \to \{p_{k-1}c, q_{k-1}d\}$.

Figure 3. Building a new series of operations to replace $O_i O_{i+1} \cdots O_j$. $O_i$ will be skipped and $O'_k$ will mimic $O_{k+1}$ for $k = i, i+1, \ldots, j-2$. Finally, $O'_{j-1}$ and $O'_j$ will be constructed according to $O_j$.

Since $O_k$ does not affect $\hat{g}^t$, we have $q_k = q_{k-1}$. Besides, we have $p_k = d$. Thus we have $S'^{k-1} = (S^k \setminus \{p_k \hat{g}^h, q_k \hat{g}^t\}) \cup \{p_k q_k\}$. Other types of operations can be expressed similarly.

Recall that $O_j$ is a deletion, i.e., $\{ag^h, bg^t\} \to \{ab\}$. If $g^h$ and $g^t$ are the same as $\hat{g}^h$ and $\hat{g}^t$, then we have $S'^{j-2} = S^j$, and we can skip $O'_{j-1}$ and $O'_j$ in our constructed series. If $g^h$ and $g^t$ are different from $\hat{g}^h$ and $\hat{g}^t$, then we have $\{ag^h, bg^t\} \subseteq S'^{j-2}$. We can set $O'_{j-1}$ to be $\{ag^h, bg^t\} \to \{ab, g^h g^t\}$, and set $O'_j$ to be $\{p_{j-1}q_{j-1}, g^h g^t\} \to \{p_{j-1}g^h, q_{j-1}g^t\}$. We can verify $S'^{j} = S^j$, and our constructed series, which is no longer than $O_i \cdots O_j$ and contains one fewer insertion and one fewer deletion, contradicts the choice of $O_1 O_2 \cdots O_m$.

Adjacency graph decomposition

Given two adjacency sets S1 and S2 from two genomes, their corresponding adjacency graph is defined as a bipartite multigraph, A = {S1 ∪ T2, S2 ∪ T1, E}, in which u ∈ S1 ∪ T2 and v ∈ S2 ∪ T1 are linked by one edge if u and v share one extremity, and by two edges if they share two extremities. Note that S1 ∪ T2 and S2 ∪ T1 have the same set of extremities; we use n to denote half of the number of extremities. In the case of genomes with the same gene content and without duplicate genes, T1 = T2 = ∅, and each vertex in the adjacency graph has degree at most 2, which means that the adjacency graph consists of vertex-disjoint cycles and paths. We define the length of a cycle or a path to be the number of edges it contains. Based on Theorem 1, T1 = T2 = ∅ implies there exists an optimal solution without insertions and deletions, thus d(S1, S2) is just the minimum number of DCJ operations needed to transform S1 into S2. When S1 has been transformed into S2, the corresponding adjacency graph only consists of cycles of length 2 and paths of length 1. Since each DCJ operation can increase the number of cycles by at most 1, or increase the number of odd-length paths by at most 2, and we can always find such an operation when S1 and S2 are different, we have d(S1, S2) = n - c - o/2, where c is the number of cycles and o is the number of odd-length paths in the adjacency graph [2].
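For the duplicate-free case, the formula d(S1, S2) = n - c - o/2 can be evaluated directly from the two adjacency sets. The following Python function is a minimal sketch under the assumptions of that case (equal gene content, every extremity occurring exactly once per genome); the function name and the list-of-tuples encoding are illustrative choices, not part of the original method description.

```python
from collections import defaultdict

def dcj_distance(s1, s2):
    """n - c - o/2 for two duplicate-free adjacency sets.

    s1, s2: lists of adjacencies (2-tuples) and telomeres (1-tuples).
    """
    where1 = {x: ("S1", i) for i, e in enumerate(s1) for x in e}
    where2 = {x: ("S2", j) for j, e in enumerate(s2) for x in e}
    if where1.keys() != where2.keys():
        raise ValueError("gene content must be equal")
    nbrs = defaultdict(list)
    for x in where1:                       # one edge per shared extremity
        nbrs[where1[x]].append(where2[x])
        nbrs[where2[x]].append(where1[x])
    n = len(where1) // 2                   # half the number of extremities
    seen, cycles, odd_paths = set(), 0, 0
    for start in list(nbrs):
        if start in seen:
            continue
        seen.add(start)
        stack, vertices, degree_sum = [start], 0, 0
        while stack:                       # explore one connected component
            v = stack.pop()
            vertices += 1
            degree_sum += len(nbrs[v])
            for w in nbrs[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        edges = degree_sum // 2
        if edges == vertices:              # no telomere: the component is a cycle
            cycles += 1
        elif edges % 2 == 1:               # path with an odd number of edges
            odd_paths += 1
    return n - cycles - odd_paths // 2

# Linear a b c versus linear a -b c: a single inversion.
g1 = [("at",), ("ah", "bt"), ("bh", "ct"), ("ch",)]
g2 = [("at",), ("ah", "bh"), ("bt", "ct"), ("ch",)]
```

Here the graph has one cycle of length 4 and two paths of length 1, so the formula gives 3 - 1 - 1 = 1, matching the single inversion.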

In the presence of duplicate genes, the adjacency graph may contain vertices with degree larger than 2, so that there may be multiple ways of choosing vertex-disjoint cycles and paths that cover all vertices, as illustrated in Figure 4. We say that a cycle (or path) in the adjacency graph is alternating if no two adjacent edges in this cycle (or path) share the same extremity. A valid decomposition of the adjacency graph is a set of vertex-disjoint alternating cycles and paths that cover all vertices. We say that a cycle of length ℓ is helpful if at most ℓ/2 - 1 vertices are ghost adjacencies, and unhelpful if at least ℓ/2 vertices are ghost adjacencies. In fact, an unhelpful cycle has exactly ℓ/2 ghost adjacencies (all in T1 or all in T2), since adjacencies in T1 and adjacencies in T2 do not have common extremities and thus cannot be linked in the adjacency graph. Now we show how to perform DCJ operations, insertions and deletions to transform S1 into S2 based on a decomposition of the corresponding adjacency graph.

Figure 4. An example of adjacency graph with duplicate genes. (a) Structures of the two genomes. (b) Adjacency graph. (c) A decomposition with 2 cycles. (d) A decomposition with only 1 cycle. Diamonds and rectangles represent ghost adjacencies, and circles represent normal adjacencies.

Lemma 1. Given two adjacency sets S1 and S2, and a decomposition D of the adjacency graph A = {S1 ∪ T2, S2 ∪ T1, E} with c helpful cycles and o odd-length paths, we can perform n - c - o/2 operations to transform S1 into S2, among which there are |T1| deletions, |T2| insertions and n - c - o/2 - |T1| - |T2| DCJ operations.

Proof. We prove this lemma in a constructive way. We will perform operations under the guidance of the graph decomposition. The goal is to transform the adjacency graph into a collection of cycles of length 2 and paths of length 1 without ghost adjacencies, indicating that S1 has been transformed into S2. In the following, we will prove that an unhelpful cycle of length ℓ costs ℓ/2 operations, a path of even length ℓ costs ℓ/2 operations, a helpful cycle of length ℓ costs ℓ/2 -1 operations, and a path of odd length ℓ costs (ℓ - 1)/2 operations. In other words, a helpful cycle requires one less operation than an unhelpful cycle or an even-length path of the same length.

For a helpful cycle of length ℓ with d adjacencies in T1 and i adjacencies in T2, we first perform d deletions guided by this cycle to reduce the size of the cycle to ℓ - 2d. Then for each adjacency in T2, we choose one of its non-ghost neighbors in S1 and perform an insertion to create one more helpful cycle of length 2. After all adjacencies in T2 are handled, we transform the cycle of length ℓ into one of length ℓ - 2d - 2i without ghost adjacencies, on which finally we can perform ℓ/2 - d - i - 1 DCJ operations to create ℓ/2 - d - i cycles of length 2. An example is shown in Figure 5(a).

Figure 5. Examples of performing operations under the guidance of decomposition. In each subfigure, the upper part shows the transformation of the adjacency graph; the lower part shows the corresponding change in the genomic structure.

For an unhelpful cycle of length ℓ with ℓ/2 adjacencies in T1, we can perform ℓ/2 deletions to remove the adjacencies in S1. For an unhelpful cycle of length ℓ with ℓ/2 adjacencies in T2, we can first insert a gene as initial operand, then perform ℓ/2 - 1 insertions to create ℓ/2 cycles of length 2--see Figure 5(b)(d).

For a path with odd length ℓ, we need (ℓ - 1)/2 operations, and for a path with even length ℓ, we need ℓ/2 operations--see Figure 5(c)(e).

In sum, there are |T1| deletions, |T2| insertions and n - c - o/2 - |T1| - |T2| DCJ operations.
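The per-component costs used in this proof can be tallied with a short Python sketch. The encoding of a decomposition as `(kind, length, ghosts)` triples and both function names are assumptions made for illustration; the sketch only does the arithmetic of Lemma 1 and does not construct the operations themselves.

```python
def decomposition_cost(components):
    """Total operations for a decomposition, summed component by component.

    components: list of (kind, length, ghosts) with kind "cycle" or "path",
    length the number of edges, ghosts the number of ghost adjacencies.
    """
    total = 0
    for kind, length, ghosts in components:
        if kind == "cycle":
            helpful = ghosts <= length // 2 - 1
            total += length // 2 - (1 if helpful else 0)
        else:
            # (length - 1) / 2 for odd paths, length / 2 for even paths
            total += length // 2
    return total

def lemma1_bound(components):
    """n - c - o/2, with n taken as half the total number of edges."""
    n = sum(length for _, length, _ in components) // 2
    c = sum(1 for kind, length, ghosts in components
            if kind == "cycle" and ghosts <= length // 2 - 1)
    o = sum(1 for kind, length, _ in components
            if kind == "path" and length % 2 == 1)
    return n - c - o // 2

# One helpful 4-cycle, one unhelpful 4-cycle, two odd paths of length 1.
example = [("cycle", 4, 0), ("cycle", 4, 2), ("path", 1, 0), ("path", 1, 0)]
```

Both functions return the same value on a valid decomposition, because every edge corresponds to one extremity and the lengths therefore sum to 2n.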

Lemma 1 states that any decomposition of the adjacency graph gives an upper bound on the edit distance. The following lemma shows that an optimal decomposition also provides a lower bound.

Lemma 2. $d(S_1, S_2) \ge n - \max_{D \in \mathcal{D}} (c_D + o_D/2)$, where $\mathcal{D}$ is the space of all decompositions of A = {S1 ∪ T2, S2 ∪ T1, E}, and $c_D$ and $o_D$ are the numbers of helpful cycles and odd-length paths in D, respectively.

Proof. Let $\Delta_P = \max_{D \in \mathcal{D}''} (c_D + o_D/2) - \max_{D \in \mathcal{D}'} (c_D + o_D/2)$, where $\mathcal{D}'$ and $\mathcal{D}''$ are the spaces of decompositions before and after performing operation P, and P ∈ {DCJ, INS, DEL}. By Theorem 1, there exists an optimal series of operations with exactly |T1| deletions and |T2| insertions. Summing $\Delta_P$ over all operations in this optimal solution yields $\sum_P \Delta_P = (n - |T_1|) - \max_{D \in \mathcal{D}} (c_D + o_D/2)$, where (n - |T1|) is the sum of the number of helpful cycles and half of the number of odd-length paths in the optimal decomposition of the adjacency graph when S1 has been transformed into S2. Define $\delta_{DCJ} = 1$, $\delta_{INS} = 1$ and $\delta_{DEL} = 0$. In the following, we will prove $\Delta_P \le \delta_P$ for P ∈ {DCJ, INS, DEL}, which implies that $\sum_P \Delta_P \le d(S_1, S_2) - |T_1|$. The combination of these two formulas proves this lemma.

We prove $\Delta_P \le \delta_P$ by contradiction. Let A' and A'' be the adjacency graphs before and after performing the operation P. Let σ(A') and σ(A'') be the optimal decompositions of A' and A'', respectively. Suppose $\Delta_P > \delta_P$, namely, $(c_{\sigma(A'')} + o_{\sigma(A'')}/2) - (c_{\sigma(A')} + o_{\sigma(A')}/2) > \delta_P$. Note that P is reversible; we denote the reversed operation as $P^{-1}$, and $P^{-1}$ simultaneously transforms σ(A'') into a decomposition of A', denoted γ(A'). Since σ(A') is optimal, we have $c_{\sigma(A')} + o_{\sigma(A')}/2 \ge c_{\gamma(A')} + o_{\gamma(A')}/2$. Thus, to get the contradiction, we only need to prove $(c_{\sigma(A'')} + o_{\sigma(A'')}/2) - (c_{\gamma(A')} + o_{\gamma(A')}/2) \le \delta_P$. Recall that γ(A') is obtained from σ(A'') by performing operation $P^{-1}$, and both σ(A'') and γ(A') are decompositions, which include only vertex-disjoint cycles and paths.

If P is a DCJ operation, then $P^{-1}$ is still a DCJ operation. A DCJ operation may merge two cycles into one cycle, split one cycle into two cycles, merge two paths into one path, split one path into two paths, merge one path and one cycle into one path, split one path into one cycle and one path, rearrange two odd(even)-length paths into two even(odd)-length paths or make no change in the number of cycles and odd-length paths. Among those possible operations, the following four cases can reduce the number of helpful cycles or odd-length paths: (i) merge two helpful cycles into one helpful cycle; (ii) merge two odd-length paths into one even-length path; (iii) rearrange two odd-length paths into two even-length paths; (iv) merge one helpful cycle and one odd-length path into one odd-length path. For any of these four cases, we have $(c_{\sigma(A'')} + o_{\sigma(A'')}/2) - (c_{\gamma(A')} + o_{\gamma(A')}/2) = 1$. For other possible DCJ operations, we have $(c_{\sigma(A'')} + o_{\sigma(A'')}/2) - (c_{\gamma(A')} + o_{\gamma(A')}/2) \le 0$.

If P is an insertion, then $P^{-1}$ is a deletion. Similarly, among all possible deletions, the following five cases can reduce the number of helpful cycles or odd-length paths: (i) merge two helpful cycles into one helpful cycle; (ii) merge two odd-length paths into one even-length path; (iii) rearrange two odd-length paths into two even-length paths; (iv) merge one helpful cycle and one odd-length path into one odd-length path; (v) change a helpful cycle into an unhelpful one. For any of these five cases, we have $(c_{\sigma(A'')} + o_{\sigma(A'')}/2) - (c_{\gamma(A')} + o_{\gamma(A')}/2) = 1$. For other possible deletions, we have $(c_{\sigma(A'')} + o_{\sigma(A'')}/2) - (c_{\gamma(A')} + o_{\gamma(A')}/2) \le 0$.

If P is a deletion, then $P^{-1}$ is an insertion. An insertion may split one cycle into two cycles, split one path into two paths, or split one path into one cycle and one path. All these possible insertions will not reduce the number of helpful cycles or odd-length paths. Thus, any deletion will not increase the number of helpful cycles or the number of odd-length paths, and we have $c_{\sigma(A'')} + o_{\sigma(A'')}/2 \le c_{\gamma(A')} + o_{\gamma(A')}/2$.   □

Combining Lemma 1 and Lemma 2, we have the following theorem.

Theorem 2. $d(S_1, S_2) = n - \max_{D \in \mathcal{D}} (c_D + o_D/2)$, where $\mathcal{D}$ is the space of all decompositions of A = {S1 ∪ T2, S2 ∪ T1, E}, and $c_D$ and $o_D$ are the numbers of helpful cycles and odd-length paths in D, respectively.

Approximation algorithm

We design an approximation algorithm by using techniques employed on the problem of BREAKPOINT GRAPH DECOMPOSITION [5,6,18-20]. The basic idea is to find the maximum number of vertex-disjoint helpful cycles of length 4 in the adjacency graph. This problem can be reduced to the K-SET PACKING problem with k = 4, for which the best algorithm to date has an approximation ratio of 2 + ε [21,22].

To make use of such an algorithm, we must remove telomeres and keep only cycles in the adjacency graph. This can be done by introducing null extremities τ and null adjacencies ττ, which are different from all other extremities and adjacencies (the same definition is introduced in [7]). Given two adjacency sets S1 and S2 with 2k1 and 2k2 telomeres respectively, we replace each telomere x by the adjacency xτ. If we additionally have k1 < k2, we must add (k2 - k1) null adjacencies ττ to S1 in order to balance the degrees. The corresponding adjacency graph is constructed in the same way as in the case without null extremities: two adjacencies are linked by one edge if they share one extremity, and by two edges if they share two extremities. Now we prove that this "telomere-removal" operation does not change d(S1, S2).
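The telomere-removal step can be sketched in a few lines of Python. The encoding is an assumption made for this illustration: an adjacency set is a list of tuples, a 2-tuple is an adjacency and a 1-tuple is a telomere. The function name and the extremity labels are hypothetical, and the symmetric case (padding S2 when it has fewer telomeres) is added here for convenience rather than taken from the text.

```python
TAU = "τ"  # null extremity; assumed not to clash with real extremity names

def remove_telomeres(s1, s2):
    """Replace each telomere x by the adjacency (x, τ) and balance with ττ.

    An adjacency set is a list of tuples: a 2-tuple is an adjacency and a
    1-tuple is a telomere (an assumed encoding for this sketch).
    """
    def cap(adjacency_set):
        return [a if len(a) == 2 else (a[0], TAU) for a in adjacency_set]

    def telomere_count(adjacency_set):
        return sum(1 for a in adjacency_set if len(a) == 1)

    t1, t2 = telomere_count(s1), telomere_count(s2)  # 2*k1 and 2*k2
    capped1, capped2 = cap(s1), cap(s2)
    # The set with fewer telomeres receives |k2 - k1| null adjacencies ττ.
    padding = [(TAU, TAU)] * (abs(t2 - t1) // 2)
    if t1 < t2:
        capped1 += padding
    else:
        capped2 += padding
    return capped1, capped2

# One linear chromosome (2 telomeres) versus two linear chromosomes (4 telomeres).
s1 = [("a_t",), ("a_h", "b_t"), ("b_h",)]
s2 = [("a_t",), ("a_h",), ("b_t",), ("b_h",)]
c1, c2 = remove_telomeres(s1, s2)
print(sum(a.count(TAU) for a in c1), sum(a.count(TAU) for a in c2))  # prints 4 4
```

After the call both sets contain the same number of null extremities, which is what allows every path to be closed into a cycle.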

Theorem 3. Let S1 and S2 be two adjacency sets and denote by S1' and S2' the adjacency sets obtained from S1 and S2 by removing telomeres. Then d(S1, S2) = d(S1', S2').

Proof. We first prove d(S1', S2') ≤ d(S1, S2). Let A = {S1 ∪ T2, S2 ∪ T1, E} be the adjacency graph with respect to S1 and S2 and σ(A) be the optimal decomposition of A. Let A' be the adjacency graph with respect to the telomere-free sets S1' and S2' and σ(A') be the optimal decomposition of A'. Suppose σ(A) contains c helpful cycles, o odd-length paths and e even-length paths, and among these e even-length paths, e1 contain two telomeres in S1 and e2 contain two telomeres in S2. Suppose S1 and S2 contain 2k1 and 2k2 telomeres respectively (w.l.o.g., assume k1 ≤ k2). Since an odd-length path contains one telomere in each adjacency set while an even-length path contains two telomeres in one adjacency set, we have o + 2e1 = 2k1 and o + 2e2 = 2k2. We can perform the following modifications on σ(A) to transform it into a decomposition of A' (see Figure 6). Nothing needs to be done for cycles. For odd-length paths, link their two telomeres to form a helpful cycle; for each even-length path with both telomeres in S1, arbitrarily choose one even-length path with both telomeres in S2 and link these two paths to form a helpful cycle; for the remaining e2 - e1 even-length paths, use e2 - e1 = k2 - k1 null adjacencies ττ to transform each such path into a helpful cycle. Thus, there are c + o + e2 helpful cycles in this decomposition of A', so that the upper bound on d(S1', S2') is (n + k2) - (c + o + e2) = n - c - o/2 = d(S1, S2), where the first equality uses e2 = k2 - o/2.

Now we prove d(S1, S2) ≤ d(S1', S2'). Note that σ(A') consists only of vertex-disjoint cycles, and unhelpful cycles cannot contain any null extremity. We claim that each helpful cycle in σ(A') contains no more than two null extremities τ on each side. Otherwise, we can always choose two nonadjacent edges that are linked through τ, exchange their four ends, and divide this cycle into two cycles (see Figure 7), contradicting the optimality of σ(A'). Now we transform σ(A') into a decomposition of A by recovering all removed telomeres (see Figure 6). Each cycle that contains null extremities falls into one of three cases: (a) it contains one adjacency xτ on each side, and the recovery yields one odd-length path; (b) it contains one ττ adjacency on one side, and the recovery yields one even-length path; (c) it contains two xτ-like adjacencies on each side, and the recovery yields two even-length paths. In all three cases the value n - c - o/2 remains unchanged, and after the recovery we obtain a decomposition of A. Thus we have d(S1, S2) ≤ d(S1', S2').   □
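The counting in the first half of the proof can be checked mechanically. The Python sketch below counts one helpful cycle per original cycle, per closed odd-length path, per linked pair of even-length paths and per path closed with a null adjacency, and confirms that (n + k2) minus this count equals n - c - o/2. The function names and the ranges tried are illustrative assumptions; k2 is derived from the identity o + 2e2 = 2k2 used in the proof.

```python
from fractions import Fraction

def cycles_after_telomere_removal(c, o, e1, e2):
    """Count helpful cycles built from σ(A) as in the first half of the proof.

    c cycles are kept, each of the o odd-length paths is closed into a cycle,
    e1 pairs of even-length paths are linked, and the remaining e2 - e1
    even-length paths are closed with null adjacencies (assumes e1 <= e2).
    """
    return c + o + e1 + (e2 - e1)

def bound_matches(n, c, o, e1, e2):
    """Check (n + k2) - #cycles == n - c - o/2, with k2 from o + 2*e2 = 2*k2."""
    k2 = Fraction(o + 2 * e2, 2)
    upper = (n + k2) - cycles_after_telomere_removal(c, o, e1, e2)
    return upper == n - c - Fraction(o, 2)

# o is even because o + 2*e1 = 2*k1; try a small grid of values.
ok = all(
    bound_matches(n, c, o, e1, e2)
    for n in range(10, 13)
    for c in range(4)
    for o in range(0, 6, 2)
    for e1 in range(3)
    for e2 in range(e1, 4)
)
print(ok)  # prints True
```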

Figure 6. One example of the "telomere-removal" and "telomere-recovery" process. Thick circles represent adjacencies containing null extremities, and thick lines represent edges connecting null extremities.

Figure 7. Two cases of the adjacency graph with more than 2 edges that are linked through τ. Dashed lines might represent more than one edge.

In summary, based on Theorems 2 and 3, we have established that computing the edit distance is equivalent to finding a valid decomposition with a maximum number of helpful cycles in an adjacency graph without telomeres. The latter problem is NP-hard by a reduction from BREAKPOINT GRAPH DECOMPOSITION, which is NP-hard [23]: any instance of BREAKPOINT GRAPH DECOMPOSITION is an adjacency graph without ghost adjacencies. Thus, the problem of computing the edit distance is also NP-hard.

Now we give the approximation algorithm and prove that its approximation ratio is 1.5 + ε.

Approximation Algorithm

Input: Two adjacency sets S1 and S2 from two genomes

Output: A series of operations to transform S1 into S2.

Step 1 Add null adjacencies to S1 and S2 to obtain S1' and S2' without telomeres. Build the adjacency graph A'.

Step 2 Collect all helpful cycles of length 4 in A' as a collection C. Find a subset C' of C in which no two cycles share an adjacency, using the (2 + ε)-approximation algorithm for the K-SET PACKING problem with k = 4.

Step 3 Remove the adjacencies covered by the cycles in the subset C' found in Step 2. Arbitrarily decompose the remaining part of A' into cycles, denoting this set as C''.

Step 4 Remove the null adjacencies of the cycles in C' ∪ C'' to obtain a decomposition of A. Transform S1 into S2 according to Lemma 1, guided by these cycles and paths.
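To make the packing in Steps 2 and 3 concrete, here is a minimal Python sketch that selects pairwise disjoint 4-cycles greedily. This is a stand-in, not the (2 + ε)-approximation algorithm of [21,22]: a greedy pass only guarantees a maximal packing. The representation of a cycle as a frozenset of four adjacency identifiers, the function names and the sample data are all assumptions made for this illustration.

```python
def greedy_cycle_packing(cycles):
    """Pick cycles one by one, skipping any that shares an adjacency.

    `cycles` is a list of frozensets of adjacency identifiers, one frozenset
    per helpful cycle of length 4. The result is a maximal (not necessarily
    maximum) family of pairwise disjoint cycles.
    """
    used = set()
    packing = []
    for cycle in cycles:
        if used.isdisjoint(cycle):
            packing.append(cycle)
            used |= cycle
    return packing

def uncovered(adjacencies, packing):
    """Adjacencies left over for the arbitrary decomposition of Step 3."""
    covered = set().union(*packing) if packing else set()
    return set(adjacencies) - covered

# Hypothetical instance: adjacencies are numbered 1..10.
cycles = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6}), frozenset({5, 6, 7, 8})]
packing = greedy_cycle_packing(cycles)
print(len(packing))                             # prints 2
print(sorted(uncovered(range(1, 11), packing)))  # prints [9, 10]
```

Replacing the greedy pass with a local-search procedure that swaps small groups of cycles is what yields the better ratio quoted in Step 2; the surrounding bookkeeping stays the same.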

The running time of the above algorithm is dominated by the time complexity of the (2 + ε)-approximation algorithm for the K-SET PACKING problem with k = 4, which is <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S13/mathml/M46','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S13/mathml/M46">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S13/mathml/M47','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S13/mathml/M47">View MathML</a>[21,22].

Theorem 4. The approximation ratio of the above algorithm is 1.5 + ε.

Proof. Suppose the optimal decomposition of A' contains p helpful cycles of length 4 and q longer helpful cycles. Clearly, we have n ≥ 2p + 3q. Based on Theorem 2 and Theorem 3, we know that d(S1, S2) = n - p - q. Let C be the collection of all helpful cycles of length 4 in A' and C' the subset of C found in Step 2. In the algorithm, we find at least |C'| helpful cycles, which implies that the number of operations that our algorithm outputs is at most n - |C'|. Since C' is a (2 + ε)-approximation solution, we have |C'| ≥ OPT/(2 + ε) ≥ p/(2 + ε), where OPT is the maximum number of independent helpful cycles of length 4 in C. The approximation ratio is thus

(n - |C'|)/(n - p - q) ≤ (n - p/(2 + ε))/(n - p - q) ≤ (2p + 3q - p/(2 + ε))/(p + 2q) ≤ 2 - 1/(2 + ε) ≤ 1.5 + ε,

where the second inequality holds because the ratio is non-increasing in n and n ≥ 2p + 3q.   □
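The bound of Theorem 4 can be sanity-checked numerically. The Python sketch below evaluates (n - p/(2 + ε))/(n - p - q), that is, the worst case in which the packing contains exactly p/(2 + ε) cycles, over a small grid of p, q and n ≥ 2p + 3q, and compares the largest value with 1.5 + ε. The function name, the grid limits and the choice ε = 1/10 are assumptions for this illustration; a finite grid supports the inequality but does not prove it.

```python
from fractions import Fraction

def worst_ratio(eps, max_p=20, max_q=20, slack=5):
    """Largest value of (n - p/(2+eps)) / (n - p - q) over a small grid.

    p and q are the numbers of helpful 4-cycles and longer helpful cycles in
    an optimal decomposition, and n ranges over 2p + 3q .. 2p + 3q + slack.
    """
    worst = Fraction(0)
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            if p + q == 0:
                continue  # no cycles at all: nothing to compare
            for n in range(2 * p + 3 * q, 2 * p + 3 * q + slack + 1):
                found = Fraction(p) / (2 + eps)  # guaranteed packing size
                ratio = (n - found) / (n - p - q)
                worst = max(worst, ratio)
    return worst

eps = Fraction(1, 10)
print(worst_ratio(eps) <= Fraction(3, 2) + eps)  # prints True
```

On this grid the largest ratio is reached when q = 0 and n = 2p, where it equals 2 - 1/(2 + ε).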

Conclusion

We studied the edit distance problem for two genomes under a unit-cost model including DCJ operations, insertions (including duplications) and deletions. We proved that this problem is equivalent to finding the maximum number of helpful cycles in the adjacency graph and gave a (1.5 + ε)-approximation algorithm. We made two main assumptions in this work: single-gene insertions and deletions; and unit cost for DCJ operations, insertions and deletions. Both are clearly unrealistic. For example, large segmental duplications are common in many mammalian genomes [24], paracentric rearrangements are more common than pericentric ones, at least in two Drosophila species [25], and short inversions are more common than long ones in some prokaryotes and in the aforementioned Drosophila [26]. These constraints should be incorporated into our distance computation. Any additional constraint naturally creates complications, but we expect that at least a few natural constraints can be handled within the framework described here.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MS and YL conceived the idea, performed the analysis, and wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We thank Bernard Moret for helpful discussions.

This article has been published as part of BMC Bioinformatics Volume 13 Supplement 19, 2012: Proceedings of the Tenth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S19

References

  1. Fertin G, Labarre A, Rusu I, Tannier E, Vialette S: Combinatorics of Genome Rearrangements. MIT Press; 2009.

  2. Bergeron A, Mixtacki J, Stoye J: A unifying view of genome rearrangements. In Proc 6th Workshop Algs in Bioinf (WABI'06), Volume 4175 of Lecture Notes in Comp Sci. Springer Verlag, Berlin; 2006:163-173.

  3. Yancopoulos S, Attie O, Friedberg R: Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 2005, 21(16):3340-3346.

  4. Bergeron A, Mixtacki J, Stoye J: A new linear-time algorithm to compute the genomic distance via the double cut and join distance. Theor Comput Sci 2009, 410(51):5300-5316.

  5. Chen X: On sorting permutations by double-cut-and-joins. In Proc 16th Conf Computing and Combinatorics (COCOON'10), Volume 6196 of Lecture Notes in Comp Sci. Springer Verlag, Berlin; 2010:439-448.

  6. Chen X, Sun R, Yu J: Approximating the double-cut-and-join distance between unsigned genomes. BMC Bioinformatics 2011, 12(Suppl 9):S17.

  7. Yancopoulos S, Friedberg R: Sorting genomes with insertions, deletions and duplications by DCJ. In Proc RECOMB Workshop on Comparative Genomics (RECOMB-CG'08). 2008:170-183.

  8. Hannenhalli S, Pevzner P: Transforming cabbage into turnip (polynomial algorithm for sorting signed permutations by reversals). In Proc 27th Ann ACM Symp Theory of Comput (STOC'95). ACM Press, New York; 1995:178-189.

  9. Bader D, Moret B, Yan M: A fast linear-time algorithm for inversion distance with an experimental comparison. J Comput Biol 2001, 8(5):483-491.

  10. Jean G, Nikolski M: Genome rearrangements: a correct algorithm for optimal capping. Inf Proc Letters 2007, 104:14-20.

  11. Ozery-Flato M, Shamir R: Two notes on genome rearrangement. J Bioinf Comp Bio 2003, 1:71-94.

  12. Tesler G: Efficient algorithms for multichromosomal genome rearrangements. J Comput Syst Sci 2002, 65(3):587-609.

  13. El-Mabrouk N: Sorting signed permutations by reversals and insertions/deletions of contiguous segments. Journal of Discrete Algorithms 2001, 1:105-122.

  14. Chen X, Zheng J, Fu Z, Nan P, Zhong Y, Lonardi S, Jiang T: Assignment of orthologous genes via genome rearrangement. ACM/IEEE Trans on Comput Bio & Bioinf 2005, 2(4):302-315.

  15. Braga M, Willing E, Stoye J: Genomic distance with DCJ and indels. Algorithms in Bioinformatics 2010, 90-101.

  16. Angibaud S, Fertin G, Rusu I, Vialette S: A pseudo-boolean framework for computing rearrangement distances between genomes with duplicates. J Comput Biol 2007, 14(4):379-393.

  17. Angibaud S, Fertin G, Rusu I, Thévenin A, Vialette S, et al.: On the approximability of comparing genomes with duplicates. Journal of Graph Algorithms and Applications 2009, 13:19-53.

  18. Caprara A, Rizzi R: Improved approximation for breakpoint graph decomposition and sorting by reversals. J of Combin Optimization 2002, 6(2):157-182.

  19. Christie D: A 3/2-approximation algorithm for sorting by reversals. In Proc 9th Ann ACM/SIAM Symp Discrete Algs (SODA'98). SIAM Press, Philadelphia; 1998:244-252.

  20. Lin G, Jiang T: A further improved approximation algorithm for breakpoint graph decomposition. J of Combin Optimization 2004, 8(2):183-194.

  21. Halldórsson M: Approximating discrete collections via local improvements. Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms, Society for Industrial and Applied Mathematics 1995, 160-169.

  22. Hurkens C, Schrijver A: On the size of systems of sets every t of which have an SDR, with an application to the worst-case ratio of heuristics for packing problems. SIAM Journal on Discrete Mathematics 1989, 2:68-72.

  23. Kececioglu J, Sankoff D: Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. Algorithmica 1995, 13:180-210.

  24. Bailey J, Eichler E: Primate segmental duplications: crucibles of evolution, diversity and disease. Nature Reviews Genetics 2006, 7(7):552-564.

  25. York T, Durrett R, Nielsen R: Dependence of paracentric inversion rate on tract length. BMC Bioinformatics 2007, 8:115.

  26. Lefebvre JF, El-Mabrouk N, Tillier E, Sankoff D: Detection and validation of single gene inversions. In Proc 11th Int'l Conf on Intelligent Systems for Mol Biol (ISMB'03), Volume 19 of Bioinformatics. Oxford U Press; 2003:i190-i196.