Algorithmic Biology Laboratory, St. Petersburg Academic University, St. Petersburg, Russia

Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, USA

Abstract

In comparative genomics, the rearrangement distance between two genomes (equal to the minimal number of genome rearrangements required to transform them into a single genome) is often used for measuring their evolutionary remoteness. The generalization of this measure to three genomes is known as the median score.

In this work, we study the relationship and interplay between the pairwise distances of three genomes and their median score under the model of Double-Cut-and-Join (DCJ) rearrangements. Most remarkably, we show that while a rearrangement may change the sum of pairwise distances by at most 2 (and thus change the lower bound by at most 1), even the most "powerful" rearrangements in this respect, namely those that increase the lower bound by 1 (by moving one genome farther away from each of the other two genomes), which we call

We further prove that the median score attains the lower bound exactly on the triples of genomes that can be obtained from a single genome with strong rearrangements. While the sum of pairwise distances multiplied by the factor 2/3 is an upper bound for the median score, its tightness remains unclear. Nonetheless, we show that the difference between the median score and its lower bound is not bounded by a constant.

Background

The number of large-scale rearrangements (such as reversals, translocations, fissions, and fusions) between two genomes is often used as a measure of their evolutionary remoteness. The minimal number of such rearrangements required to transform one genome into the other is called

Phylogeny reconstruction for three given genomes involves reconstruction of their

The simplest, easily computable approximation for the median score of three genomes is given by the sum of their pairwise DCJ distances, which we call the triangle score.

While the tightness of the upper bound remains unclear, we remark that a better upper bound for the median score may improve the performance of algorithms that compute the median score using the adequate subgraph decomposition.

Methods

Breakpoint graphs and genome rearrangements

In this work, we focus on circular genomes consisting of one or more circular chromosomes. A circular genome on a set of _{1}, _{2},..., _{k}_{1},..., _{k}_{i }_{i}_{1}, _{2},..., _{k }

A _{dcj}_{dcj}_{dcj}_{dcj }_{dcj}

Let

**Breakpoint graph BG(A, B, C) of genomes A = (1)(2) (red edges), B = (1, 2) (blue edges), and C = (1, -2) (green edges), where (1**.

Since a DCJ in one of the genomes can change each of the two corresponding distances by at most 1, it can change ts(A, B, C) by at most 2.
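The pairwise distances and their sum can be computed directly from cycle counts in the breakpoint graph. The following is a minimal sketch that assumes the standard formula for circular genomes on the same n genes, d_{dcj}(P, Q) = n - c(P, Q), where c(P, Q) is the number of alternating PQ-cycles. The function names and the encoding of a genome as a list of signed-integer lists are illustrative assumptions, not notation from this paper.

```python
def adjacencies(genome):
    """Pair each gene extremity with its adjacent extremity.

    A genome is a list of circular chromosomes, each a list of signed genes.
    Extremities are (gene, 't') for the tail and (gene, 'h') for the head.
    """
    adj = {}
    for chrom in genome:
        for g, nxt in zip(chrom, chrom[1:] + chrom[:1]):
            left = (abs(g), 'h' if g > 0 else 't')       # end where we leave g
            right = (abs(nxt), 't' if nxt > 0 else 'h')  # end where we enter nxt
            adj[left], adj[right] = right, left
    return adj


def dcj_distance(p, q):
    """Assumed formula: n minus the number of alternating cycles."""
    adj_p, adj_q = adjacencies(p), adjacencies(q)
    seen, cycles = set(), 0
    for start in adj_p:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:      # walk one alternating P/Q cycle
            w = adj_p[v]
            seen.update((v, w))
            v = adj_q[w]
    return len(adj_p) // 2 - cycles


def triangle_score(a, b, c):
    """Sum of the three pairwise DCJ distances."""
    return dcj_distance(a, b) + dcj_distance(b, c) + dcj_distance(a, c)


# Genomes from the breakpoint graph figure above.
A = [[1], [2]]
B = [[1, 2]]
C = [[1, -2]]
print(dcj_distance(A, B), dcj_distance(B, C), dcj_distance(A, C))
print(triangle_score(A, B, C))
```

For these three genomes every pairwise breakpoint graph is a single cycle on two genes, so each distance is 1 and the sum of pairwise distances is 3.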

**Lemma 1**. _{dcj}_{dcj}(A, C) are less than n -

_{dcj}_{A }_{A }_{1 }∪ _{2 }∪...

Similarly, since d_{dcj}_{A }_{A }_{1 }∪ _{2 }∪.... Intersecting the subsets in the two partitions, we get a partition of _{A }_{A }_{i, j}_{i }_{j}

Suppose that there is no required pair of _{i }_{j }_{i'}_{j'}_{1 }∩ _{1 }is non-empty. Then for every _{i }_{j }_{i }_{1}. In particular, _{2 }∩ _{1 }= _{2 }is non-empty and by the same reasoning, we have _{1 }⊂ _{1}. Therefore, _{i }⊂ _{1 }for all _{1 }= _{A}_{2}. This contradiction proves that a required pair of

**Theorem 2**.

_{dcj}_{dcj}

** A-edges (red) that belong to distinct AB-cycles and distinct AC-cycles, denoted by dashed blue lines and dashed green lines, respectively (left panel)**. A DCJ on these

**Theorem 3**.

_{dcj}_{dcj}_{dcj }

If

where by the triangle inequality all coefficients are nonnegative. This identity instructs us to apply _{dcj}_{dcj}

If ^{t}^{h}^{t}^{h }

While all triples of pairwise DCJ distances are achievable with strong DCJs, not all breakpoint graphs of three genomes can be constructed from an identity breakpoint graph this way. In particular, Figure

**Breakpoint graph BG(A, B, C) of genomes A = (1)(2)(3)(4) (red edges), B = (1, 2)(3, 4) (blue edges), and C = (1, -2)(3, -4) (green edges) with the property that no DCJ can decrease ts(A, B, C) by 2**.
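The property stated in this caption can be checked exhaustively for such a small example. The sketch below enumerates every DCJ on each of the three genomes and records the resulting change in the sum of pairwise distances. It assumes circular genomes only (so a DCJ always cuts two adjacencies and rejoins the four ends in one of the two other ways) and the cycle formula d_{dcj} = n - c; all function names are hypothetical.

```python
from itertools import combinations


def adjacencies(genome):
    """Pair each gene extremity (gene, 't'/'h') with its neighbor."""
    adj = {}
    for chrom in genome:
        for g, nxt in zip(chrom, chrom[1:] + chrom[:1]):
            left = (abs(g), 'h' if g > 0 else 't')
            right = (abs(nxt), 't' if nxt > 0 else 'h')
            adj[left], adj[right] = right, left
    return adj


def distance(adj_p, adj_q):
    """Assumed formula: number of genes minus number of alternating cycles."""
    seen, cycles = set(), 0
    for start in adj_p:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            w = adj_p[v]
            seen.update((v, w))
            v = adj_q[w]
    return len(adj_p) // 2 - cycles


def triangle_score(adjs):
    a, b, c = adjs
    return distance(a, b) + distance(b, c) + distance(a, c)


def dcj_results(adj):
    """Yield every adjacency dict reachable from `adj` by one DCJ."""
    edges = sorted({tuple(sorted((u, v))) for u, v in adj.items()})
    for (p, q), (r, s) in combinations(edges, 2):
        for e1, e2 in (((p, r), (q, s)), ((p, s), (q, r))):
            new = dict(adj)
            for u, v in (e1, e2):
                new[u], new[v] = v, u
            yield new


def largest_decrease(genomes):
    """Most negative change of the triangle score over all single DCJs."""
    adjs = [adjacencies(g) for g in genomes]
    base = triangle_score(adjs)
    changes = []
    for i in range(3):
        for new in dcj_results(adjs[i]):
            trial = adjs[:i] + [new] + adjs[i + 1:]
            changes.append(triangle_score(trial) - base)
    return min(changes)


A = [[1], [2], [3], [4]]
B = [[1, 2], [3, 4]]
C = [[1, -2], [3, -4]]
print(largest_decrease([A, B, C]))
```

For these genomes the most negative change is -1: a DCJ can bring one genome closer to one of the other two, but never to both at once.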

In the next section we demonstrate how DCJs on three genomes can affect their median score.

Strong rearrangements and median score

The median problem can be alternatively posed as finding the minimal number (equal ms(

From the perspective of this formulation, it becomes important to understand which triples of genomes can be obtained from a single genome with strong DCJs. We start by proving a helpful lemma and bounds on the median score in terms of the triangle score.

**Lemma 4**.

Clearly, d_{dcj}_{dcj}

i.e., ms(

**Theorem 5**.

On the other hand, the number of DCJs in any transformation of the genomes _{dcj}_{dcj}_{dcj}_{dcj}_{dcj}_{dcj}

We remark that the lower bound and a slightly better upper bound ms(_{dcj}_{dcj}_{dcj}_{dcj}_{dcj}_{dcj}
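Both bounds follow from the triangle inequality for the DCJ distance. The derivation below is a sketch under two assumptions: M denotes a median genome of A, B, C, and the "slightly better" upper bound is read as the minimum of the three sums of two pairwise distances (obtained by taking M to be one of the given genomes).

```latex
\begin{align*}
\mathrm{ts}(A,B,C)
  &= d_{dcj}(A,B) + d_{dcj}(B,C) + d_{dcj}(A,C) \\
  &\le \bigl(d_{dcj}(A,M) + d_{dcj}(M,B)\bigr)
     + \bigl(d_{dcj}(B,M) + d_{dcj}(M,C)\bigr)
     + \bigl(d_{dcj}(A,M) + d_{dcj}(M,C)\bigr) \\
  &= 2\,\mathrm{ms}(A,B,C), \\[4pt]
\mathrm{ms}(A,B,C)
  &\le \min\bigl\{\, d_{dcj}(A,B) + d_{dcj}(A,C),\;
                    d_{dcj}(A,B) + d_{dcj}(B,C),\;
                    d_{dcj}(A,C) + d_{dcj}(B,C) \,\bigr\} \\
  &\le \tfrac{1}{3}\cdot 2\,\mathrm{ts}(A,B,C)
   = \tfrac{2}{3}\,\mathrm{ts}(A,B,C).
\end{align*}
```

The last step uses the fact that the minimum of three numbers does not exceed their average, and the three sums together count every pairwise distance exactly twice.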

The following theorem classifies all triples of genomes for which the median score coincides with its lower bound and links them with the genomes constructed in Theorem 3.

**Theorem 6**.

Vice versa, suppose that

It remains unclear how tight the upper bound given in Theorem 5 is, although a better upper bound may improve the performance of algorithms that compute the median score using the adequate subgraph decomposition

**Theorem 7**. The

_{n}, B_{n}, C_{n }_{n}, B_{n}, C_{n}_{n}, B_{n}, C_{n}

We start with genomes A_{1} = (1)(2)(3)(4), B_{1} = (1, 2)(3, 4), and C_{1} = (1, -2)(3, -4). The breakpoint graph BG(A_{1}, B_{1}, C_{1}) consists of two strongly adequate subgraphs (Figure …), with ms(A_{1}, B_{1}, C_{1}) = 4 and ts(A_{1}, B_{1}, C_{1}) = 6, resulting in ms(A_{1}, B_{1}, C_{1}) - 1/2 · ts(A_{1}, B_{1}, C_{1}) = 1.
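The values 4 and 6 for this triple are small enough to confirm by brute force. The sketch below relies on two assumptions that are not spelled out in the text above: every perfect matching on the 2n gene extremities is the adjacency set of some genome made of circular chromosomes, and d_{dcj} = n - c for such genomes. It enumerates all 105 matchings on the 8 extremities and takes the smallest total distance to the three genomes. Function names are hypothetical.

```python
def adjacencies(genome):
    """Pair each gene extremity (gene, 't'/'h') with its neighbor."""
    adj = {}
    for chrom in genome:
        for g, nxt in zip(chrom, chrom[1:] + chrom[:1]):
            left = (abs(g), 'h' if g > 0 else 't')
            right = (abs(nxt), 't' if nxt > 0 else 'h')
            adj[left], adj[right] = right, left
    return adj


def distance(adj_p, adj_q):
    """Assumed formula: number of genes minus number of alternating cycles."""
    seen, cycles = set(), 0
    for start in adj_p:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            w = adj_p[v]
            seen.update((v, w))
            v = adj_q[w]
    return len(adj_p) // 2 - cycles


def perfect_matchings(vertices):
    """Yield every perfect matching on `vertices` as a symmetric dict."""
    if not vertices:
        yield {}
        return
    first, rest = vertices[0], vertices[1:]
    for i, partner in enumerate(rest):
        for m in perfect_matchings(rest[:i] + rest[i + 1:]):
            yield {**m, first: partner, partner: first}


def median_score(genomes):
    """Exhaustive search over all candidate median genomes (tiny n only)."""
    adjs = [adjacencies(g) for g in genomes]
    vertices = sorted(adjs[0])
    return min(sum(distance(m, a) for a in adjs)
               for m in perfect_matchings(vertices))


def triangle_score(genomes):
    a, b, c = (adjacencies(g) for g in genomes)
    return distance(a, b) + distance(b, c) + distance(a, c)


A1 = [[1], [2], [3], [4]]
B1 = [[1, 2], [3, 4]]
C1 = [[1, -2], [3, -4]]
print(median_score([A1, B1, C1]), triangle_score([A1, B1, C1]))
```

The search grows as (2n - 1)!! in the number of genes n, so it is useful only as a sanity check on toy examples like this one.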

To construct BG(A_{n}, B_{n}, C_{n}), we take n copies of BG(A_{1}, B_{1}, C_{1}) and relabel their vertices appropriately. In particular, for n = 2 we have A_{2} = (1)(2)(3)(4)(5)(6)(7)(8), B_{2} = (1, 2)(3, 4)(5, 6)(7, 8), and C_{2} = (1, -2)(3, -4)(5, -6)(7, -8). Since edges of a median genome do not connect strongly adequate subgraphs of the breakpoint graph, each copy of BG(A_{1}, B_{1}, C_{1}) in BG(A_{n}, B_{n}, C_{n}) contributes 4 to the median score. Similarly, each copy of BG(A_{1}, B_{1}, C_{1}) contributes 6 to the triangle score, implying that ms(A_{n}, B_{n}, C_{n}) - 1/2 · ts(A_{n}, B_{n}, C_{n}) = n.
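The growth of the triangle score along this family can be illustrated numerically. The sketch below builds the three genomes for a given n by repeating the four-gene pattern with shifted labels, then computes the sum of pairwise distances using the assumed cycle formula d_{dcj} = n - c. The helper `family` and the other names are hypothetical; the median score itself is not computed here.

```python
def adjacencies(genome):
    """Pair each gene extremity (gene, 't'/'h') with its neighbor."""
    adj = {}
    for chrom in genome:
        for g, nxt in zip(chrom, chrom[1:] + chrom[:1]):
            left = (abs(g), 'h' if g > 0 else 't')
            right = (abs(nxt), 't' if nxt > 0 else 'h')
            adj[left], adj[right] = right, left
    return adj


def dcj_distance(p, q):
    """Assumed formula: number of genes minus number of alternating cycles."""
    adj_p, adj_q = adjacencies(p), adjacencies(q)
    seen, cycles = set(), 0
    for start in adj_p:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            w = adj_p[v]
            seen.update((v, w))
            v = adj_q[w]
    return len(adj_p) // 2 - cycles


def triangle_score(a, b, c):
    return dcj_distance(a, b) + dcj_distance(b, c) + dcj_distance(a, c)


def family(n):
    """Three genomes on 4n genes made of n relabeled four-gene copies."""
    firsts = range(1, 4 * n + 1, 2)             # 1, 3, 5, ..., 4n - 1
    a = [[g] for g in range(1, 4 * n + 1)]      # every gene on its own chromosome
    b = [[g, g + 1] for g in firsts]            # (1, 2)(3, 4)...
    c = [[g, -(g + 1)] for g in firsts]         # (1, -2)(3, -4)...
    return a, b, c


for n in (1, 2, 3):
    print(n, triangle_score(*family(n)))
```

Each relabeled copy adds 6 to the sum, so the computed value is 6n and the lower bound 1/2 · ts equals 3n.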

We conclude our analysis with a last but not least observation about the lower bound 1/2 · ts(

**Left panel: Breakpoint graph of genomes A = (1, -6, -7, -8, -9, -10, -11)(2, 5, 4, 3) (red edges), B = (1, 8, 9, 10, 11)(2, 3, 4, 5, 6, 7) (blue edges), C = (1, -3, 4, 10, -8, 11, 9, 5, -7, 6, 2) (green edges), and their median genome M = (1, -6, -5, -2, -3, -4, -7, -10, -11, 8, 9) (dashed edges) with ts(A, B, C) = 24 and ms(A, B, C) = 15**. The pairwise DCJ distances are d

Results and discussion

We studied two measures of evolutionary remoteness of three genomes

In view of the median genome problem as finding a transformation of the given genomes into a single genome (or a reverse transformation of a single genome into the given genomes) with the smallest number of genome rearrangements, it is important to understand how rearrangements can change the median score and its bounds. When

We showed that the median score attains its lower bound (i.e., ms(

It remains unclear how tight the upper bound for the median score is, although a better upper bound may improve the performance of existing algorithms for computing the median score. Nonetheless, we made an initial step in this direction by proving that there is no upper bound equal to the lower bound plus a constant (Theorem 7).

Competing interests

The authors declare that they have no competing interests.

Acknowledgements

This work was partially supported by the Government of the Russian Federation (grant 11.G34.31.0018).

This article has been published as part of