### Abstract

Next-generation sequencing techniques allow an increasing number of draft genomes
to be produced rapidly and at decreasing cost. However, these draft genomes are usually
only partially sequenced and released as collections of unassembled contigs, which cannot be
used directly by existing algorithms for studying genome rearrangements
and reconstructing phylogenies. In this work, we study the one-sided block (or contig)
ordering problem with weighted reversal and block-interchange distance. Given a partially
assembled genome *π *and a completely assembled genome *σ*, the problem is to find an optimal ordering to assemble (i.e., order and orient)
the contigs of *π *such that the rearrangement distance measured by reversals and block-interchanges
(also called generalized transpositions) with the weight ratio 1:2 between the assembled
contigs of *π *and *σ *is minimized. In addition to genome rearrangements and phylogeny reconstruction, the
one-sided block ordering problem particularly has a useful application in genome resequencing,
because its algorithms can be used to assemble the contigs of a draft genome *π *based on a reference genome *σ*. By using permutation groups, we design an efficient algorithm to solve this one-sided
block ordering problem in
*n *is the number of genes or markers and *δ *is the number of used reversals and block-interchanges. We also show that the assembly
of the partially assembled genome can be done in
*π* and *σ* is increased. In particular, the more transpositions are involved in the rearrangement
events, the wider the accuracy gap between our program and SIS becomes.

### Background

Next-generation sequencing techniques have advanced greatly in the past decade
[1-3], allowing an increasing number of draft genomes to be produced rapidly and at decreasing
cost. Usually, these draft genomes are only partially sequenced and are therefore published
as collections of unassembled contigs (short for contiguous fragments). Draft genomes
in contig form, however, cannot be used immediately in some bioinformatics
applications, such as the study of genome rearrangements, which requires completely
assembled genomes to calculate rearrangement distances [4]. To address this issue, Gaul and Blanchette [5] introduced and studied the so-called block ordering problem, defined as follows. Given
two partially assembled genomes, each represented as an unordered set of blocks,
the *block ordering problem* is to assemble (i.e., order and orient) the blocks of the two genomes such that the
distance of genome rearrangements between the two assembled genomes is minimized.
The blocks mentioned above are the contigs, each of which can be represented by an
ordered list of genes or markers. In their work [5], Gaul and Blanchette proposed a linear-time algorithm to solve the block ordering
problem if the problem is further simplified to maximize the number of cycles in the
breakpoint graph corresponding to the assembled genomes. The rationale behind this
modification rests on a result obtained by Bourque and Pevzner [6], showing that the reversal distance between two assembled genomes can be approximated
well by maximizing the number of cycles in their corresponding breakpoint graph. However,
in addition to the number of cycles, the number of hurdles and the presence or absence
of a fortress are also needed to determine the actual reversal
distance [7]. Therefore, it is still a challenge to efficiently solve the block ordering problem
by optimizing the true rearrangement distance.

In the literature, many different kinds of genome rearrangements have been extensively
studied [4], such as reversal (also called inversion), transposition and block-interchange (also
called generalized transposition), translocation, fusion and fission. Reversal affects
a segment on a chromosome by reversing this segment as well as exchanging its strands.
Transposition rearranges a chromosome by interchanging its two adjacent and nonoverlapping
segments. Block-interchange is a generalized transposition that exchanges two nonoverlapping
but not necessarily adjacent segments on a chromosome. Translocation acts on two chromosomes
by exchanging their end fragments. Fusion is a special translocation that joins
two chromosomes into one, and fission is a special translocation that splits a
chromosome into two. In this study, we consider a variant of the block ordering problem
in which one of the two input genomes is still partially assembled but the other is
completely assembled, and in which the genome rearrangement distance to be optimized is measured by
weighted reversals and block-interchanges, whose weights are 1 and 2, respectively.
To distinguish this special block ordering problem from the original one, we call
it the *one-sided block* (*or contig*) *ordering problem*. In fact, an efficient algorithm to solve the one-sided block ordering problem has
a useful application in genome resequencing [8,9], because the reference genome for resequencing organisms can serve as the completely
assembled genome in the one-sided block ordering problem and the contigs of partially
assembled resequencing genome can then be assembled together into one or several scaffolds
based on the reference genome. In this respect, the one-sided block ordering problem
can be considered as a kind of *contig scaffolding* (*or assembly*) *problem* that aims to use genome rearrangements to create contig scaffolds for a draft genome
based on a reference genome.

Currently, several contig scaffolding tools based on the reference genomes have been
developed, such as Projector 2 [10], OSLay [11], ABACAS [12], Mauve Aligner [13], fillScaffolds [14], r2cat [15] and SIS [16]. Among these contig scaffolding tools, both SIS and fillScaffolds use the concept
of genome rearrangements to generate contig scaffolds for a draft genome. SIS deals
with only reversals, while in addition to reversals, fillScaffolds considers other
rearrangements, such as transpositions and translocations (including fissions and
fusions). Basically, SIS was dedicated to creating the contig scaffolds for prokaryotic
draft genomes by heuristically searching for their inversion signatures, where an
*inversion signature *is a pair of adjacent genes or markers appearing along a contig such that they form
a breakpoint and are also located in different transcriptional strands. As for fillScaffolds,
it used the traditional technique of breakpoint graphs to assemble the contigs of
draft genomes. In their study, Dias and colleagues [16] used real prokaryotic draft genomes to demonstrate that SIS had the best
overall accuracy when compared with the other tools mentioned above.

In this study, we utilize permutation groups in algebra, instead of the breakpoint
graphs used by Gaul and Blanchette [5], to design an efficient algorithm, whose time complexity is
*n *is the number of genes or markers and *δ *is the number of reversals and block-interchanges used to transform the assembly of
the partially assembled genome (i.e., draft genome) into the completely assembled
genome (i.e., reference genome). In particular, we also show that the assembly of
the partially assembled genome can be done in

### Preliminaries

#### One-sided block ordering problem

In the following, we dedicate ourselves to linear, uni-chromosomal genomes. With a
slight modification, however, our algorithmic result can still apply to circular,
uni-chromosomal genomes, or to multi-chromosomal genomes with linear or circular chromosomes
in a chromosome-by-chromosome manner. Once completely assembled, a uni-chromosomal
genome can be represented by a signed permutation of *n *integers between 1 and *n*, with each integer representing a gene or marker and its associated sign indicating
the strandedness of the corresponding gene or marker. If the genome is partially assembled,
then it will be represented by an unordered set of blocks, where a block *B* of size *k*, denoted by *B* = [*b*_{1}, *b*_{2}, ..., *b*_{k}], is an ordered list of *k* signed integers. Let -*B* = [-*b*_{k}, ..., -*b*_{2}, -*b*_{1}] denote the *reverse* of *B*. Given an unordered set of *m* blocks, say *B*_{1}, *B*_{2}, ..., *B*_{m}, an *ordering* (or *assembly*) of this set is an ordered list of *m* blocks in which each block *B*_{i} or its reverse -*B*_{i} appears exactly once, where 1 ≤ *i* ≤ *m*. For instance, suppose that the set consists of *B*_{1} = [1, 4], *B*_{2} = [3, 2] and *B*_{3} = [-5, 6]. Then (*B*_{1}, *B*_{3}, *B*_{2}) = ([1, 4], [-5, 6], [3, 2]) and (*B*_{1}, -*B*_{3}, *B*_{2}) = ([1, 4], [-6, 5], [3, 2]) are two orderings of this set. Each ordering *induces* (or *defines*) a signed permutation of size *n*, which is obtained by concatenating the blocks in this ordered list. For instance, the ordering (*B*_{1}, *B*_{3}, *B*_{2}) in the above example induces the signed permutation (1, 4, -5, 6, 3, 2) = *B*_{1} ⊙ *B*_{3} ⊙ *B*_{2}. Clearly, the permutation induced by an ordering of
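The notions of reverse, ordering and induced permutation can be illustrated with a minimal Python sketch. The function names (`reverse_block`, `induced_permutation`, `all_orderings`) are hypothetical and chosen only for this illustration; blocks are plain lists of signed integers.

```python
from itertools import permutations, product

def reverse_block(block):
    """Reverse of a block: reverse the order and flip every sign."""
    return [-g for g in reversed(block)]

def induced_permutation(ordering):
    """Concatenate the blocks of an ordering into one signed permutation."""
    return [g for block in ordering for g in block]

def all_orderings(blocks):
    """Yield every ordering: each block used exactly once, in either orientation."""
    for order in permutations(blocks):
        for flips in product((False, True), repeat=len(order)):
            yield [reverse_block(b) if f else list(b) for b, f in zip(order, flips)]

B1, B2, B3 = [1, 4], [3, 2], [-5, 6]
print(induced_permutation([B1, B3, B2]))                 # [1, 4, -5, 6, 3, 2]
print(induced_permutation([B1, reverse_block(B3), B2]))  # [1, 4, -6, 5, 3, 2]
print(sum(1 for _ in all_orderings([B1, B2, B3])))       # 48 = 3! * 2^3
```

A set of *m* blocks has *m*! · 2^{m} orderings, so exhaustive enumeration like this is only practical for tiny examples.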

#### One-sided block ordering problem with reversal and block-interchange distance

**Input: **A partially assembled genome *π *and a completely assembled genome *σ*.

**Output: **Find an ordering of *π *such that the rearrangement distance measured by reversals and block-interchanges
with the weight ratio 1:2 between the permutation induced by the ordering of *π *and *σ *is minimized.

As discussed in our previous study [17], it is biologically meaningful to assign block-interchanges twice the weight of reversals, because biological data indicate that transpositions occur with about half the frequency of reversals [18].

#### Permutation groups

Permutation groups have been proven to be a very useful tool in the studies of genome
rearrangements [17]. Below, we recall some useful definitions, notations and properties borrowed from
our previous work [17]. Basically, given a set *E *= {1, 2, ..., *n*}, a *permutation *is defined to be a one-to-one function from *E *into itself and usually expressed as a product of cycles in the study of genome rearrangements.
For instance, *π *= (1)(3, 2) is a product of two cycles to represent a permutation of *E *= {1, 2, 3} and means that *π*(1) = 1*, π*(2) = 3 and *π*(3) = 2. The elements in a cycle can be arranged in any cyclic order and hence the
cycle (3, 2) in the permutation *π *exemplified above can be rewritten as (2, 3). Moreover, if the cycles in a permutation
are all disjoint (i.e., no common element in any two cycles), then the product of
these cycles is called the *cycle decomposition *of the permutation. In fact, a permutation in the cycle decomposition can be used
to model a genome containing several circular chromosomes, with each disjoint cycle
representing a circular chromosome. Notice that in the rest of this article, we say
"cycle in a permutation" to mean "cycle in the cycle decomposition of this permutation"
for simplicity, unless otherwise specified. A cycle with *k* elements is further called a *k*-*cycle*. By convention, the 1-cycles in a permutation are not written explicitly since their
elements are *fixed* in the permutation. For instance, the above exemplified permutation *π* can be written as *π* = (2, 3). If the cycles in a permutation are all 1-cycles, then this permutation is
called an *identity permutation* and denoted by **1**. Suppose that *α* and *β* are two permutations of *E*. Then their product *αβ*, also called their *composition*, defines a permutation of *E* satisfying *αβ*(*x*) = *α*(*β*(*x*)) for all *x* in *E*. If *α* and
*β* are disjoint, then *αβ* = *βα*. If *αβ* = **1**, then *α* is called the *inverse* of *β*, denoted by *β*^{-1}, and vice versa. Moreover, the *conjugation* of *β* by *α*, denoted by *α* · *β*, is defined to be the permutation *αβα*^{-1}. If
*y* = *β*(*x*), then *α*(*y*) = *αβ*(*x*) = *αβα*^{-1}*α*(*x*) = *α* · *β*(*α*(*x*)). Hence, *α* · *β* can be obtained from *β* by just replacing each element *x* with *α*(*x*). In other words, if *β* = (*b*_{1}, *b*_{2}, ..., *b*_{k}), then *α* · *β* = (*α*(*b*_{1}), *α*(*b*_{2}), ..., *α*(*b*_{k})).

It is a fact that every permutation can be expressed as a product of 2-cycles, in
which 1-cycles are still written implicitly. Given a permutation *α* of *E*, its *norm*, denoted by ||*α*||, is defined to be the minimum number, say *k*, such that *α* can be expressed as a product of *k* 2-cycles. In the cycle decomposition of *α*, let *n*_{c}(*α*) denote the number of its disjoint cycles, notably including the 1-cycles not written explicitly. Given two permutations *α* and *β* of *E*, *α* is said to *divide* *β*, denoted by *α*|*β*, if and only if ||*βα*^{-1}|| = ||*β*|| - ||*α*||. In our previous work [17], it has been shown that ||*α*|| = |*E*| - *n*_{c}(*α*) and that any *k* elements in *E*, say *a*_{1}, *a*_{2}, ..., *a*_{k}, all appear in a cycle of *α* in the ordering of *a*_{1}, *a*_{2}, ..., *a*_{k} if and only if (*a*_{1}, *a*_{2}, ..., *a*_{k})|*α*.
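The definitions of conjugation, norm and divisibility can be checked on small cases with the following Python sketch. It is only an illustration under our own assumptions: permutations of *E* = {1, ..., *n*} are stored as dictionaries, and all helper names (`perm`, `compose`, `inverse`, `conjugate`, `num_cycles`, `norm`, `divides`) are hypothetical.

```python
def perm(cycles, n):
    """Permutation of E = {1, ..., n} as a dict, built from a list of cycles."""
    p = {x: x for x in range(1, n + 1)}
    for c in cycles:
        for i, x in enumerate(c):
            p[x] = c[(i + 1) % len(c)]
    return p

def compose(a, b):
    """Product ab, defined by (ab)(x) = a(b(x))."""
    return {x: a[b[x]] for x in b}

def inverse(a):
    return {y: x for x, y in a.items()}

def conjugate(a, b):
    """Conjugation of b by a, i.e. a b a^{-1}."""
    return compose(compose(a, b), inverse(a))

def num_cycles(a):
    """n_c(a): number of disjoint cycles, counting 1-cycles."""
    seen, count = set(), 0
    for x in a:
        if x not in seen:
            count += 1
            while x not in seen:
                seen.add(x)
                x = a[x]
    return count

def norm(a):
    """||a|| = |E| - n_c(a)."""
    return len(a) - num_cycles(a)

def divides(a, b):
    """a | b  iff  ||b a^{-1}|| = ||b|| - ||a||."""
    return norm(compose(b, inverse(a))) == norm(b) - norm(a)

beta = perm([(1, 2, 3, 4)], 4)
print(norm(beta))                             # 3
print(divides(perm([(1, 2, 3)], 4), beta))    # True: 1, 2, 3 appear in this order
print(divides(perm([(1, 3, 2)], 4), beta))    # False: wrong cyclic order
# Conjugation relabels each element x of beta-like cycles with alpha(x).
alpha = perm([(1, 2), (3, 4)], 4)
print(conjugate(alpha, perm([(1, 3, 4)], 4)) == perm([(2, 4, 3)], 4))  # True
```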

Let *α *= (*a*_{1}*, a*_{2}) be a 2-cycle and *β *be an arbitrary permutation of *E*. If *α|β*, that is, both *a*_{1 }and *a*_{2 }appear in the same cycle of *β*, then the composition *αβ*, as well as *βα*, has the effect of fission by breaking this cycle into two smaller cycles. For instance,
let *α *= (1, 3) and *β *= (1, 2, 3, 4). Then *α|β*, since both 1 and 3 are in the cycle (1, 2, 3, 4), and *αβ *= (1, 2)(3, 4) and *βα *= (4, 1)(2, 3). On the other hand, if
*a*_{1 }and *a*_{2 }appear in different cycles of *β*, then *αβ*, as well as *βα*, has the effect of fusion by joining the two cycles into a bigger cycle. For example,
if *α *= (1, 3) and *β *= (1, 2)(3, 4), then
*αβ *= (1, 2, 3, 4) and *βα *= (2, 1, 4, 3).
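The fission and fusion examples above can be reproduced with a short Python sketch. The dictionary representation and the helper names (`perm`, `compose`, `cycles`) are our own assumptions for illustration; products are applied right to left, matching *αβ*(*x*) = *α*(*β*(*x*)).

```python
def perm(cycles, n):
    """Permutation of {1, ..., n} as a dict, built from a list of cycles."""
    p = {x: x for x in range(1, n + 1)}
    for c in cycles:
        for i, x in enumerate(c):
            p[x] = c[(i + 1) % len(c)]
    return p

def compose(a, b):
    """Product ab with (ab)(x) = a(b(x))."""
    return {x: a[b[x]] for x in b}

def cycles(p):
    """Cycle decomposition without 1-cycles; each cycle starts at its smallest element."""
    seen, out = set(), []
    for x in sorted(p):
        if x in seen:
            continue
        c = []
        while x not in seen:
            seen.add(x)
            c.append(x)
            x = p[x]
        if len(c) > 1:
            out.append(tuple(c))
    return out

alpha = perm([(1, 3)], 4)

# Fission: 1 and 3 lie in the same cycle of beta.
beta = perm([(1, 2, 3, 4)], 4)
print(cycles(compose(alpha, beta)))  # [(1, 2), (3, 4)]
print(cycles(compose(beta, alpha)))  # [(1, 4), (2, 3)]

# Fusion: 1 and 3 lie in different cycles of beta.
beta = perm([(1, 2), (3, 4)], 4)
print(cycles(compose(alpha, beta)))  # [(1, 2, 3, 4)]
print(cycles(compose(beta, alpha)))  # [(1, 4, 3, 2)], the same cycle as (2, 1, 4, 3)
```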

#### A model for representing DNA molecules

As mentioned before, a permutation in the form of the cycle decomposition can be used
to model a genome containing multiple chromosomes (or a chromosome with multiple contigs),
with each cycle representing a chromosome (or contig). To facilitate modelling the
rearrangement of reversals using the permutation groups, however, we need to use two
cycles to represent a chromosome, with one cycle representing a strand of the chromosome
and the other representing the complementary strand. For this purpose, we first let
*E* = {-1, 1, -2, 2, ..., -*n*, *n*} and Γ = (1, -1)(2, -2) ... (*n*, -*n*). We then use an *admissible* cycle, which is a cycle containing no *i* and its opposite -*i* simultaneously for some
*π*^{+}, and use *π*^{- }= Γ · (*π*^{+})^{-1}, which is the *reverse complement *of *π*^{+}, to represent the opposite strand of *π*^{+}. As demonstrated in our previous work [17], it is useful to represent a double stranded chromosome *π *by the product of its two strands *π*^{+ }and *π*^{-}, that is,
*π*, as described in the following lemmas.

**Lemma 1 (**[17]**) ***Let π *= *π*^{+}*π*^{- }*denote a double stranded DNA and let x and y be two elements in E. If *
*that is, x and y are in the different strands of π, then the effect of *(*π*Γ(*y*), *π*Γ(*x*))(*x*, *y*)*π is a reversal acting on π*.
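As a hedged illustration of Lemma 1, the Python sketch below applies the product (*π*Γ(*y*), *π*Γ(*x*))(*x*, *y*)*π* to a small example. Several things here are our own assumptions rather than part of the original text: the chromosome is treated as a single circular cycle (1, 2, 3, 4) for simplicity, the elements *x* = 1 and *y* = -2 are picked by hand from different strands, and all helper names are hypothetical.

```python
def perm(cycles, domain):
    """Permutation of `domain` as a dict, built from a list of cycles."""
    p = {x: x for x in domain}
    for c in cycles:
        for i, x in enumerate(c):
            p[x] = c[(i + 1) % len(c)]
    return p

def compose(*perms):
    """Product p1 p2 ... pk, applied right to left."""
    result = dict(perms[-1])
    for p in reversed(perms[:-1]):
        result = {x: p[y] for x, y in result.items()}
    return result

def inverse(a):
    return {y: x for x, y in a.items()}

def cycle_from(p, start):
    """The cycle of p that contains `start`, listed from `start`."""
    c, x = [start], p[start]
    while x != start:
        c.append(x)
        x = p[x]
    return c

n = 4
E = [s * i for i in range(1, n + 1) for s in (1, -1)]
gamma = perm([(i, -i) for i in range(1, n + 1)], E)

plus = perm([(1, 2, 3, 4)], E)                           # strand pi^+
minus = compose(gamma, inverse(plus), inverse(gamma))    # pi^- = Gamma . (pi^+)^{-1}
pi = compose(plus, minus)                                # pi = pi^+ pi^-
print(cycle_from(pi, -4))                                # [-4, -3, -2, -1]

x, y = 1, -2                                             # elements on different strands
t1 = perm([(x, y)], E)
t2 = perm([(pi[gamma[y]], pi[gamma[x]])], E)             # (pi Gamma(y), pi Gamma(x)) = (3, -4)
rho = compose(t2, t1, pi)
print(cycle_from(rho, 1))                                # [1, 2, -4, -3]: segment 3, 4 reversed
print(cycle_from(rho, 3))                                # [3, 4, -2, -1]: the opposite strand
```

In this instance the two resulting cycles are again reverse complements of each other, so the product still describes a double-stranded molecule.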

**Lemma 2 (**[17]**) ***Let π *= *π*^{+}*π*^{- }*denote a double stranded DNA and let u, v, x and y be four elements in E. If *(*x, u, y, v*)*|π, that is, x, u, y and v appear in the same strand of π in this order, then the
effect of *(*π*Γ(*v*)*, π*Γ(*u*)) (*π*Γ(*y*)*, π*Γ(*x*)) (*u, v*)(*x, y*)*π is a block-interchange acting on π*.

Moreover, as described in the following lemma, we have shown in [17] that given two different DNA molecules *π *and *σ*, every cycle *α *in (the cycle decomposition of) *σπ*^{-1 }always has a *mate *cycle (*π*Γ) · *α*^{-1 }that also appears in *σπ*^{-1}. In fact, *α *and (*π*Γ) · *α*^{-1 }in *σπ*^{-1 }are each other's mate cycle.

**Lemma 3 (**[17]**) ***Let π and σ be two different double-stranded DNA molecules. If α is a cycle in σπ*^{-1}, *then *(*π*Γ) · *α*^{-1 }*is also a cycle in σπ*^{-1}.

#### An efficient algorithm for the one-sided block ordering problem

To clarify our algorithm, we start by defining some notation. Let *α* denote an arbitrary linear DNA molecule (or contig). As mentioned previously, it is
represented by the product of its two strands *α*^{+} and *α*^{-}, that is, *α* = *α*^{+}*α*^{-}. If *α* contains *k* genes (or markers), we also denote its *α*^{+} by (*α*^{+}[1], *α*^{+}[2], ..., *α*^{+}[*k*]), where *α*^{+}[*i*] is the *i*-th gene in *α*, and its *α*^{-} by (*α*^{-}[1], *α*^{-}[2], ..., *α*^{-}[*k*]). By convention, *α*^{+}[1] and *α*^{-}[1] are called the *tails* of *α*. Let *π* = *π*_{1}*π*_{2} ... *π*_{m} be a linear, uni-chromosomal genome that is partially assembled into *m* contigs *π*_{1}, *π*_{2}, ..., *π*_{m}, where contig *π*_{i} has *n*_{i} genes, and let *σ* = (1, 2, ..., *n*) be a linear, uni-chromosomal genome that is assembled completely. Let *C* = {*c*_{k} = *n* + *k* + 1: 0 ≤ *k* ≤ 2*m* - 1} ∪ {-*c*_{k} = -*n* - *k* - 1: 0 ≤ *k* ≤ 2*m* - 1} be a set of 4*m* distinct integers, called *caps*, which are different from those genes in *E*. Let

-*c*_{2(i-1)+1} to the ends of each contig *π*_{i}, where 1 ≤ *i* ≤ *m*, leading to a capping contig

*j* ≤ *n*_{i} + 1, and

*m* - 1 dummy contigs without any genes (i.e., null contigs) *σ*_{2}, *σ*_{3}, ..., *σ*_{m} into *σ*, where the original contig in *σ* becomes *σ*_{1} now, and add four caps *c*_{2(i-1)}, *c*_{2(i-1)+1}, -*c*_{2(i-1)} and -*c*_{2(i-1)+1} to the ends of each contig *σ*_{i} to obtain a capping contig

*j* ≤ *n*_{i} + 1, and

*π* and *σ* by

Given an integer *x *in
*α* = *α*^{+}*α*^{-} with *k* genes (or markers), we define a function **char**(*x*, α̂) below to represent the character of *x* in the capping contig

*C* to the ends of *α*.

In addition, we define 5cap
*x*. For convenience, we extend the definitions above from the capping contig to the
capping genome. For instance, given a capping genome, say
*x *in a capping contig
*x*, and 5cap
*x*, that is,

**Lemma 4 (**[17]**) ***For a capping genome *
*and *
*if *char
*(respectively*, *T)*, *then *
*is T (respectively*, *C3) and if *
*respectively*, *N3 and C5)*, *then *
*is O (respectively*, *N3 and C5)*.

Basically, we design our algorithm to solve the one-sided block ordering problem by
dealing with the contigs of the capping genome
*x* and *u*, as well as *y* and *v*, lie in the same contig strand in
*x *and *y *appear in the different contigs in

**Lemma 5 (**[17]**) ***Let c*_{1 }= (*x, y*) *denote a 2-cycle with *char
*and *char
*, and let *
*and *
*If *
*and *
*then the effect of *
*is a fusion that acts on π by concatenating the contig containing y with the contig
containing x*.

It is not hard to see that the permutation induced by an ordering of the uncapped
genome *π* can be considered as the result of applying *m* - 1 consecutive fusions to the *m* contigs in *π*. Based on the above discussion, our purpose is to find *m* - 1 translocations to act on
*π *are *m - *1 fusions and the genome rearrangement distance measured by weighted reversals and
block-interchanges between the resulting assembly of the contigs in *π *and *σ *is minimum. In Algorithm 1 below, we describe our algorithm for efficiently solving
the one-sided block ordering problem, where reversals are weighted one and block-interchanges
are weighted two. Basically, we try to derive *m - *1 fusions from
*π *in Algorithm 1.

**Algorithm 1**

**Input:** A partially assembled, linear, uni-chromosomal genome *π* = *π*_{1}*π*_{2} ... *π*_{m} and a completely assembled, linear, uni-chromosomal genome *σ* = *σ*_{1}.

**Output: **An optimally assembled genome of *π*, denoted by *assembly*(*π*), and the weighted reversal and block-interchange distance Δ(*π, σ*) between *assembly*(*π*) and *σ*.

**1: **Add *m - *1 null contigs
*σ *such that

Obtain
*π *and *σ*.

**2: **Compute

**3: /* To perform cap exchanges */**

Let *i *= 0.

**while **there are *x *and *y *in a cycle of
**do**

Let *i *= *i *+ 1.

Find *x *and *y *in a cycle of

Let

Calculate new

**end while**

**4: /* To find m - 1 consecutive fusions */**

Let *i *= 0.

**while **there are two adjacent elements *x *and *y *in a cycle of
**do**

Let *i *= *i *+ 1.

Find two adjacent elements *x *and *y *in a cycle of

Let

Calculate new

**end while**

**while ***i < m - *1 **do**

Let *i *= *i *+ 1.

Find two adjacent elements *x *and *y *in a cycle of

Find the strand of a different contig in
*z*, different from *y*.

Let

Calculate new

**end while**

Let *assembly*(*π*) denote the assembled contig in current

**5: /* To find reversals */**

Let

**while **there are two adjacent elements *x *and *y *in a cycle of
**do**

Let

Find two adjacent elements *x *and *y *in a cycle of

Let

Calculate new

**end while**

**6: /* To find block-interchanges */**

Let

**while**

Let

Choose any two adjacent elements *x *and *y *in a cycle of

Find two adjacent integers *u *and *v *in a cycle of

Let

Calculate new

**end while**

**7: **Output *assembly*(*π*) and

Below, we consider an example to clarify Algorithm 1. Let *π *= {[1, 4], [-5, 6], [3, 2]} and *σ *= {[1, 2, ..., 6]} be the input linear, uni-chromosomal genomes of Algorithm 1. In
our algorithm, these two genomes will be further represented by *π *= (1, 4)(-4, -1)(-5, 6)(-6, 5)(3, 2)(-2, -3) and *σ *= (1, 2, ..., 6)(-6*, -*5, ..., -1). First of all, we add two null contigs into *σ *and cap all the contigs in *π *and *σ *in a way such that
*π *whose induced permutation [1,4] ⊙ [-5, 6] ⊙ [3,2] = (1, 4, -5, 6, 3, 2) can be transformed into the permutation (1, 2, ..., 6) of *σ *using a reversal and a block-interchange (i.e., Δ(*π, σ*) = 3).
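The value Δ(*π*, *σ*) = 3 in this example can be cross-checked by exhaustive search on such a tiny instance. The Python sketch below is not Algorithm 1; it is a brute-force search we add only for illustration, with hypothetical function names, and it treats signed permutations as tuples. Because a reversal is its own inverse and the inverse of a block-interchange is again a block-interchange, distances computed outward from *σ* equal distances toward *σ*.

```python
from itertools import permutations, product

def reversals(p):
    """All signed permutations one reversal away from p."""
    n = len(p)
    for i in range(n):
        for j in range(i, n):
            yield p[:i] + tuple(-g for g in reversed(p[i:j + 1])) + p[j + 1:]

def block_interchanges(p):
    """All permutations obtained by swapping non-overlapping segments p[a:b] and p[c:d]."""
    n = len(p)
    for a in range(n):
        for b in range(a + 1, n):
            for c in range(b, n):
                for d in range(c + 1, n + 1):
                    yield p[:a] + p[c:d] + p[b:c] + p[a:b] + p[d:]

def distances_up_to(source, limit):
    """Weighted distances from source (reversal = 1, block-interchange = 2), capped at limit."""
    dist = {source: 0}
    layers = [[] for _ in range(limit + 1)]
    layers[0].append(source)
    for cost in range(limit):
        for p in layers[cost]:
            if dist[p] != cost:          # stale entry, already reached more cheaply
                continue
            for weight, moves in ((1, reversals), (2, block_interchanges)):
                new_cost = cost + weight
                if new_cost > limit:
                    continue
                for q in moves(p):
                    if new_cost < dist.get(q, limit + 1):
                        dist[q] = new_cost
                        layers[new_cost].append(q)
    return dist

def reverse_block(block):
    return tuple(-g for g in reversed(block))

sigma = (1, 2, 3, 4, 5, 6)
dist = distances_up_to(sigma, 3)

blocks = [(1, 4), (-5, 6), (3, 2)]
best = min(
    dist.get(sum((reverse_block(b) if f else b for b, f in zip(order, flips)), ()), float("inf"))
    for order in permutations(blocks)
    for flips in product((False, True), repeat=len(blocks))
)
print(dist[(1, 4, -5, 6, 3, 2)], best)  # 3 3
```

No ordering of the three contigs gets closer than 3 to *σ*, and the ordering [1, 4] ⊙ [-5, 6] ⊙ [3, 2] attains it. This kind of search grows exponentially and is useful only as a sanity check.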

Actually, after running step 3 of Algorithm 1, it can be verified according to
the capping of *π *and *σ *and Lemma 3 that for any two adjacent elements *x *and *y *in a cycle of
*π*, as explained as follows. Notice that
*x *and *y *are in the same cycle of
_{i} can be rewritten as

*α*_{1} acts on *π* and *α*_{2} continues to act on

*τ*_{i} on *π*. The above discussion indicates that a fusion to *π* can be mimicked by a translocation *τ*, which acts on *π*, followed by zero or more translocations acting on

In the following, we prove the correctness of Algorithm 1. Initially, it is not hard
to see that all the 5' caps are fixed in
*x *and *y *with the same character (either T or C3) in
*c*_{1 }= (*x, y*) and
*x *and *y*, respectively, in
*m-*1) cycles in the resulting
*x*, with
*y*, with
*m - *1) 2-cycles from these cycles in
*m-*1) 2-cycles with character pair (T, C3), denoted by
*π *such that the weighted reversal and block-interchange distance between the permutation
induced by this ordering of *π *and *σ *is minimum. In fact, *f _{k }*and

*k*≤

*m -*1, are derived from two mate cycles in

*mate*2-cycles below. Moreover, if

For 1 ≤ *k *≤ *m - *1, we simply let
*π *can be mimicked by performing *m - *1 consecutive fusions on *π *that has *m *contigs initially. According to Lemma 5 and our previous discussion, if
*k *≤ *m - *1, then
*π*, where
*g _{k }*and

*λ*2-cycles

*m*- 1, that is,

*k*≤

*λ*, but

*k*≤

*m*- 1. In this situation, we shall show below that we still can use

*π*, as we did in step 4 of Algorithm 1.

Recall that the 5' caps are all fixed in the beginning
*x *and *y *with char
*π*. Let us now pay attention to those cycles in
*x *and *y *from a cycle in

**Lemma 6 **Let
*be a fusion to act on π*, *where *char
*and *char
*Then *

*Proof*. For simplicity, it is assumed that we cannot find any cap exchange from

Case 1: Suppose that
*x *and *y *lie in the same cycle, say *α*, in
*α* can be expressed as *α* = *α*_{1}*α*_{2}(*x*, *y*), where *α*_{1} = (*a*_{1}, ..., *a*_{i}) and *α*_{2} = (*a*_{i+1}, ..., *a*_{j}). Let

*α* in

*τ* to

*α* becomes two disjoint cycles *α*_{1} and *α*_{2} in

Case 2: Suppose that
*x *and *y *lie in two different cycles, say *α*_{1 }and *α*_{2}, in
*α*_{1 }and *α*_{2}, respectively, in
*τ *on
*α*_{1 }and *α*_{2 }to be joined together into a cycle, say *α*, in
*α*_{1 }and *α*_{2}, as well as
*α*_{1 }and *α*_{2}, as well as exactly one of
*α*_{1 }and *α*_{2 }will also change char
*α*, as well as
*α *and
*α*_{1 }and *α*_{2}, as well as both
*τ *to
*α*, as well as
*α *and
*α*, as well as

Notice that if
*τ *that acts on
*π *decreases the norm
*τ *as a *good *fusion of *π *if

**Corollary 1 ***Let *
*be a fusion to act on π, where *char
*and *char
*If *
*then *τ *is a good fusion to perform on π*.

According to Corollary 1, it can be realized that *f*_{k}, as well as its mate 2-cycle

*π*, where 1 ≤ *k* ≤ *λ*. If *λ* = *m* - 1, then performing the *m* - 1 fusions on *π*, as we did in Algorithm 1, corresponds to an optimal ordering of *π* such that the weighted reversal and block-interchange distance between the assembly of *π* and *σ* is minimum. To simplify our discussion below, we assume that the *λ* good fusions derived from *f*_{1}, *f*_{2}, ..., *f*_{λ} and their mate 2-cycles can assemble *λ* + 1 contigs of *π* into several super-contigs. If *λ* < *m* - 1, then we show below that the fusions of *m* - 1 contigs in *π* performed by our algorithm utilizing *f*_{1}, *f*_{2}, ..., *f*_{m-1} are still optimal.

**Lemma 7 ***Let *
*be any sequence of m *- 1 *translocations that act on *
*as fusions to assemble m *- 1 *contigs in π*. *Let *
*be the genome obtained by performing τ _{k }and zero or more following cap exchanges on *

*such that no more cap exchange can be derived from*

*where*

*and*1 ≤

*k*≤

*m*- 1.

*Then*

*Proof*. For simplicity, we assume that in the beginning, no cap exchange can be derived
from
*k *≤ *m *- 1. By Lemma 6,
*λ *translocations from
*π*, say
*m *- λ - 1) other 2-cycles
*π *since their T and C3 elements lie in the same contig strand in
*f *and its mate 2-cycle *f*', from
*τ*, to act on *π*, then the C3 elements in both *f* and *f*' must be located in a contig whose T elements are in some *f*_{k} and

*k* ≤ λ. This implies that the good fusion *τ* cannot act on

*τ*_{k} is a good fusion to