Abstract
Many cancer genome sequencing efforts are underway with the goal of identifying the somatic mutations that drive cancer progression. A major difficulty in these studies is that tumors are typically heterogeneous, with individual cells in a tumor having different complements of somatic mutations. However, nearly all DNA sequencing technologies sequence DNA from multiple cells, thus resulting in measurement of mutations from a mixture of genomes. Genome rearrangements are a major class of somatic mutations in many tumors, and the novel adjacencies (i.e. breakpoints) resulting from these rearrangements are readily detected from DNA sequencing reads. However, the assignment of each rearrangement, or adjacency, to an individual cancer genome in the mixture is not known. Moreover, the quantity of DNA sequence reads may be insufficient to measure all rearrangements in all genomes in the tumor. Motivated by this application, we formulate the k-minimum completion problem (kMCP). In this problem, we aim to reconstruct k genomes derived from a single reference genome, given partial information about the adjacencies present in the mixture of these genomes. We show that the 1MCP is solvable in linear time in the cases where: (i) the measured, incomplete genome has a single circular or linear chromosome; (ii) there are no restrictions on the chromosomal content of the measured, incomplete genome. We also show that the kMCP, for k ≥ 3 in general, and the 2MCP with the double-cut-and-join (DCJ) distance are NP-complete when there are no restrictions on the chromosomal structure of the measured, incomplete genome. These results lay the foundation for future algorithmic studies of the kMCP and the application of these algorithms to real cancer sequencing data.
Introduction
Nearly all current genome sequencing studies sequence the DNA from a population of cells rather than from single cells. This is because present DNA sequencing technologies cannot sequence the DNA in a single cell without bias-inducing DNA amplification steps. In the majority of applications, sequencing such a population of cells is not problematic because the DNA in every cell is nearly identical. However, there are two notable exceptions: metagenomics (e.g. environmental sequencing or microbiome studies) and cancer sequencing. In the former case, the genomic differences between cells are due to the presence of mixtures of microorganisms in the sample. In the latter case, the genomic differences between cells are due to somatic mutations that accumulate in individual tumor cells during the progression of cancer [1].
In this paper, we formulate the problem of inferring the organization of each genome present in a mixture in the case where: (1) the individual genomes result from an unknown sequence of genome rearrangements from a known (reference) genome; (2) the adjacencies (breakpoints) of the genomes in the mixture are measured. This situation arises in cancer genome studies where somatic structural aberrations (including inversions, translocations, duplications, deletions, or other rearrangements of large pieces of DNA) induce novel adjacencies, also called breakpoints, that join in the cancer genome two non-contiguous nucleotides from the normal genome. In current cancer sequencing projects, these novel adjacencies are determined from alignments of paired-end reads from cancer DNA to the reference human genome [2,3]. However, these approaches generally do not measure all adjacencies present in the tumor. For example, the quantity of DNA sequence reads (coverage) may be insufficient to measure all adjacencies in all genomes in the tumor, particularly adjacencies that are present in a minority of cancer cells. Moreover, alignment of reads to repetitive regions is challenging, particularly for short reads produced by current sequencing technologies, and thus some adjacencies may not be reliably measured.
We formulate the k-Minimum Completion Problem (kMCP) of determining the k genomes present in a mixture from a set of measured adjacencies that minimize the total distance between the reference genome and the k measured (i.e. cancer) genomes. The kMCP is a general problem that encompasses different subproblems that depend on the genomic distance used and the desired chromosomal content of the measured genomes. We show the following results: (1) a linear time algorithm for the 1MCP in the double-cut-and-join (DCJ) distance [4] when the desired genome has no restrictions on its chromosomal content; (2) a linear time algorithm for the 1MCP in the DCJ distance when the desired genome has a single circular or linear chromosome; (3) the kMCP is NP-complete for any distance when k ≥ 3; and (4) the 2MCP with DCJ distance is NP-complete when the desired genome has no restrictions on its chromosomal content, or when the desired genome has all circular chromosomes.
We emphasize that the kMCP does not model all the issues arising in cancer sequencing: in particular, we restrict attention to copy-neutral structural variants, and ignore single nucleotide mutations, small indels, and large copy number aberrations. Single nucleotide mutations and small indels can be addressed separately as they do not produce novel adjacencies of the type studied in kMCP. Copy number aberrations are common in cancer, but appropriate handling of these mutations when measured in a heterogeneous mixture introduces an entirely different set of challenges: e.g. a deletion of a genomic segment in half of the cells in the mixture with a duplication of the same segment in the other half of the cells will be difficult to distinguish from no copy number change. Finally, we assume that all measured adjacencies are real, while in fact there are likely to be false positive adjacencies. Extending the model to consider these additional complexities is left for future work.
In the following sections, we first provide a precise formulation of the kMCP and describe related work. Then, we provide algorithms and proofs of the complexity of the problem in various cases.
Definitions and problem statement
In this section we present some preliminary definitions and give the formal definition of kMCP.
A gene g is an oriented sequence of nucleotides, with two extremities: a head g_{h} and a tail g_{t}. An adjacency is an unordered pair of gene extremities. A genome
Figure 1. Genome and genome graph. (a) A genome
The genome graph of a genome
A chromosome of
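To make these definitions concrete, the following is a minimal sketch of one possible encoding: a genome stored as a set of adjacencies over gene extremities, with chromosomes recovered as connected components of the genome graph. The helper names (`extremities`, `genome_from_chromosomes`, `chromosomes`), the `_t`/`_h` labels, and the convention of writing a reversed gene with a leading `-` are assumptions made for this illustration and are not notation from the paper. Telomeres are not modelled.

```python
from collections import defaultdict

def extremities(gene):
    """Return the (tail, head) extremity labels of a gene, e.g. 'a' -> ('a_t', 'a_h')."""
    return (gene + "_t", gene + "_h")

def genome_from_chromosomes(chromosomes_in):
    """Build the adjacency set of a genome (hypothetical encoding).

    Each chromosome is (genes, is_circular); genes is a list of signed
    gene names, where 'a' is read tail-to-head and '-a' head-to-tail.
    """
    adjacencies = set()
    for genes, is_circular in chromosomes_in:
        ends = []  # (left extremity, right extremity) of each gene as read
        for g in genes:
            tail, head = extremities(g.lstrip("-"))
            ends.append((head, tail) if g.startswith("-") else (tail, head))
        # consecutive genes contribute one adjacency each
        for (_, right), (left, _) in zip(ends, ends[1:]):
            adjacencies.add(frozenset((right, left)))
        if is_circular:
            adjacencies.add(frozenset((ends[-1][1], ends[0][0])))
    return adjacencies

def chromosomes(genes, adjacencies):
    """Group genes by connected component of the genome graph, whose edges
    are the gene edges {g_t, g_h} together with the adjacencies."""
    neighbors = defaultdict(set)
    for g in genes:
        tail, head = extremities(g)
        neighbors[tail].add(head)
        neighbors[head].add(tail)
    for adj in adjacencies:
        u, v = tuple(adj)
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, result = set(), []
    for g in genes:
        start = extremities(g)[0]
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            component.add(x.rsplit("_", 1)[0])
            stack.extend(neighbors[x] - seen)
        result.append(sorted(component))
    return sorted(result)
```

For instance, a circular chromosome `a -b` plus a linear chromosome `c` yields the two adjacencies {a_h, b_h} and {a_t, b_t}, and the genes group into the components {a, b} and {c}.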
As described above a pairedend sequencing experiment provides the adjacencies
A multigenome is a mixture of genomes with the same set of genes. Formally, the multigenome
The genome graph is related to the breakpoint graph in genome rearrangement studies. The breakpoint graph
Our knowledge about a multigenome can be incomplete. For example, a tumor is a mixture of different cancer genomes, and during the sequencing process we obtain a mixture of adjacencies from these genomes. We represent the mixtures of adjacencies by a partial multigenome. A partial multigenome is a multiset
If k is a positive integer and
We use a distance function to distinguish between different completions. A distance function on pairs of genomes (with the same set of genes) is a measure of dissimilarity between the genomes. Having selected a pairwise distance function, we must define a distance between the k genomes in a mixture. Motivated by the fact that the different cancer genomes in a tumor are obtained by somatic genome rearrangements from a healthy genome, we model the evolution of the cancer genomes by a rooted tree in which all the cancer genomes are descendants of the healthy one. Suppose
where E is the set of edges in
We now define the k-Minimum Completion Problem.
k-Minimum Completion Problem (kMCP). Given a
As written, the kMCP is a general problem that encompasses many subproblems depending on chromosomal
condition set
For two genomes
where
Remark. When at least one of the
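As a hedged illustration of the DCJ distance in its simplest setting, the sketch below assumes that both genomes consist only of circular chromosomes, so that every extremity appears in exactly one adjacency of each genome. In that case the standard DCJ formula [4,8] reduces to the number of genes minus the number of cycles in the breakpoint graph. The function names and the `_t`/`_h` extremity labels are assumptions for the example; linear chromosomes and telomeres are deliberately not handled.

```python
def partner_map(adjacencies):
    """Map each extremity to the extremity it is adjacent to."""
    partner = {}
    for adj in adjacencies:
        u, v = tuple(adj)
        partner[u] = v
        partner[v] = u
    return partner

def count_cycles(adjacencies_a, adjacencies_b):
    """Count cycles in the breakpoint graph of two circular genomes,
    i.e. in the union of two perfect matchings on the extremities."""
    pa, pb = partner_map(adjacencies_a), partner_map(adjacencies_b)
    if set(pa) != set(pb):
        raise ValueError("both genomes must cover the same extremities")
    seen, cycles = set(), 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = pa[v]      # follow an adjacency of genome A
            seen.add(w)
            v = pb[w]      # then an adjacency of genome B
    return cycles

def dcj_distance_circular(n_genes, adjacencies_a, adjacencies_b):
    """DCJ distance n - c, valid under the all-circular assumption above."""
    return n_genes - count_cycles(adjacencies_a, adjacencies_b)

# Reference: circular chromosome a b c. Other genome: circular a -b c.
reference = {frozenset(p) for p in [("a_h", "b_t"), ("b_h", "c_t"), ("c_h", "a_t")]}
inverted = {frozenset(p) for p in [("a_h", "b_h"), ("b_t", "c_t"), ("c_h", "a_t")]}
```

Here `inverted` differs from `reference` by a single inversion of gene b, so the breakpoint graph has one cycle of length 2 and one of length 4, and the distance is 3 − 2 = 1.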
Related work
In comparison to other genome rearrangement problems considered in the literature, the kMCP has three distinguishing features. (1) The input is a mixture of adjacencies from multiple genomes and the genome of origin of each adjacency is unknown. (2) The set of adjacencies is incomplete: not every adjacency from every genome in the mixture is measured. (3) The ancestral relationships between the genomes in the mixture are unknown, and might include both "ancestral" and "present day" genomes. Some of these features have been considered individually in other work, but to our knowledge no previous work has considered all three together. The first feature bears some resemblance to the genome halving problem [9] of finding the doubled ancestor genome by minimizing a rearrangement distance. This problem and its further generalizations to polyploidization [10] involve partitioning (or coloring) adjacencies to minimize a rearrangement distance. However, in these problems no adjacencies are missing and the distance is pairwise (i.e., no tree), in contrast to the 2MCP.
Regarding the second feature, several authors have considered the problem of inferring missing adjacencies in a manner that optimizes a genome rearrangement distance. Notably, [11] and [12] consider the problem of computing reversal distance between pairs of partially assembled genomes that are provided as unordered sequences of contigs. These problems were motivated by limitations in DNA sequencing technologies that result in most whole-genome assemblies being highly fragmented and composed of contigs whose relative ordering is unknown. These problems are variations of the 1MCP, where the reference genome
Regarding the third feature, the genome median problem considers the problem of finding an ancestral genome that minimizes the total distance to three given genomes [5,13]. This is different from kMCP in that the three individual genomes are known (rather than mixed) and the genomes are complete with no missing adjacencies. Also, in the median problem the topology of the phylogenetic tree has already been inferred, while in kMCP we have to find an optimal topology for the phylogenetic tree as well.
Results
In this section we first consider the 1MCP problem. We present linear time algorithms that solve 1MCP in the cases where: (i) the measured, incomplete genome has a single circular or linear chromosome; (ii) there are no restrictions on the chromosomal content of the measured, incomplete genome.
Next we prove that the unrestricted kMCP is NP-complete when k ≥ 3 for any distance function ϕ. Finally, we show that the unrestricted 2MCP, and the restricted 2MCP where all
chromosomes are circular (i.e.,
1MCP
Here, we consider the unrestricted 1MCP and two restricted versions of the 1MCP: (1) the chromosomal condition set is {circular, unichromosomal}, which we denote by 1MCP_{c}; (2) the chromosomal condition set is {linear, unichromosomal}, which we denote by 1MCP_{ℓ}. We first show that the unrestricted version is solvable in linear time. Then, we show that we can solve the 1MCP_{c} in linear time. Finally, we prove a relation between 1MCP_{c} and 1MCP_{ℓ}, which implies that 1MCP_{ℓ} is also solvable in linear time.
Note that 1MCP_{ℓ} is a variation of the Block Ordering Problem (BOP) considered in [12]. In our terminology, the BOP considers two partial genomes, and aims to complete both partial genomes into linear, unichromosomal genomes such that the pairwise distance between the completed genomes is minimal. In [12], Gaul and Blanchette provide a linear time algorithm for BOP. The algorithm we present for 1MCP_{ℓ} is simpler than the algorithm for the BOP in [12], and our algorithm is obtained from a straightforward algorithm (Algorithm 1 below) which solves 1MCP_{c} in linear time.
We begin with the unrestricted 1MCP, where we have the following result.
Theorem 1. The unrestricted 1MCP with DCJ distance is linearly tractable.
Proof. In 1MCP we have a single partial genome
Figure 2. Possible mixture trees when k = 1, 2. (a) The only topology in 1MCP. (b) Branch-tree and (c) path-tree topologies in 2MCP.
Figure 3. The breakpoint graph
1MCP_{c}: circular unichromosomal completion
Here we consider 1MCP_{c}, the restricted 1MCP for a partial genome
The first constraint on partitioning of
The second constraint on partitioning of
We define the free-extremities graph,
Figure 4. Adding adjacencies to a partial genome
To find a completion of the partial genome
(i) u and v are no longer free vertices.
(ii) If
(iii) If
Therefore, adding the adjacency {u, v} to
If {u, v} is a blue edge in R, then update(R, {u, v}) increases the number of cycles in the breakpoint graph B by one. Hence, to find a solution to 1MCP_{c} we want to perform update(R, {u, v}) transformations with as many blue edges as possible. On the other hand, adding new adjacencies has to merge the paths in the graph
Theorem 2. Suppose
where N_{b}(R) is the number of blue edges, and c(R) is the number of cycles in R.
Proof. We prove the theorem by induction on N_{b}(R). Suppose N_{b}(R) = 1. Then necessarily R consists of a cycle of length 2 with one blue and one red edge, and c(R) = 1. Thus, we update the graph R with the unique blue edge obtaining
Now suppose N_{b}(R) > 1. Then
Let R' = update(R, {u, v}) be the free-extremities graph after the update. Since u and v are incident with blue edges in R, after update(R, {u, v}) the number of blue edges decreases by one, i.e., N_{b}(R') = N_{b}(R) − 1.
Thus, by induction hypothesis
Considering the above cases we have:
(i) After update(R, {u, v}), C_{u} and C_{v} will shrink into one cycle, and c(R') = c(R) − 1. Thus by (2), M_{b}(R') = N_{b}(R) − c(R) + 1. By choosing such an edge we can update R with N_{b}(R) − c(R) + 1 blue edges.
(ii) After update(R, {u, v}), C shrinks into a smaller cycle, and c(R') = c(R). Thus, by (2), M_{b}(R') = N_{b}(R) − c(R). Since {u, v} is a blue edge, we can update R with N_{b}(R) − c(R) + 1 blue edges.
(iii) After update(R, {u, v}), C splits into two smaller cycles. Thus c(R') = c(R) + 1. Thus, by (2), M_{b}(R') = N_{b}(R) − c(R) − 1. So by choosing {u, v} we can update R with N_{b}(R) − c(R) − 1 blue edges.
By the calculations above, choosing a pair {u, v} satisfying case (i) or (ii) results in a greater number of update moves with blue edges than choosing a pair satisfying case (iii). Moreover, considering pairs {u, v} from cases (i) and (ii) gives M_{b}(R) = N_{b}(R) − c(R) + 1. □
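The arithmetic behind the three cases can be written out explicitly. The block below is a sketch that assumes equation (2) is the induction hypothesis in the form M_b(R') = N_b(R') − c(R') + 1, which is consistent with the conclusion of the proof, and uses N_b(R') = N_b(R) − 1 as established before the case analysis.

```latex
\begin{align*}
\text{(i)}\;\;   c(R') &= c(R) - 1: & M_b(R') &= \bigl(N_b(R) - 1\bigr) - \bigl(c(R) - 1\bigr) + 1 = N_b(R) - c(R) + 1,\\
\text{(ii)}\;\;  c(R') &= c(R):     & M_b(R') &= \bigl(N_b(R) - 1\bigr) - c(R) + 1 = N_b(R) - c(R),\\
\text{(iii)}\;\; c(R') &= c(R) + 1: & M_b(R') &= \bigl(N_b(R) - 1\bigr) - \bigl(c(R) + 1\bigr) + 1 = N_b(R) - c(R) - 1.
\end{align*}
```

In case (ii) the pair {u, v} is itself a blue edge, so the update contributes one additional blue edge, giving N_b(R) − c(R) + 1 in total. This matches case (i), while case (iii) remains strictly smaller.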
We call a pair {u, v} (which may or may not be an edge in R) satisfying case (i) or (ii) in the proof of Theorem 2 an optimal adjacency. Optimal adjacencies play an important role in finding a solution of 1MCP_{c}: updating the free-extremities graph with these adjacencies results in the maximum number of blue edges used in update transformations. We have the following important corollary to this theorem.
Corollary 1. Suppose
Proof. By Theorem 2, adding any optimal adjacency to
A linear time (in the number of genes) algorithm for solving 1MCP_{c} adds optimal adjacencies according to cases (i) and (ii) in Theorem 2, and is shown in Algorithm 1. The following corollary is an immediate consequence of Corollary 1 and Algorithm 1.
Corollary 2. The 1MCP_{c }is solvable in linear time.
Algorithm 1: Solving 1MCP_{c}
Input: Partial genome
Output: A 1-completion
begin
    Construct the free-extremities graph
    while c(R) > 1 do
        u, v ← select two vertices from different cycles in R;
        R ← update(R, {u, v});
    while the number of blue edges in R > 1 do
        u, v ← select two vertices connected via a blue edge in R;
        R ← update(R, {u, v});
    Add the single remaining excluded edge in
    Output the resulting circular unichromosomal genome
end
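For checking an implementation on tiny inputs, an exhaustive baseline can be useful. The sketch below is not Algorithm 1 and does not run in linear time: it simply enumerates every way of pairing up the free extremities, keeps the completions that form a single circular chromosome, and returns one with the fewest DCJ operations from the reference (computed as the number of genes minus the number of breakpoint-graph cycles, which assumes a circular reference genome). All function names and the `_t`/`_h` extremity labels are assumptions made for this illustration.

```python
def partner_map(adjacencies):
    """Map each extremity to the extremity it is adjacent to."""
    partner = {}
    for adj in adjacencies:
        u, v = tuple(adj)
        partner[u], partner[v] = v, u
    return partner

def count_cycles(adjacencies_a, adjacencies_b):
    """Cycles in the union of two perfect matchings on the extremities."""
    pa, pb = partner_map(adjacencies_a), partner_map(adjacencies_b)
    seen, cycles = set(), 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = pa[v]
            seen.add(w)
            v = pb[w]
    return cycles

def perfect_matchings(vertices):
    """Yield every perfect matching of a list of vertices."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for i, other in enumerate(rest):
        for matching in perfect_matchings(rest[:i] + rest[i + 1:]):
            yield [frozenset((first, other))] + matching

def other_end(extremity):
    """Return the opposite extremity of the same gene ('a_t' <-> 'a_h')."""
    return extremity[:-1] + ("h" if extremity.endswith("t") else "t")

def is_single_circular_chromosome(genes, adjacencies):
    """True if gene edges plus adjacencies form one cycle through all genes."""
    partner = partner_map(adjacencies)
    if len(partner) != 2 * len(genes):
        return False
    start = genes[0] + "_t"
    v = start
    for visited in range(1, len(genes) + 1):
        v = partner[other_end(v)]  # cross the gene, then follow the adjacency
        if v == start:
            return visited == len(genes)
    return False

def brute_force_1mcp_c(genes, partial, reference):
    """Return (distance, completion) minimizing n - c over all circular
    unichromosomal completions of `partial`; None if there is none."""
    used = {x for adj in partial for x in adj}
    free = [e for g in genes for e in (g + "_t", g + "_h") if e not in used]
    best = None
    for extra in perfect_matchings(free):
        completion = set(partial) | set(extra)
        if not is_single_circular_chromosome(genes, completion):
            continue
        distance = len(genes) - count_cycles(completion, reference)
        if best is None or distance < best[0]:
            best = (distance, completion)
    return best

genes = ["a", "b", "c", "d"]
reference = {frozenset(p) for p in
             [("a_h", "b_t"), ("b_h", "c_t"), ("c_h", "d_t"), ("d_h", "a_t")]}
```

With the partial genome {{a_h, b_t}}, which agrees with the reference, the search recovers the reference itself at distance 0. With {{a_h, c_t}}, a single DCJ operation would excise gene b as its own circular chromosome, so the best unichromosomal completion is at distance 2.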
1MCP_{ℓ}: linear unichromosomal completion
In this section we consider the 1MCP with chromosomal condition of a linear unichromosomal genome. We refer to this restricted problem as 1MCP_{ℓ}. We relate solutions of 1MCP_{ℓ }to solutions of 1MCP_{c}. Combined with the results in the previous section, we derive a linear time algorithm for 1MCP_{ℓ}.
Recall that
Theorem 3. Let
Proof. First, suppose e is not in any cycle in the graph
Figure 5. Relating 1MCP_{c }and 1MCP_{ℓ}. (a) The breakpoint graph
where the last inequality follows from the definition of
Now suppose
Thus by (3) and (4) we have
Now suppose e is in a cycle in
Notice that the function θ depends only on the partial genome
Corollary 3. The 1MCP_{ℓ }is solvable in linear time.
Proof. Suppose
(3 ≤ k)MCP
In the unrestricted case of the kMCP, the completion of a partial genome is always possible as we can add adjacencies and telomeres arbitrarily to the partial genome, since there is no restriction on the number and type of chromosomes in the resulting genome. The hardness of showing the existence of a k-completion derives from the fact that finding a k-completion for the partial multigenome results in a proper edge-coloring for the genome graph of the partial multigenome.
Let G = (V, E) be a graph. We define the edge-chromatic number of G, denoted χ'(G), to be the minimum number of colors required to obtain a proper edge-coloring of G. For each edge-coloring of G, a color class is the set of all edges with a specific color. A color class defines a matching in the graph since no two edges of the same color share a vertex.
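The relationship between proper edge-colorings and matchings can be checked mechanically. The following is a minimal sketch, assuming an edge-coloring is given as a dictionary from edges (pairs of vertices) to colors; the function names are chosen for this example only.

```python
from collections import defaultdict

def color_classes(coloring):
    """Group edges by color: returns {color: [edges with that color]}."""
    classes = defaultdict(list)
    for edge, color in coloring.items():
        classes[color].append(edge)
    return dict(classes)

def is_matching(edges):
    """True if no two edges in the collection share a vertex."""
    seen = set()
    for u, v in edges:
        if u == v or u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

def is_proper_edge_coloring(coloring):
    """An edge-coloring is proper exactly when every color class is a matching."""
    return all(is_matching(edges) for edges in color_classes(coloring).values())

# A triangle needs three colors: any two of its edges share a vertex.
proper = {("x", "y"): 0, ("y", "z"): 1, ("x", "z"): 2}
improper = {("x", "y"): 0, ("y", "z"): 0, ("x", "z"): 1}
```

In the `improper` example, the edges (x, y) and (y, z) share the vertex y and the color 0, so color class 0 is not a matching.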
The following proposition shows the relation between the edge-coloring of a genome graph and the edge color classes of the corresponding breakpoint graph.
Proposition 1. If
Proof. Suppose
Using the same argument as in Proposition 1 we have:
Lemma 1. If
Now, in the following theorem we show a relation between the edge-coloring of a genome graph and the k-completion of the corresponding partial multigenomes.
Theorem 4. Let
Proof. (⇒) If
(⇐) Now assume that
Now, by Theorem 4 and using the following two classic theorems, we show that deciding whether there exists a valid solution to a (k ≥ 3)MCP is NP-complete. For a graph G let Δ(G) be the maximum degree of G.
Theorem 5 (Vizing [14]). If G is a simple graph, χ'(G) = Δ(G) or Δ(G) + 1.
Theorem 6 (Holyer [15]). For a graph G, deciding whether χ'(G) = Δ(G) or Δ(G) + 1 is NP-complete, if Δ(G) ≥ 3.
Corollary 4. If k ≥ 3, deciding whether there exists a valid solution to the unrestricted kMCP is NP-complete.
Proof. In order to prove this corollary we reduce the edge-coloring problem to kMCP. Suppose G = (V, E) is a simple graph and k = Δ(G) ≥ 3. If |V| is not even, add an isolated vertex so that the number of vertices in G is 2n for some positive integer n. Consider these 2n vertices as gene extremities of a set of n genes. Now, G defines a partial multigenome
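The construction used in this reduction can be sketched as follows. The code pads the vertex set to even size, pairs vertices into genes, and reads each edge of G as an adjacency. The function name, the padding label `_pad`, the gene names `g0, g1, ...`, and the choice to pair vertices in list order are all assumptions for the illustration; in the unrestricted setting any pairing of vertices into genes would serve.

```python
from collections import Counter

def graph_to_partial_multigenome(vertices, edges):
    """Turn a simple graph into (n_genes, adjacencies, k) with k = max degree.

    Vertices become gene extremities and edges become adjacencies.
    """
    vertices = list(vertices)
    if len(vertices) % 2 == 1:
        vertices.append("_pad")  # isolated vertex so that |V| = 2n
    # Pair consecutive vertices into genes (an arbitrary, hypothetical choice).
    extremity_of = {}
    for i in range(0, len(vertices), 2):
        gene = f"g{i // 2}"
        extremity_of[vertices[i]] = gene + "_t"
        extremity_of[vertices[i + 1]] = gene + "_h"
    adjacencies = [frozenset((extremity_of[u], extremity_of[v])) for u, v in edges]
    degree = Counter(x for edge in edges for x in edge)
    k = max(degree.values()) if degree else 0
    return len(vertices) // 2, adjacencies, k

# The complete graph K4 is 3-regular, so the construction asks for k = 3 genomes.
k4_vertices = [1, 2, 3, 4]
k4_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
n_genes, adjacencies, k = graph_to_partial_multigenome(k4_vertices, k4_edges)
```

Each extremity then lies in as many adjacencies as the degree of its vertex, which is why splitting the adjacencies among k genomes corresponds to coloring the edges with k colors.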
Note that in Corollary 4 we only considered the unrestricted version of kMCP. This allows us to assume that for each (multi) graph G there exists a partial multigenome
Corollary 5. If k ≥ 3, then the unrestricted kMCP is NP-complete.
Proof. Since in solving a kMCP we need to find a k-completion for its partial multigenome, by Corollary 4 the proof is complete. □
2MCP
In this section, we prove that the unrestricted 2MCP, and the restricted 2MCP where
all chromosomes are circular (i.e.,
Theorem 7. The unrestricted 2MCP with DCJ distance is NP-complete.
In order to provide the proof of this theorem, we need the following lemmas.
Lemma 2. Suppose
Proof. Note that in 2MCP there are only two possible topologies for the mixture tree: the branch-tree and path-tree (Figure 2b, c). Since the degree of each vertex in
which shows that the d_{DCJ}-value of a path tree is smaller than the d_{DCJ}-value of a branch tree, and completes the proof. □
Lemma 3. Any MAX 3-SAT instance is reducible to a MAX 3-AND instance. Moreover, MAX 3-AND is NP-complete.
Proof. Let Δ = ℓ_{1} ∨ ℓ_{2} ∨ ℓ_{3} be a clause (disjunction) of three literals. Define
By using basic Boolean rules we have Δ ⇔ ⋁_{S ∈ ℓ(Δ)} S.
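The equivalence can be verified by truth table. Since the definition of ℓ(Δ) is not reproduced above, the sketch below assumes one standard choice: ℓ(Δ) is the set of the seven conjunctions that fix the truth value of each of the three literals so that at least one is true. The function names and the encoding of a literal as a (variable, is_positive) pair are assumptions for this illustration, and the three literals are assumed to be on distinct variables.

```python
from itertools import product

def literal_value(literal, assignment):
    """A literal (var, is_positive) is true when assignment[var] == is_positive."""
    var, is_positive = literal
    return assignment[var] == is_positive

def clause_value(clause, assignment):
    """Disjunction of the literals."""
    return any(literal_value(lit, assignment) for lit in clause)

def conjunction_value(conjunction, assignment):
    """Conjunction of the literals."""
    return all(literal_value(lit, assignment) for lit in conjunction)

def satisfying_conjunctions(clause):
    """Assumed form of l(Delta): the 7 conjunctions fixing each literal of the
    clause to a truth value, with at least one literal set to true."""
    conjunctions = []
    for truth in product([True, False], repeat=3):
        if any(truth):
            # requiring literal (var, pos) to have value t is the literal (var, pos == t)
            conjunctions.append(
                tuple((var, pos == t) for (var, pos), t in zip(clause, truth)))
    return conjunctions

# Clause x OR (NOT y) OR z
clause = (("x", True), ("y", False), ("z", True))
conjunctions = satisfying_conjunctions(clause)

def check_equivalence(clause, conjunctions):
    """Compare the clause with the disjunction of conjunctions on all assignments."""
    for values in product([True, False], repeat=3):
        assignment = dict(zip(("x", "y", "z"), values))
        satisfied = [conjunction_value(c, assignment) for c in conjunctions]
        if clause_value(clause, assignment) != any(satisfied):
            return False
        # the conjunctions are mutually exclusive
        if sum(satisfied) > 1:
            return False
    return True
```

Under this assumed definition, a satisfied clause makes exactly one of its seven conjunctions true and an unsatisfied clause makes none true, so the number of satisfied clauses equals the number of satisfied conjunctions.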
Now, suppose
Now, consider an instance
Figure 6. Representing variables with cycles. (a) A variable represented by a cycle, (b) a true matching, and (c) a false matching.
Figure 7. Representing conjunctions with cycles. (a) Three cycles representing the literals
Let ℓ(x_{1}), ℓ(x_{2}), ℓ(x_{3}) be three literals of variables x_{1}, x_{2}, x_{3}, and Δ = (ℓ(x_{1}) ∧ ℓ(x_{2}) ∧ ℓ(x_{3})) be a conjunction in
1. For each i ∈ {1, 2, 3} consider an edge in T(x_{i}) if ℓ(x_{i}) = x_{i}. If
2. Add three new edges, called conjunction-edges, to the three edges we chose in the previous step, and build a cycle of length 6. This cycle is a conjunction-cycle of Δ.
It is easy to see that an assignment α to the x_{i}'s satisfies the conjunction Δ if and only if the matching assignment corresponding to α keeps all the edges in the conjunction-cycle of Δ. We call a representation of a MAX 3-AND instance
If the literals of a variable appear in at most t conjunctions, and the variable-cycles have length at least 4t, then by choosing the edges of conjunction-cycles properly, we have a graphical representation of a MAX 3-AND instance, where no edge in a variable-cycle is incident with two conjunction-edges from different conjunction-cycles. This implies the following lemma:
Lemma 4. For each MAX 3-AND instance
Combining Lemmas 2-4 gives the proof of Theorem 7.
Proof of Theorem 7. Since the MAX 3-AND is NP-complete by Lemma 3, it suffices to reduce the MAX 3-AND problem to the 2MCP. Suppose
We end this section by considering the restricted version of kMCP, where the chromosomal condition set is {circular}, i.e. all genomes have all circular chromosomes. We denote this restricted version by kMCP_{c}, and the unrestricted version of kMCP by kMCP_{∅}. If opt(kMCP_{c}) and opt(kMCP_{∅}) are the d_{DCJ}-values of solutions to kMCP_{c} and kMCP_{∅}, respectively, then:
Theorem 8. For the kMCP_{c} and kMCP_{∅} versions of kMCP with DCJ distance we have opt(kMCP_{c}) = opt(kMCP_{∅}).
Proof. First note that each solution to kMCP_{c} is also a solution of kMCP_{∅}, since there is no restriction in kMCP_{∅}. Hence, opt(kMCP_{c}) ≥ opt(kMCP_{∅}). Second, for each solution to kMCP_{∅}, if the resulting genomes are not circular we can add new edges to the genomes and make them circular. Adding the new edges does not decrease the number of cycles in the breakpoint graph, which implies that the d_{DCJ}-value of the newly obtained genomes is not larger than opt(kMCP_{∅}). Therefore, these circular genomes form a solution of kMCP_{c}. So opt(kMCP_{c}) ≤ opt(kMCP_{∅}), completing the proof. □
Combining this theorem and Theorem 7 we have
Corollary 6. If k ≥ 2, then kMCP_{c} with DCJ distance is NP-complete.
Discussion and conclusion
In this paper we introduced the k-Minimum Completion Problem (kMCP) motivated by the type of data produced in current cancer genome sequencing studies.
We showed the following results. (1) A linear time algorithm for the unrestricted 1MCP; (2) a linear time algorithm for the restricted versions of the 1MCP where the genomes are circular or linear; i.e. the chromosomal condition set
There are numerous further directions to pursue. As noted in the introduction, the model described in this paper does not consider all the complexities of cancer genome sequencing: most importantly copy number aberrations (duplications and deletions) and errors in the measured adjacencies are important features of cancer genome sequencing and should be addressed.
To handle errors, one might consider weighted versions of the kMCP where adjacencies have a weight corresponding to the confidence in the measurement. Regarding the current model, further work is needed on different chromosomal conditions, genomic distances, or other constraints on the relationships between the genomes in the mixture. For example, the case of linear chromosomes demands further attention, as human chromosomes are linear, although circular chromosomes have been observed in cancer [17]. Similarly, one may impose an upper bound on the number of chromosomes. One may also place restrictions on the structure of the mixture tree.
Another direction is to derive approximation algorithms. In the kMCP we aim to minimize distance over all possible k-completions and mixture trees simultaneously. However, by separating the completion and distance optimization steps, one may employ techniques that have been developed for other problems. For example, one may try to first complete the partial multigenomes using some clustering techniques, as have been employed in metagenomic studies [18]. With complete genomes, one could then try to find optimal mixture trees rooted at the reference genome. Depending on the allowed structure of the mixture tree, techniques from genome rearrangement phylogeny problems may be employed. For example, in the case of 2MCP, if the complete genomes are the leaves of the mixture tree, then the problem becomes the median problem (with the reference genome as the third genome) [5,13]. Alternatively, if the genomes are the vertices of the mixture tree, then the tree construction problem becomes the problem of finding a minimum spanning tree, which is in general easier. In between these extremes, where some of the genomes in the mixture are the leaves and some are intermediate nodes (ancestors), the problem becomes a Steiner tree problem. In the cancer application, any of these cases might provide useful approximations, as the process of clonal evolution of cancer [1] might mean that cells at intermediate stages of cancer progression remain present in the tumor.
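The minimum spanning tree case mentioned above can be sketched with Prim's algorithm. The example assumes the genomes have already been completed and that their pairwise distances are available as a symmetric matrix, with index 0 standing for the reference genome; the function name and the toy distances are hypothetical and serve only to illustrate the tree construction step.

```python
def minimum_mixture_tree(dist):
    """Prim's algorithm on a complete graph given by a symmetric distance
    matrix. Returns (total weight, list of (parent, child) edges), growing
    the tree from vertex 0, which plays the role of the reference genome."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known connection cost to the tree
    parent = [None] * n
    best[0] = 0
    total, edges = 0, []
    for _ in range(n):
        # pick the cheapest vertex not yet in the tree
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[v] = True
        total += best[v]
        if parent[v] is not None:
            edges.append((parent[v], v))
        # relax the connection cost of the remaining vertices
        for w in range(n):
            if not in_tree[w] and dist[v][w] < best[w]:
                best[w] = dist[v][w]
                parent[w] = v
    return total, edges

# Toy distances: reference (0), genome 1 and genome 2.
toy_dist = [
    [0, 1, 3],
    [1, 0, 1],
    [3, 1, 0],
]
total, edges = minimum_mixture_tree(toy_dist)
```

On these toy distances the result is the path-tree 0 → 1 → 2 with total weight 2, whereas the branch-tree that attaches both genomes directly to the reference would cost 1 + 3 = 4.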
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
All authors contributed equally to this work.
Acknowledgements
We thank the anonymous referees for helpful comments on an earlier version of this manuscript. This work was supported by a CAREER Award from the National Science Foundation (#1053753). In addition, BJR is supported by a Career Award from the Scientific Interface from the Burroughs Wellcome Fund and an Alfred P. Sloan Research Fellowship.
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 19, 2012: Proceedings of the Tenth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S19
References

1. Nowell PC: The clonal evolution of tumor cell populations. Science 1976, 194(4260):23-28.
2. Raphael BJ, Volik S, Collins C, Pevzner PA: Reconstructing tumor genome architectures. Bioinformatics 2003, 19(Suppl 2):i162-i171.
3. Meyerson M, Gabriel S, Getz G: Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet 2010, 11(10):685-696.
4. Yancopoulos S, Attie O, Friedberg R: Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 2005, 21(16):3340-3346.
5. Tannier E, Zheng C, Sankoff D: Multichromosomal median and halving problems under different genomic distances. BMC Bioinformatics 2009, 10.
6. Hannenhalli S, Pevzner PA: Transforming Men into Mice (Polynomial Algorithm for Genomic Distance Problem). In FOCS. IEEE Computer Society; 1995:581-592.
7. Hannenhalli S, Pevzner PA: Transforming Cabbage into Turnip: Polynomial Algorithm for Sorting Signed Permutations by Reversals. J ACM 1999, 46:1-27.
8. Bergeron A, Mixtacki J, Stoye J: A new linear time algorithm to compute the genomic distance via the double cut and join distance. Theor Comput Sci 2009, 410(51):5300-5316.
9. El-Mabrouk N, Sankoff D: The Reconstruction of Doubled Genomes. SIAM J Comput 2003, 32(3):754-792.
10. Warren R, Sankoff D: Genome Aliquoting Revisited. In RECOMB-CG, Volume 6398 of Lecture Notes in Computer Science. Edited by Tannier E. Springer; 2010:1-12.
11. Zheng C, Lenert A, Sankoff D: Reversal distance for partially ordered genomes. ISMB (Supplement of Bioinformatics) 2005, 502-508.
12. Gaul É, Blanchette M: Ordering Partially Assembled Genomes Using Gene Arrangements. In Comparative Genomics, Volume 4205 of Lecture Notes in Computer Science. Edited by Bourque G, El-Mabrouk N. Springer; 2006:113-128.
13. Xu AW: A Fast and Exact Algorithm for the Median of Three Problem - A Graph Decomposition Approach. In RECOMB-CG, Volume 5267 of Lecture Notes in Computer Science. Edited by Nelson CE, Vialette S. Springer; 2008:184-197.
14. Vizing VG: On an estimate of the chromatic class of a p-graph. (Russian).
15. Holyer I: The NP-Completeness of Edge-Coloring. SIAM J Comput 1981, 10(4):718-720.
16. Cook SA: The Complexity of Theorem-Proving Procedures. In STOC. Edited by Harrison MA, Banerji RB, Ullman JD. ACM; 1971:151-158.
17. Raphael BJ, Pevzner PA: Reconstructing tumor amplisomes. ISMB/ECCB (Supplement of Bioinformatics) 2004, 265-273.
18. Wu YW, Ye Y: A Novel Abundance-Based Algorithm for Binning Metagenomic Sequences Using ℓ-Tuples. In RECOMB, Volume 6044 of Lecture Notes in Computer Science. Edited by Berger B. Springer; 2010:535-549.