This article is part of the supplement: Proceedings of the Tenth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics

Open Access Proceedings

Reconstructing genome mixtures from partial adjacencies

Ahmad Mahmoody*, Crystal L Kahn and Benjamin J Raphael*

Author Affiliations

Department of Computer Science, Brown University, Providence (RI), USA

BMC Bioinformatics 2012, 13(Suppl 19):S9  doi:10.1186/1471-2105-13-S19-S9

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/13/S19/S9


Published: 19 December 2012

© 2012 Mahmoody et al; licensee BioMed Central Ltd.

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Many cancer genome sequencing efforts are underway with the goal of identifying the somatic mutations that drive cancer progression. A major difficulty in these studies is that tumors are typically heterogeneous, with individual cells in a tumor having different complements of somatic mutations. However, nearly all DNA sequencing technologies sequence DNA from multiple cells, and thus measure mutations from a mixture of genomes. Genome rearrangements are a major class of somatic mutations in many tumors, and the novel adjacencies (i.e. breakpoints) resulting from these rearrangements are readily detected from DNA sequencing reads. However, the assignment of each rearrangement, or adjacency, to an individual cancer genome in the mixture is not known. Moreover, the quantity of DNA sequence reads may be insufficient to measure all rearrangements in all genomes in the tumor. Motivated by this application, we formulate the k-minimum completion problem (k-MCP). In this problem, we aim to reconstruct k genomes derived from a single reference genome, given partial information about the adjacencies present in the mixture of these genomes. We show that the 1-MCP is solvable in linear time in the cases where (i) the measured, incomplete genome has a single circular or linear chromosome, or (ii) there are no restrictions on the chromosomal content of the measured, incomplete genome. We also show that the k-MCP is NP-complete for k ≥ 3 in general, and that the 2-MCP with the double-cut-and-join (DCJ) distance is NP-complete when there are no restrictions on the chromosomal structure of the measured, incomplete genome. These results lay the foundation for future algorithmic studies of the k-MCP and the application of these algorithms to real cancer sequencing data.

Introduction

Nearly all current genome sequencing studies sequence the DNA from a population of cells rather than from single cells. This is because present DNA sequencing technologies cannot sequence the DNA in a single cell without bias-inducing DNA amplification steps. In the majority of applications, sequencing such a population of cells is not problematic because the DNA in every cell is nearly identical. However, there are two notable exceptions: metagenomics (e.g. environmental sequencing or microbiome studies) and cancer sequencing. In the former case, the genomic differences between cells are due to the presence of mixtures of microorganisms in the sample. In the latter case, the genomic differences between cells are due to somatic mutations that accumulate in individual tumor cells during the progression of cancer [1].

In this paper, we formulate the problem of inferring the organization of each genome present in a mixture in the case where: (1) the individual genomes result from an unknown sequence of genome rearrangements from a known (reference) genome; and (2) the adjacencies (breakpoints) of the genomes in the mixture are measured. This situation arises in cancer genome studies, where somatic structural aberrations (including inversions, translocations, duplications, deletions, or other rearrangements of large pieces of DNA) induce novel adjacencies, also called breakpoints, that join in the cancer genome two nucleotides that are noncontiguous in the normal genome. In current cancer sequencing projects, these novel adjacencies are determined from alignments of paired-end reads from cancer DNA to the reference human genome [2,3]. However, these approaches generally do not measure all adjacencies present in the tumor. For example, the quantity of DNA sequence reads (coverage) may be insufficient to measure all adjacencies in all genomes in the tumor, particularly adjacencies that are present in a minority of cancer cells. Moreover, alignment of reads to repetitive regions is challenging, particularly for short reads produced by current sequencing technologies, and thus some adjacencies may not be reliably measured.

We formulate the k-Minimum Completion Problem (k-MCP): given a set of measured adjacencies, determine the k genomes present in the mixture that minimize the total distance between the reference genome and the k measured (i.e. cancer) genomes. The k-MCP is a general problem that encompasses different subproblems, depending on the genomic distance used and the desired chromosomal content of the measured genomes. We show the following results: (1) a linear-time algorithm for the 1-MCP in the double-cut-and-join (DCJ) distance [4] when the desired genome has no restrictions on its chromosomal content; (2) a linear-time algorithm for the 1-MCP in the DCJ distance when the desired genome has a single circular or linear chromosome; (3) the k-MCP is NP-complete for any distance when k ≥ 3; and (4) the 2-MCP with DCJ distance is NP-complete when the desired genome has no restrictions on its chromosomal content, or when the desired genome has all circular chromosomes.

We emphasize that the k-MCP does not model all the issues arising in cancer sequencing: in particular, we restrict attention to copy-neutral structural variants, and ignore single nucleotide mutations, small indels, or other large copy number aberrations. Single nucleotide mutations and small indels can be addressed separately as they do not produce novel adjacencies of the type studied in k-MCP. Copy number aberrations are common in cancer, but appropriate handling of these mutations when measured in a heterogeneous mixture introduces an entirely different set of challenges: e.g. a deletion of a genomic segment in half of the cells in the mixture with a duplication of the same segment in the other half of the cells will be difficult to distinguish from no copy number change. Finally, we assume that all measured adjacencies are real, while in fact there are likely to be false positive adjacencies. Extending the model to consider these additional complexities is left for future work.

In the following sections, we first provide a precise formulation of the k-MCP and describe related work. Then, we provide algorithms and proofs of the complexity of the problem in various cases.

Definitions and problem statement

In this section we present some preliminary definitions and give the formal definition of k-MCP.

A gene g is an oriented sequence of nucleotides with two extremities: a head $g_h$ and a tail $g_t$. An adjacency is an unordered pair of gene extremities. A genome $\mathcal{G}$ on n genes is a set $A_{\mathcal{G}}$ of adjacencies such that each of the 2n gene extremities in $\mathcal{G}$ is a member of at most one adjacency in $A_{\mathcal{G}}$. The gene extremities that are not members of any adjacency in $A_{\mathcal{G}}$ are called telomeres of $\mathcal{G}$, and we denote the set of all telomeres by $T_{\mathcal{G}}$ (Figure 1-a). Throughout this work, we assume that the genes of a genome are distinct.
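To make these definitions concrete, the following minimal Python sketch encodes them under an assumed representation that is not part of the paper: an extremity is a pair `(g, "h")` or `(g, "t")`, an adjacency is a `frozenset` of two extremities, and a genome is a set of adjacencies. The hypothetical helpers `is_genome` and `telomeres` check the at-most-one-adjacency rule and collect $T_{\mathcal{G}}$.

```python
from itertools import chain


def extremities(genes):
    """All 2n gene extremities: (g, "h") is the head and (g, "t") the tail of gene g."""
    return {(g, end) for g in genes for end in ("h", "t")}


def is_genome(genes, adjacencies):
    """True if every adjacency pairs two distinct extremities of `genes`
    and no extremity belongs to more than one adjacency."""
    ext = extremities(genes)
    seen = set()
    for adj in adjacencies:
        if len(adj) != 2 or not adj <= ext:
            return False
        if adj & seen:  # an extremity is already used by another adjacency
            return False
        seen |= adj
    return True


def telomeres(genes, adjacencies):
    """Extremities that belong to no adjacency (the set T_G)."""
    used = set(chain.from_iterable(adjacencies))
    return extremities(genes) - used
```

For instance, on genes {1, 2, 3} the adjacencies {1_h, 2_t} and {3_h, 3_t} form a valid genome whose telomeres are 1_t and 2_h.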

Figure 1. Genome and genome graph. (a) A genome $\mathcal{G}$ on the set of genes {1, 2, 3, 4, 5} with two chromosomes (one linear and one circular). (b) The genome graph (black edges) of $\mathcal{G}$ with additional edges (dotted) connecting the extremities of the same gene. There is one cycle component and one path component.

The genome graph of a genome $\mathcal{G}$ is a graph whose labeled vertices are the gene extremities in $\mathcal{G}$ and whose edge set is $A_{\mathcal{G}}$. We denote the genome graph of $\mathcal{G}$ by $G(\mathcal{G})$. Because each extremity is in at most one adjacency of $A_{\mathcal{G}}$, the graph $G(\mathcal{G})$ is a matching (not necessarily perfect). Note that the genome graph is uniquely determined by the genome, and conversely.
For convenience, we also define the augmented genome graph $G^{+}(\mathcal{G})$ to be the genome graph augmented with additional edges connecting extremities of the same gene; i.e., $G^{+}(\mathcal{G})$ is the graph whose labeled vertices are the gene extremities in $\mathcal{G}$ and whose edge set is $A_{\mathcal{G}} \cup \{\{g_h, g_t\} : g \text{ is a gene of } \mathcal{G}\}$, taken with multiplicity, so that an adjacency $\{g_h, g_t\}$ yields two parallel edges.

A chromosome of $\mathcal{G}$ is the set of all adjacencies and telomeres of gene extremities in a connected component of the augmented genome graph (Figure 1-b). A chromosome is linear (resp. circular) if the corresponding connected component is a path (resp. cycle). Note that an adjacency $\{g_h, g_t\}$ represents a circular chromosome with the single gene g. A genome is circular (resp. linear) if all of its chromosomes are circular (resp. linear), and we say it is mixed if it has both circular and linear chromosomes. A genome is uni-chromosomal if it has only one chromosome, and multi-chromosomal otherwise. A chromosomal condition is a condition on the number or type of chromosomes in a genome. For example, we can describe the structure of a genome by two chromosomal conditions: being (i) uni-chromosomal and (ii) circular.
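The chromosome decomposition can be sketched the same way. This Python function is a hypothetical helper, not the authors' code; it assumes the tuple/frozenset encoding (extremity `(g, "h")` or `(g, "t")`, adjacency a two-element `frozenset`) and a valid genome as input. It builds the augmented genome graph as a multigraph, so that an adjacency $\{g_h, g_t\}$ produces two parallel edges, and labels a connected component circular when every vertex in it has degree 2.

```python
from collections import defaultdict


def chromosomes(genes, adjacencies):
    """Split a genome into chromosomes: connected components of the augmented
    genome graph (adjacency edges plus one edge per gene joining g_h and g_t).
    Returns a list of (kind, sorted genes), kind in {"linear", "circular"}."""
    nbrs = defaultdict(list)  # multigraph: parallel edges are kept
    for g in genes:
        nbrs[(g, "h")].append((g, "t"))
        nbrs[(g, "t")].append((g, "h"))
    for adj in adjacencies:
        a, b = tuple(adj)
        nbrs[a].append(b)
        nbrs[b].append(a)
    seen, result = set(), []
    for start in list(nbrs):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first search over one component
            v = stack.pop()
            comp.append(v)
            for w in nbrs[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        # Every vertex has degree <= 2, so a component is a cycle iff all degrees are 2.
        kind = "circular" if all(len(nbrs[v]) == 2 for v in comp) else "linear"
        result.append((kind, sorted({g for g, _ in comp})))
    return result
```

On genes {1, 2, 3} with adjacencies {1_h, 2_t} and {3_h, 3_t}, this returns one linear chromosome on genes 1 and 2 and one circular chromosome on gene 3.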

As described above, a paired-end sequencing experiment provides the adjacencies $A_{\mathcal{G}}$ of the sequenced genome relative to the genes of a reference genome. However, our knowledge about a genome's adjacencies is typically incomplete. For a set $\mathcal{C}$ of chromosomal conditions, a $\mathcal{C}$-partial genome $\mathcal{P}$ on n genes is a set of adjacencies $A_{\mathcal{P}}$ such that there exists a set $X$ of pairs of gene extremities for which $A_{\mathcal{P}} \cup X$ is a genome satisfying the chromosomal conditions in $\mathcal{C}$. When $\mathcal{C}$ is clear from the context, we say partial genome instead of $\mathcal{C}$-partial genome. The problems we study below involve adding the missing adjacencies to $\mathcal{C}$-partial genomes to complete them into genomes satisfying $\mathcal{C}$. Sometimes we have prior knowledge about the number or the structure of the chromosomes in a genome, so we define a completion of a partial genome relative to these chromosomal conditions.
If $\mathcal{G}$ is a genome, we say $\mathcal{P} \subseteq \mathcal{G}$ provided $A_{\mathcal{P}} \subseteq A_{\mathcal{G}}$. A $\mathcal{C}$-completion of a partial genome $\mathcal{P}$ is a genome $\mathcal{G}$ with $\mathcal{P} \subseteq \mathcal{G}$ that satisfies the conditions in $\mathcal{C}$.
When $\mathcal{C}$ is clear from the context, we just say completion instead of $\mathcal{C}$-completion.
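A sketch of the completion test, again assuming the tuple/frozenset encoding (our choice, not the paper's) and modelling each chromosomal condition in $\mathcal{C}$ as a hypothetical predicate on the gene set and the candidate's adjacencies. Only the condition "all chromosomes circular" is implemented; it uses the fact that a genome has no telomeres exactly when its n adjacencies cover all 2n extremities.

```python
def is_genome(genes, adjacencies):
    """Each adjacency joins two extremities of `genes`; no extremity is reused."""
    ext = {(g, end) for g in genes for end in ("h", "t")}
    used = set()
    for adj in adjacencies:
        if len(adj) != 2 or not adj <= ext or adj & used:
            return False
        used |= adj
    return True


def circular(genes, adjacencies):
    """Example chromosomal condition: every chromosome is circular.
    Only meaningful for a valid genome, where n adjacencies <=> no telomeres."""
    return len(adjacencies) == len(genes)


def is_completion(genes, partial, candidate, conditions=()):
    """True if `candidate` is a genome containing every adjacency of `partial`
    (P is a subset of G) and satisfying each condition predicate."""
    return (is_genome(genes, candidate)
            and partial <= candidate
            and all(cond(genes, candidate) for cond in conditions))
```

For the partial genome P = {{1_h, 2_t}} on genes {1, 2}, P itself is an unrestricted completion, whereas under the circular condition the adjacency {2_h, 1_t} must be added.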

A multi-genome is a mixture of genomes with the same set of genes. Formally, the multi-genome $\mathcal{M}$ formed from genomes $\mathcal{G}_1, \ldots, \mathcal{G}_k$ is the multiset $A_{\mathcal{M}} = A_{\mathcal{G}_1} \uplus \cdots \uplus A_{\mathcal{G}_k}$, the disjoint union of $A_{\mathcal{G}_1}, \ldots, A_{\mathcal{G}_k}$. (For a multiset S and an element r, let $c_S(r)$ be the number of copies of r in S; the disjoint union $A \uplus B$ of two multisets is the multiset in which each element r appears $c_A(r) + c_B(r)$ times.)
Note that the partition of the adjacencies in $A_{\mathcal{M}}$ into $A_{\mathcal{G}_1}, \ldots, A_{\mathcal{G}_k}$ is not known. There is a corresponding genome graph: a multigraph whose vertices are the gene extremities and whose edge set is the multiset $A_{\mathcal{M}}$. We denote the genome graph of a multi-genome $\mathcal{M}$ by $G(\mathcal{M})$.
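The disjoint union maps directly onto Python's `collections.Counter`, whose `update` adds counts, so each adjacency r ends up with count $c_A(r) + c_B(r)$. The sketch below assumes the tuple/frozenset encoding used here for illustration; `multi_genome` is a hypothetical name.

```python
from collections import Counter


def multi_genome(*genomes):
    """Disjoint (multiset) union of adjacency sets: adjacency r appears
    sum_i c_{A_i}(r) times. Which genome contributed each copy is discarded."""
    total = Counter()
    for adjacencies in genomes:
        total.update(adjacencies)  # adds one copy per adjacency of this genome
    return total
```

With a = {1_h, 2_t} and b = {2_h, 1_t}, the pair of genomes ({a}, {b}) and the pair ({a, b}, {}) yield the same multiset, which illustrates why the partition cannot be recovered from $A_{\mathcal{M}}$ alone.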

The genome graph is related to the breakpoint graph in genome rearrangement studies. The breakpoint graph $B(\mathcal{G}_1, \ldots, \mathcal{G}_k)$ of the genomes $\mathcal{G}_1, \ldots, \mathcal{G}_k$ is an edge-colored multigraph whose labeled vertices are the 2n gene extremities and whose edges are all the adjacencies in $A_{\mathcal{G}_1}, \ldots, A_{\mathcal{G}_k}$, with each edge assigned a color according to its genome of origin. Thus, the only difference between the breakpoint graph and the genome graph is the lack of edge-coloring in the latter, reflecting our inability to measure the origin of each adjacency.

Our knowledge about a multi-genome can also be incomplete. For example, a tumor is a mixture of different cancer genomes, and the sequencing process yields a mixture of adjacencies from these genomes. We represent such a mixture of adjacencies by a partial multi-genome. A partial multi-genome is a multiset $A_{\mathcal{M}} = A_{\mathcal{P}_1} \uplus \cdots \uplus A_{\mathcal{P}_m}$, where each $\mathcal{P}_i$ is a partial genome. We define the genome graph of a partial multi-genome analogously to that of a multi-genome.

If k is a positive integer and $\mathcal{M}$ is a partial multi-genome, a k-completion of $\mathcal{M}$ is a family of k genomes $\mathcal{G}_1, \ldots, \mathcal{G}_k$ such that $A_{\mathcal{M}} \subseteq A_{\mathcal{G}_1} \uplus \cdots \uplus A_{\mathcal{G}_k}$ (as multisets). Note that whether a completion exists for a partial (multi-)genome depends on the structure of the partial (multi-)genome and on the chromosomal conditions. Also, a completion, when it exists, need not be unique.
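Reading the inclusion above as multiset inclusion (each measured adjacency must occur at least as often in the union of the k genomes), a k-completion can be checked by comparing counts. This is a sketch under that reading, with the hypothetical helper `is_k_completion`; it assumes each candidate has already been verified to be a genome satisfying $\mathcal{C}$.

```python
from collections import Counter


def is_k_completion(observed, genomes, k):
    """Check that `genomes` has exactly k members and that the observed multiset
    of adjacencies is contained in their disjoint union, count by count.
    Assumes every member is already a valid genome satisfying C."""
    if len(genomes) != k:
        return False
    union = Counter()
    for adjacencies in genomes:
        union.update(adjacencies)
    # Counter returns 0 for missing keys, so absent adjacencies fail the test.
    return all(union[r] >= c for r, c in Counter(observed).items())
```

If an adjacency is observed twice, it must be present in at least two of the k genomes, which is how the multiplicities in $A_{\mathcal{M}}$ constrain the reconstruction.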

We use a distance function to distinguish between different completions. A distance function on pairs of genomes (with the same set of genes) is a measure of dissimilarity between the genomes. Having selected a pairwise distance function, we must define a distance between the k genomes in a mixture. Motivated by the fact that the different cancer genomes in a tumor are obtained by somatic genome rearrangements from a healthy genome, we model the evolution of the cancer genomes by a rooted tree in which all the cancer genomes are descendants of the healthy one. Suppose $\mathcal{G}_0$ represents a healthy genome, and $\mathcal{S} = \{\mathcal{G}_1, \ldots, \mathcal{G}_k\}$ a mixture of k cancer genomes obtained by rearrangements of $\mathcal{G}_0$.
A mixture tree $\mathcal{T}$ is a rooted tree whose vertices are genomes on the same set of genes, such that the root vertex is $\mathcal{G}_0$ and the k genomes in $\mathcal{S}$ are (some of) the vertices of $\mathcal{T}$. If ϕ is a distance function on pairs of genomes, then the ϕ-value of $\mathcal{T}$, denoted by $\phi(\mathcal{T})$, is defined as follows:

$$\phi(\mathcal{T}) = \sum_{(\mathcal{G}, \mathcal{G}') \in E} \phi(\mathcal{G}, \mathcal{G}')$$

where E is the set of edges in $\mathcal{T}$.
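The ϕ-value is a plain sum over tree edges, so it can be sketched in a few lines. The representation here is an assumption for illustration: `parent` maps each non-root vertex name to its parent's name, and `toy_distance` (the number of adjacencies present in exactly one of two genomes) is a stand-in for ϕ, not the DCJ distance.

```python
def tree_value(parent, genomes, phi):
    """phi-value of a mixture tree: the sum of phi over its edges.
    `parent` maps every non-root vertex name to its parent's name;
    `genomes` maps vertex names to adjacency sets (hypothetical representation)."""
    return sum(phi(genomes[parent[v]], genomes[v]) for v in parent)


def toy_distance(a, b):
    """Illustrative stand-in for phi (not DCJ): adjacencies found in exactly one genome."""
    return len(a ^ b)


# Adjacencies are abbreviated to single letters for readability.
genomes = {"G0": {"p", "q"}, "G1": {"p", "r"}, "G2": {"s", "r"}}
star = {"G1": "G0", "G2": "G0"}   # both cancer genomes descend directly from G0
chain = {"G1": "G0", "G2": "G1"}  # G2 descends from G1
```

With this toy distance the chain scores 2 + 2 = 4 while the star scores 2 + 4 = 6, so the choice of mixture tree, and not only of the completion, affects the objective.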

We now define the k-Minimum Completion Problem.

k-Minimum Completion Problem (k-MCP). Given a $\mathcal{C}$-partial multi-genome $\mathcal{M}$, a positive integer k, a reference genome $\mathcal{G}_0$, and a distance function ϕ, find a k-completion $\mathcal{S}$ of $\mathcal{M}$ and a mixture tree $\mathcal{T}$ such that $\phi(\mathcal{T})$ is minimum over all k-completions and mixture trees. If no k-completion exists for $\mathcal{M}$, we say that this k-MCP has no valid solution.
We say the k-MCP is unrestricted if $\mathcal{C} = \emptyset$, and restricted otherwise.

As written, the k-MCP is a general problem that encompasses many subproblems depending on the chromosomal condition set $\mathcal{C}$ and the distance ϕ. Common distances in genome rearrangement studies include the breakpoint distance [5], the Hannenhalli-Pevzner distance [6] (which generalizes the reversal distance [7]), and the double-cut-and-join (DCJ) distance [4]. Below we will use the DCJ distance, which approximates the other distances [8].

For two genomes $G_1$ and $G_2$ on the same set of n genes, their double-cut-and-join (DCJ) distance, denoted by $d_{DCJ}(G_1, G_2)$, is equal to

$$d_{DCJ}(G_1, G_2) = n - \left(c + \frac{o}{2}\right)$$

where $c$ is the number of cycles in the breakpoint graph $B = B(G_1, G_2)$ and $o$ is the number of paths in $B$ with an odd number of vertices [8].

Remark. When at least one of $G_1$ and $G_2$ is circular, we have $o = 0$ and $d_{DCJ}(G_1, G_2) = n - c$. Thus, for such genomes, having more cycles in the breakpoint graph is equivalent to having a smaller distance.
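To make the formula concrete, the following minimal Python sketch builds the breakpoint graph of two genomes and evaluates $n - (c + o/2)$. The encoding is our own assumption for illustration, not the paper's notation: an extremity is a string made of the gene name followed by `h` (head) or `t` (tail), a genome is a list of adjacencies (pairs of extremities), and extremities in no adjacency are telomeres. The function name `dcj_distance` is hypothetical.

```python
def dcj_distance(genes, adj1, adj2):
    """DCJ distance n - (c + o/2) computed from the breakpoint graph B(G1, G2).

    Hypothetical encoding: extremities are strings such as "1h" / "1t"
    (gene name + head/tail); each genome is a list of adjacencies (pairs of
    extremities). Extremities in no adjacency are telomeres.
    """
    # Each genome is a matching on the extremities: store partner maps.
    p1, p2 = {}, {}
    for partner, adjacencies in ((p1, adj1), (p2, adj2)):
        for a, b in adjacencies:
            partner[a], partner[b] = b, a

    def walk(start, seen):
        """Follow alternating edges from `start`; return (#vertices, is_cycle)."""
        maps = (p1, p2) if start in p1 else (p2, p1)
        cur, i, count = start, 0, 0
        while True:
            seen.add(cur)
            count += 1
            nxt = maps[i % 2].get(cur)
            if nxt is None:
                return count, False      # reached the far end of a path
            if nxt == start:
                return count, True       # came back to the start: a cycle
            cur, i = nxt, i + 1

    extremities = [f"{g}{end}" for g in genes for end in "ht"]
    seen, cycles, odd_paths = set(), 0, 0
    # Paths first: start from endpoints (vertices of degree < 2 in B).
    for v in extremities:
        if v not in seen and not (v in p1 and v in p2):
            count, _ = walk(v, seen)
            odd_paths += count % 2
    # Every vertex not yet visited has degree 2 and lies on a cycle.
    for v in extremities:
        if v not in seen:
            walk(v, seen)
            cycles += 1
    return len(genes) - (cycles + odd_paths / 2)
```

For example, for the circular genome with adjacencies {1h, 2t}, {2h, 1t} and the same genome with gene 2 reversed ({1h, 2h}, {1t, 2t}), the breakpoint graph is a single cycle on four vertices, so the sketch returns 2 - 1 = 1.0, consistent with the Remark.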

Related work

In comparison to other genome rearrangement problems considered in the literature, the k-MCP has three distinguishing features. (1) The input is a mixture of adjacencies from multiple genomes, and the genome of origin of each adjacency is unknown. (2) The set of adjacencies is incomplete: not every adjacency from every genome in the mixture is measured. (3) The ancestral relationships between the genomes in the mixture are unknown and might include both "ancestral" and "present-day" genomes. Some of these features have been considered individually in other work, but to our knowledge no previous work has considered all three together. The first feature bears some resemblance to the genome halving problem [9] of finding the doubled ancestor genome by minimizing a rearrangement distance. This problem and its generalizations to polyploidization [10] involve partitioning (or coloring) adjacencies to minimize a rearrangement distance. However, in general no adjacencies are missing and the distance is pairwise (i.e., there is no tree), in contrast to the 2-MCP.

Regarding the second feature, several authors have considered the problem of inferring missing adjacencies so as to optimize a genome rearrangement distance. Notably, [11] and [12] consider the problem of computing the reversal distance between pairs of partially assembled genomes that are given as unordered collections of contigs. These problems were motivated by limitations of DNA sequencing technologies, which leave most whole-genome assemblies highly fragmented and composed of contigs whose relative order is unknown. These problems are variations of the 1-MCP in which the reference genome $G_R$ also has missing adjacencies. In particular, [12] orient sets of contigs from two genomes so that the number of cycles in the breakpoint graph of the resulting genomes is maximized, which they note "has been shown to approximate very well the reversal distance between them." However, to our knowledge this analysis has not been extended to more than two genomes.

Regarding the third feature, the genome median problem asks for an ancestral genome that minimizes the sum of its distances to three given genomes [5,13]. This differs from the k-MCP in that the three individual genomes are known (rather than mixed) and complete, with no missing adjacencies. Also, in the median problem the topology of the phylogenetic tree has already been inferred, while in the k-MCP we must also find an optimal topology for the phylogenetic tree.

Results

In this section we first consider the 1-MCP. We present linear-time algorithms that solve the 1-MCP in the cases where: (i) the measured, incomplete genome has a single circular or linear chromosome; (ii) there are no restrictions on the chromosomal content of the measured, incomplete genome.

Next, we prove that the unrestricted k-MCP is NP-complete for k ≥ 3 for any distance function ϕ. Finally, we show that the unrestricted 2-MCP, and the restricted 2-MCP in which all chromosomes are circular (i.e., $\mathcal{C} = \{\text{circular}\}$), are NP-complete for the DCJ distance.

1-MCP

Here, we consider the unrestricted 1-MCP and two restricted versions of the 1-MCP: (1) the chromosomal condition set is {circular, uni-chromosomal}, which we denote by 1-MCPc; (2) the chromosomal condition set is {linear, uni-chromosomal}, which we denote by 1-MCPℓ. We first show that the unrestricted version is solvable in linear time. Then, we show that 1-MCPc can be solved in linear time. Finally, we prove a relation between 1-MCPc and 1-MCPℓ which implies that 1-MCPℓ is also solvable in linear time.

Note that 1-MCPℓ is a variation of the Block Ordering Problem (BOP) considered in [12]. In our terminology, the BOP considers two partial genomes and aims to complete both into linear, uni-chromosomal genomes such that the pairwise distance between the completed genomes is minimal. In [12], Gaul and Blanchette provide a linear-time algorithm for the BOP. Our algorithm for 1-MCPℓ is simpler than the algorithm for the BOP in [12], and it is obtained from a straightforward algorithm (Algorithm 1 below) that solves 1-MCPc in linear time.

We begin with the unrestricted 1-MCP, where we have the following result.

Theorem 1. The unrestricted 1-MCP with DCJ distance is linearly tractable.

Proof. In 1-MCP we have a single partial genome $\hat{G}$ and a reference genome $G_R$ (see Figure 2-a). Since both $\hat{G}$ and $G_R$ are matchings on the gene extremities, their breakpoint graph $B = B(\hat{G}, G_R)$ is a disjoint union of paths and cycles. Let $P_1, \ldots, P_r$ be all the paths whose first and last edges are adjacencies of $G_R$. 
An optimal completion of $\hat{G}$ is obtained by adding to $\hat{G}$, for each $1 \le i \le r$, the edge that connects the two endpoints of $P_i$ (see Figure 3). Indeed, we can only add edges between vertices that are not incident to any edge of $\hat{G}$, i.e., the endpoints of the $P_i$'s. Closing each $P_i$ creates a new cycle, whereas adding any other possible edge merely joins paths into a longer path of $B$. □
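The construction in the proof needs only one pass over the paths of the breakpoint graph. The following minimal Python sketch implements it under our own assumed encoding (extremities as strings such as `"1h"`/`"1t"`, genomes as lists of adjacencies); the names `partners` and `complete_unrestricted` are hypothetical. It walks each path that starts at a free extremity of $\hat{G}$ and, if the path also ends with a reference adjacency, returns the edge joining its two endpoints.

```python
def partners(adjacencies):
    """Partner map of a matching: partner[x] is the extremity adjacent to x."""
    partner = {}
    for a, b in adjacencies:
        partner[a], partner[b] = b, a
    return partner


def complete_unrestricted(partial, reference):
    """Adjacencies added to the partial genome in the construction of Theorem 1.

    For every path of B(partial, reference) whose first and last edges are
    reference adjacencies, join its two endpoints; return those new edges.
    """
    p_hat, p_ref = partners(partial), partners(reference)
    added, seen = [], set()
    for v in p_ref:
        if v in p_hat or v in seen:
            continue                       # such paths start at a free extremity
        # Walk along B from v, alternating reference and partial adjacencies.
        cur, maps, i = v, (p_ref, p_hat), 0
        seen.add(v)
        while cur in maps[i % 2]:
            cur, i = maps[i % 2][cur], i + 1
            seen.add(cur)
        if cur not in p_hat:               # last edge is a reference adjacency
            added.append((v, cur))
    return added
```

Each extremity is visited at most once, so the running time is linear in the number of genes. For instance, with reference adjacencies {1h, 2t}, {2h, 1t} and partial genome {1h, 2h}, the only path 2t, 1h, 2h, 1t starts and ends with reference adjacencies, so the sketch returns the single edge {2t, 1t}.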

Figure 2. Possible mixture trees when k = 1, 2. (a) The only topology in 1-MCP. (b) Branch-tree and (c) path-tree topologies in 2-MCP.

Figure 3. The breakpoint graph $B(\hat{G}, G_R)$ with possible edges to be added to the adjacencies of $\hat{G}$. The breakpoint graph consists of paths and cycles. Thick edges are in $G_R$, and thin edges are in $\hat{G}$. Dashed edges are the edges that should be added to $\hat{G}$ for the paths whose first and last edges are in $G_R$.

1-MCPc: circular uni-chromosomal completion

Here we consider 1-MCPc, the restricted 1-MCP for a partial genome $\hat{G}$ that we wish to complete to a circular uni-chromosomal genome $G$. We assume that $\hat{G}$ is not already a circular uni-chromosomal genome. Thus $\hat{G}$ has a set $F$ of free extremities, i.e., extremities that are not in any adjacency of $\hat{G}$. 
Equivalently, $F$ is the set of vertices of degree 0 in the genome graph $\Gamma(\hat{G})$. Finding the completion $G$ corresponds to finding a partition of $F$ into pairs of extremities, i.e., into adjacencies. However, this partition cannot be arbitrary, as the adjacencies it defines must satisfy two constraints: (1) The resulting genome $G$ is circular uni-chromosomal, meaning that the augmented genome graph $\Gamma^{+}(G)$ has exactly one component, a cycle. 
Note that the augmented genome graph $\Gamma^{+}(\hat{G})$ of the partial genome has only path components: $F \neq \emptyset$, and a cycle component would be a circular chromosome without free extremities, which no added adjacency could join to the rest of the genome. (2) The resulting genome $G$ must minimize the distance between the reference genome $G_R$ and $G$.
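Constraint (1) can be checked directly on a candidate completion. The minimal Python sketch below assumes our own encoding (extremities as strings such as `"1h"`/`"1t"`, a genome as a list of adjacencies) and a hypothetical function name. It requires the adjacencies to form a perfect matching on all 2n extremities, then walks the augmented genome graph, alternating a gene edge {g_h, g_t} with an adjacency edge, and accepts only if the cycle through the first extremity covers every extremity.

```python
def is_circular_unichromosomal(genes, adjacencies):
    """Check constraint (1): the augmented genome graph, with one gene edge
    {g_h, g_t} per gene plus one edge per adjacency, is a single cycle."""
    if not genes:
        return False
    extremities = [f"{g}{end}" for g in genes for end in "ht"]
    partner = {}
    for a, b in adjacencies:
        if a == b or a in partner or b in partner:
            return False                  # not a matching, so not a genome
        partner[a], partner[b] = b, a
    if set(partner) != set(extremities):
        return False                      # a free extremity (telomere) remains
    # Walk the cycle through extremities[0], alternating gene and adjacency edges.
    start = cur = extremities[0]
    visited = 0
    while True:
        other = cur[:-1] + ("t" if cur.endswith("h") else "h")  # gene edge
        cur = partner[other]                                     # adjacency edge
        visited += 2
        if cur == start:
            return visited == len(extremities)
```

With every extremity matched, the augmented genome graph is a disjoint union of cycles, so the walk always returns to its start; a genome such as {1h, 1t}, {2h, 2t} (two circular chromosomes) is rejected because the cycle through 1h covers only two of the four extremities.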

The first constraint on partitioning $F$ is that joining the extremities at the two ends of the same path in the augmented genome graph $\Gamma^{+}(\hat{G})$ by an edge, which we call an excluded edge, creates a cycle, i.e., a circular chromosome. Such a cycle must be chosen carefully: an excluded edge can be added only when its endpoints are the last two free extremities, otherwise the completion is not uni-chromosomal. We define $X$ to be the set of all excluded edges.

The second constraint on partitioning $F$ comes from our goal of minimizing the distance between the reference genome $G_R$ and $G$. For the DCJ distance, we must maximize the number $c$ of cycles in the breakpoint graph $B(G, G_R)$, since $G$ is circular and hence $d_{DCJ}(G, G_R) = n - c$ by the Remark above. Adding an edge to $\hat{G}$ increases the number of cycles in $B = B(\hat{G}, G_R)$ if and only if the edge connects the two endpoints of the same path in $B$. We call such an edge a desired edge and denote by $D$ the set of all desired edges. We now combine these two constraints into a single graph.

We define the free-extremities graph $R = R(\hat{G}, G_R)$ to be the bicolored graph whose vertex set is $F$ and whose edge set is $D \cup X$. The edges from $D$ are colored blue and the edges from $X$ are colored red. Note that $R$ is a multigraph, and that $R$ consists of even cycles whose edges alternate between blue and red. 
This is because both $D$ and $X$ are perfect matchings on $F$: both $G_R$ and $\{\{g_h, g_t\} \mid g \text{ is a gene in } \hat{G}\}$ are perfect matchings on the set of all gene extremities, and the matchings they induce on $F$ (by pairing the two endpoints of each path of $B$ and of $\Gamma^{+}(\hat{G})$, respectively) are $D$ and $X$. See Figure 4-b. Thus, we have

Figure 4. Adding adjacencies to a partial genome $\hat{G}$ to solve the 1-MCPc. (a) The breakpoint graph $B(\hat{G}, G_R)$. Gray edges indicate adjacencies of $\hat{G}$, black edges indicate adjacencies of $G_R$, and dotted edges connect extremities of the same gene. The set of free vertices is $F$. (b) The free-extremities graph $R(\hat{G}, G_R)$ consists of two even cycles. 
Blue edges are desired edges $D$ and red edges are excluded edges $X$. (c) The resulting breakpoint graph after adding the adjacency {1h, 4h}. (d) The resulting free-extremities graph after update(R, {1h, 4h}). The vertices 1h and 4h are no longer free extremities and are therefore removed during update(R, {1h, 4h}).

<a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M60','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M60">View MathML</a>

(1)
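The definitions of $D$ and $X$ translate into two path walks from each free extremity. The minimal Python sketch below assumes our own encoding (extremities as strings such as `"1h"`/`"1t"`, genomes as lists of adjacencies) and a hypothetical function name. It also assumes, as the perfect-matching argument above does, that every extremity lies in some adjacency of $G_R$ (for example, $G_R$ circular). Blue edges pair the two endpoints of each path of $B$; red edges pair the two endpoints of each path of the augmented genome graph of $\hat{G}$.

```python
def free_extremities_graph(genes, partial, reference):
    """Return (F, D, X) for the free-extremities graph R(G_hat, G_R).

    Assumes every extremity lies in some reference adjacency (for example,
    G_R circular), so that D is a perfect matching on F.
    """
    def partners(adjacencies):
        partner = {}
        for a, b in adjacencies:
            partner[a], partner[b] = b, a
        return partner

    p_hat, p_ref = partners(partial), partners(reference)
    extremities = [f"{g}{end}" for g in genes for end in "ht"]
    free = {x for x in extremities if x not in p_hat}
    # Gene edges {g_h, g_t}: the other extremity of the same gene.
    gene = {x: x[:-1] + ("t" if x.endswith("h") else "h") for x in extremities}

    def far_end(u, first, second):
        """Endpoint of the path that starts at u with a `first` edge and then
        alternates between `first` and `second` edges."""
        cur, maps, i = u, (first, second), 0
        while cur in maps[i % 2]:
            cur, i = maps[i % 2][cur], i + 1
        return cur

    # Blue (desired) edges: endpoints of the same path of B(G_hat, G_R).
    D = {frozenset((u, far_end(u, p_ref, p_hat))) for u in free}
    # Red (excluded) edges: endpoints of the same path of the augmented
    # genome graph (gene edges + adjacencies of G_hat).
    X = {frozenset((u, far_end(u, gene, p_hat))) for u in free}
    return free, D, X
```

As a small illustration (our own example, not the one in Figure 4): for the reference genome {1h, 2t}, {2h, 3t}, {3h, 1t} and the partial genome {1h, 2t}, the sketch gives F = {1t, 2h, 3t, 3h}, blue edges {1t, 3h}, {2h, 3t} and red edges {1t, 2h}, {3t, 3h}, which form a single alternating cycle of length four. Keeping D and X as separate sets is what lets R behave as a multigraph when a blue and a red edge join the same pair.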

To find a completion of the partial genome $\hat{G}$, we select pairs {u, v} of free extremities from $F$ and add them as adjacencies to $\hat{G}$. Respecting the constraints encoded in the free-extremities graph R, we define a transformation update(R, {u, v}) that records the effect of adding the adjacency {u, v} to $\hat{G}$ (Figure 4). In particular, since u and v are free vertices of $\hat{G}$, there are paths $P_u$ and $P_v$ in B with an endpoint equal to u and v, respectively. 
Similarly, there are paths $Q_u$ and $Q_v$ in the augmented genome graph $\Gamma^{+}(\hat{G})$ having an endpoint equal to u and v, respectively. We may have $P_u = P_v$ or $Q_u = Q_v$. By the definition of $D$, $P_u$ and $P_v$ are represented by blue edges $b_u$ and $b_v$ in R incident to u and v, respectively. 
Similarly, by the definition of $X$, $Q_u$ and $Q_v$ are represented by red edges $r_u$ and $r_v$ in R incident to u and v, respectively. Adding the adjacency {u, v} to $\hat{G}$ has the following effects on B and $\Gamma^{+}(\hat{G})$:

(i) u and v are no longer free vertices.

(ii) If $P_u \neq P_v$, these paths merge into one path in B ∪ {u, v}. Otherwise, the path $P_u = P_v$ closes into a cycle in B ∪ {u, v}, and the number of cycles in the breakpoint graph increases by one.

(iii) If $Q_u \neq Q_v$, these paths merge into one path in $\Gamma^{+}(\hat{G}) \cup \{u, v\}$. Otherwise, the path $Q_u = Q_v$ closes into a cycle in $\Gamma^{+}(\hat{G}) \cup \{u, v\}$. In the latter case, we should add {u, v} as an adjacency if and only if $F = \{u, v\}$. This is because adding {u, v} creates a cycle component in $\Gamma^{+}(\hat{G}) \cup \{u, v\}$ (i.e., a circular chromosome), and if other free vertices remain, no subsequent completion can be uni-chromosomal.

Therefore, adding the adjacency {u, v} to [M2] has three corresponding effects on R: by (i), the vertices u and v are removed from R; by (ii), $b_u$ and $b_v$ are identified; and by (iii), $r_u$ and $r_v$ are identified. We denote this process of updating R by update(R, {u, v}). Figure 4 illustrates this process.
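As a minimal sketch of update(R, {u, v}), assume (our modeling choice, not necessarily the authors' data structure) that R is stored as two Python dictionaries, `blue` and `red`, each mapping a free vertex to its partner, so that every vertex carries exactly one blue and one red edge. Identifying $b_u = \{u, u'\}$ with $b_v = \{v, v'\}$ is then modeled as replacing both by the single blue edge $\{u', v'\}$, or deleting the edge outright when $b_u = b_v$; red edges are treated the same way. The helper names `add_edge` and `update` are hypothetical.

```python
from typing import Dict, Hashable

# Hypothetical encoding of R: one vertex -> partner dict per edge colour.
Matching = Dict[Hashable, Hashable]


def add_edge(m: Matching, a: Hashable, b: Hashable) -> None:
    """Insert the undirected edge {a, b} into a matching stored as a dict."""
    m[a] = b
    m[b] = a


def update(blue: Matching, red: Matching, u: Hashable, v: Hashable) -> bool:
    """Apply update(R, {u, v}) in place.

    Returns True iff {u, v} was a blue edge of R, i.e. iff adding the
    adjacency {u, v} closes a cycle in the breakpoint graph B.
    """
    bu, bv = blue[u], blue[v]  # far ends of the blue edges b_u and b_v
    ru, rv = red[u], red[v]    # far ends of the red edges r_u and r_v
    for m in (blue, red):      # effect (i): u and v are no longer free
        del m[u]
        del m[v]
    was_blue = bu == v
    if not was_blue:           # effect (ii): b_u != b_v, identify them
        add_edge(blue, bu, bv)
    if ru != v:                # effect (iii): r_u != r_v, identify them
        add_edge(red, ru, rv)
    return was_blue


# Usage: R is a single alternating 4-cycle 0 -b- 1 -r- 2 -b- 3 -r- 0.
blue: Matching = {}
red: Matching = {}
add_edge(blue, 0, 1)
add_edge(blue, 2, 3)
add_edge(red, 1, 2)
add_edge(red, 3, 0)
print(update(blue, red, 0, 1))  # True: {0, 1} is a blue edge
print(blue, red)                # {2: 3, 3: 2} {2: 3, 3: 2}
```

Each call performs a constant number of dictionary operations, which is the property a linear-time algorithm built on repeated updates needs.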

If {u, v} is a blue edge in R, then update(R, {u, v}) increases the number of cycles in the breakpoint graph B by one. Hence, to solve 1-MCPc, we want to perform update(R, {u, v}) transformations with as many blue edges as possible. On the other hand, the new adjacencies must merge the paths in the graph [M5] so that we end with a genome having exactly one circular chromosome. Let $M_b(R)$ be the maximum possible number of update transformations using blue edges for the graph R. The following theorem gives the exact value of $M_b(R)$.

Theorem 2. Suppose [M1] is a partial genome, [M29] is a reference genome, and [M77] is their free-extremities graph. Then

$$M_b(R) = N_b(R) - c(R) + 1,$$

where $N_b(R)$ is the number of blue edges and $c(R)$ is the number of cycles in R.
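Both quantities in Theorem 2 can be read off R in a single traversal. The sketch below assumes R is encoded as two hypothetical dictionaries, `blue` and `red`, mapping each free vertex to its partner, so that every vertex has one edge of each colour and following blue, red, blue, ... from any vertex returns to it. The helper names `matching`, `count_cycles` and `max_blue_updates` are illustrative only.

```python
from typing import Dict, Hashable, Iterable, Tuple

Matching = Dict[Hashable, Hashable]  # vertex -> partner (hypothetical R encoding)


def matching(pairs: Iterable[Tuple[Hashable, Hashable]]) -> Matching:
    """Build a vertex -> partner dict from undirected edges."""
    m: Matching = {}
    for a, b in pairs:
        m[a], m[b] = b, a
    return m


def count_cycles(blue: Matching, red: Matching) -> int:
    """c(R): number of alternating cycles, found in O(|V(R)|) time."""
    seen = set()
    cycles = 0
    for start in blue:
        if start in seen:
            continue
        cycles += 1
        x = start
        while True:
            y = blue[x]            # cross a blue edge ...
            seen.update((x, y))
            x = red[y]             # ... then a red edge
            if x == start:         # back where we began: cycle closed
                break
    return cycles


def max_blue_updates(blue: Matching, red: Matching) -> int:
    """Theorem 2: M_b(R) = N_b(R) - c(R) + 1."""
    n_blue = len(blue) // 2        # each blue edge has two endpoints
    return n_blue - count_cycles(blue, red) + 1


# R = a 6-cycle on {0..5} plus a 2-cycle on {6, 7}: N_b = 4, c = 2.
blue = matching([(0, 1), (2, 3), (4, 5), (6, 7)])
red = matching([(1, 2), (3, 4), (5, 0), (6, 7)])
print(count_cycles(blue, red), max_blue_updates(blue, red))  # 2 3
```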

Proof. We prove the theorem by induction on $N_b(R)$. Suppose $N_b(R) = 1$. Then R necessarily consists of a single cycle of length 2 with one blue and one red edge, so $c(R) = 1$. Updating R with its unique (and only possible) blue edge gives

$$M_b(R) = 1 = N_b(R) - c(R) + 1.$$

Now suppose $N_b(R) > 1$. Then [M80], since [M81]. Suppose u, [M82], and [M83], i.e., there is no red edge between u and v in R. Then there are three cases for u and v: (i) u and v lie in different cycles $C_u$ and $C_v$ of R; (ii) u and v are joined by a blue edge in a cycle C of R; or (iii) u and v are non-adjacent vertices of a cycle C of R.

Let R' = update(R, {u, v}) be the free-extremities graph after the update. Since u and v are each incident to a blue edge in R, the update either merges these two blue edges into one or, if they coincide, deletes that edge; in both cases the number of blue edges decreases by one, i.e., $N_b(R') = N_b(R) - 1$.

Thus, by the induction hypothesis,

$$M_b(R') = N_b(R') - c(R') + 1 = N_b(R) - c(R'). \qquad (2)$$

Considering the above cases, we have:

(i) After update(R, {u, v}), $C_u$ and $C_v$ merge into one cycle, so $c(R') = c(R) - 1$. By (2), $M_b(R') = N_b(R) - c(R) + 1$. Since {u, v} is not itself a blue edge, choosing such a pair lets us update R with $N_b(R) - c(R) + 1$ blue edges.

(ii) After update(R, {u, v}), C shrinks into a smaller cycle, so $c(R') = c(R)$. By (2), $M_b(R') = N_b(R) - c(R)$. Since {u, v} is itself a blue edge, choosing it lets us update R with $N_b(R) - c(R) + 1$ blue edges.

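(iii) After update(R, {u, v}), C either splits into two smaller cycles (when u and v are at odd distance along C), so that $c(R') = c(R) + 1$ and, by (2), $M_b(R') = N_b(R) - c(R) - 1$, or remains a single smaller cycle (when the distance is even), so that $c(R') = c(R)$ and $M_b(R') = N_b(R) - c(R)$. Since {u, v} is not a blue edge, choosing it lets us update R with at most $N_b(R) - c(R)$ blue edges.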

By the calculations above, choosing a pair {u, v} that satisfies case (i) or (ii) results in more update moves with blue edges than choosing a pair that satisfies case (iii). Since cases (i) and (ii) both achieve $N_b(R) - c(R) + 1$ and no case exceeds it, $M_b(R) = N_b(R) - c(R) + 1$. □
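Theorem 2 can be sanity-checked on small instances by exhaustive search. The sketch below assumes R is encoded as two dictionaries (`blue`, `red`) mapping each free vertex to its partner, and computes $M_b(R)$ by trying every admissible update sequence; a pair joined by a red edge is admitted only when it is the last pair of free vertices, since otherwise it would close a circular chromosome too early. It then compares the result with $N_b(R) - c(R) + 1$ on random graphs. The search is exponential and meant only as a check; all function names are hypothetical.

```python
import random
from typing import Dict, Hashable

Matching = Dict[Hashable, Hashable]  # vertex -> partner (hypothetical R encoding)


def add_edge(m: Matching, a: Hashable, b: Hashable) -> None:
    m[a], m[b] = b, a


def update(blue: Matching, red: Matching, u: Hashable, v: Hashable) -> bool:
    """update(R, {u, v}) in place; returns True iff {u, v} was blue."""
    bu, bv, ru, rv = blue[u], blue[v], red[u], red[v]
    for m in (blue, red):
        del m[u]
        del m[v]
    if bu != v:
        add_edge(blue, bu, bv)   # identify b_u and b_v
    if ru != v:
        add_edge(red, ru, rv)    # identify r_u and r_v
    return bu == v


def count_cycles(blue: Matching, red: Matching) -> int:
    """c(R): walk blue, red, blue, ... until each cycle closes."""
    seen, cycles = set(), 0
    for start in blue:
        if start not in seen:
            cycles += 1
            x = start
            while True:
                y = blue[x]
                seen.update((x, y))
                x = red[y]
                if x == start:
                    break
    return cycles


def brute_force_mb(blue: Matching, red: Matching) -> int:
    """Exhaustive M_b(R) over all admissible update sequences."""
    verts = list(blue)
    if not verts:
        return 0
    best = 0
    for i, u in enumerate(verts):
        for v in verts[i + 1:]:
            # A red edge {u, v} would close a circular chromosome early.
            if red[u] == v and len(verts) > 2:
                continue
            b2, r2 = dict(blue), dict(red)
            gained = update(b2, r2, u, v)
            best = max(best, int(gained) + brute_force_mb(b2, r2))
    return best


def random_r(n_pairs: int, rng: random.Random):
    """Random R: two independent random perfect matchings on 2n vertices."""
    blue: Matching = {}
    red: Matching = {}
    for m in (blue, red):
        verts = list(range(2 * n_pairs))
        rng.shuffle(verts)
        for k in range(0, len(verts), 2):
            add_edge(m, verts[k], verts[k + 1])
    return blue, red


rng = random.Random(0)
for n in range(1, 5):
    for _ in range(20):
        blue, red = random_r(n, rng)          # N_b(R) = n here
        assert brute_force_mb(blue, red) == n - count_cycles(blue, red) + 1
print("exhaustive search agrees with N_b(R) - c(R) + 1")
```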

We call a pair {u, v} (which may or may not be an edge in R) that satisfies case (i) or (ii) in the proof of Theorem 2 an optimal adjacency. Optimal adjacencies play a central role in solving 1-MCPc: updating the free-extremities graph with them maximizes the number of blue edges used in update transformations. We have the following important corollary.

Corollary 1. Suppose [M1] is a partial genome and [M29] is a reference genome. Adding any optimal adjacency to [M2] leads to a solution of 1-MCPc. In other words, for any optimal adjacency e, there exists a solution [M133] of 1-MCPc that includes e as an adjacency.

Proof. By Theorem 2, adding any optimal adjacency to [M2] still allows the maximum number of blue edges to be used in the update process. Since each update transformation on a blue edge increases the number of cycles in the breakpoint graph by one, a sequence of update transformations on optimal adjacencies yields a solution [M133] to 1-MCPc. Hence, if [M133] is the resulting completion of [M1], the breakpoint graph [M85] has the maximum number of cycles. □

Algorithm 1 solves 1-MCPc in linear time (in the number of genes) by adding optimal adjacencies according to cases (i) and (ii) in the proof of Theorem 2. The following corollary is an immediate consequence of Corollary 1 and Algorithm 1.

Corollary 2. The 1-MCPc is solvable in linear time.

Algorithm 1: Solving 1-MCPc

Input: Partial genome [M1] and reference genome A.
Output: A 1-completion [M133] that is circular uni-chromosomal and maximizes [M86].

begin
    Construct the free-extremities graph [M87];
    [M88];
    while c(R) > 1 do
        u, v ← select two vertices from different cycles in R;
        [M89];
        R ← update(R, {u, v});
    while the number of blue edges in R > 1 do
        u, v ← select two vertices connected via a blue edge in R;
        [M89];
        R ← update(R, {u, v});
    Add the single remaining excluded edge in [M202] to [M90];
    Output the resulting circular uni-chromosomal genome [M133];
end
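Algorithm 1 can be sketched in Python under the assumption that R is encoded as two dictionaries, `blue` and `red`, mapping each free vertex to its partner. The sketch only records which pairs of free extremities are joined and how many joins use blue edges; it does not rebuild the genome, and it models the final "excluded edge" step as joining the last two free vertices. To keep every step constant-time, it carries a pointer to a vertex that survives each update instead of searching R for a new pair. The names `solve_1mcpc` and `cycle_representatives` are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Matching = Dict[Hashable, Hashable]  # vertex -> partner (hypothetical R encoding)
Pair = Tuple[Hashable, Hashable]


def add_edge(m: Matching, a: Hashable, b: Hashable) -> None:
    m[a], m[b] = b, a


def update(blue: Matching, red: Matching, u: Hashable, v: Hashable) -> bool:
    """update(R, {u, v}) in place; returns True iff {u, v} was blue."""
    bu, bv, ru, rv = blue[u], blue[v], red[u], red[v]
    for m in (blue, red):
        del m[u]
        del m[v]
    if bu != v:
        add_edge(blue, bu, bv)
    if ru != v:
        add_edge(red, ru, rv)
    return bu == v


def cycle_representatives(blue: Matching, red: Matching) -> List[Hashable]:
    """One vertex from each alternating cycle of R, in linear time."""
    seen, reps = set(), []
    for start in blue:
        if start in seen:
            continue
        reps.append(start)
        x = start
        while True:
            y = blue[x]
            seen.update((x, y))
            x = red[y]
            if x == start:
                break
    return reps


def solve_1mcpc(blue: Matching, red: Matching) -> Tuple[List[Pair], int]:
    """Sketch of Algorithm 1: returns (pairs joined, number of blue updates)."""
    blue, red = dict(blue), dict(red)   # leave the caller's R untouched
    if not blue:
        return [], 0
    added: List[Pair] = []
    blue_updates = 0
    reps = cycle_representatives(blue, red)
    cur = reps[0]
    # First loop: join vertices of different cycles (case (i)).
    for other in reps[1:]:
        nxt = blue[cur]                 # survives and lies on the merged cycle
        blue_updates += update(blue, red, cur, other)
        added.append((cur, other))
        cur = nxt
    # Second loop: R is one cycle; consume its blue edges (case (ii)).
    while len(blue) > 2:
        v, nxt = blue[cur], red[cur]    # the red partner survives the update
        blue_updates += update(blue, red, cur, v)
        added.append((cur, v))
        cur = nxt
    # Final step: the last two free vertices share a blue and a red edge.
    v = blue[cur]
    blue_updates += update(blue, red, cur, v)
    added.append((cur, v))
    return added, blue_updates


blue: Matching = {}
red: Matching = {}
for a, b in [(0, 1), (2, 3), (4, 5), (6, 7)]:
    add_edge(blue, a, b)
for a, b in [(1, 2), (3, 4), (5, 0), (6, 7)]:
    add_edge(red, a, b)
print(solve_1mcpc(blue, red))  # ([(0, 6), (1, 7), (2, 3), (5, 4)], 3)
```

For this example R ($N_b = 4$, $c = 2$), the sketch joins four pairs, three of them along blue edges, which matches $N_b(R) - c(R) + 1 = 3$.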

1-MCPℓ: linear uni-chromosomal completion

In this section we consider 1-MCP under the chromosomal condition that the completion is a linear uni-chromosomal genome, and we refer to this restricted problem as 1-MCPℓ. We relate solutions of 1-MCPℓ to solutions of 1-MCPc; combined with the results of the previous section, this yields a linear-time algorithm for 1-MCPℓ.

Recall that [M91] is the number of alternating cycles in the breakpoint graph [M92] for any solution [M133] of 1-MCPc. Similarly, we define [M93] to be the number of alternating cycles in [M94] for any solution [M95] of 1-MCPℓ. The following theorem relates the solutions of 1-MCPc to those of 1-MCPℓ.

Theorem 3. Let [M1] be a partial genome, [M96] a circular uni-chromosomal genome, and [M97] the linear uni-chromosomal genome obtained from [M96] by removing an adjacency e. Suppose [M96] and [M97] are the reference genomes in 1-MCPc and 1-MCPℓ, respectively. From any solution [M133] to 1-MCPc we obtain a solution [M98] for 1-MCPℓ, and from any solution [M99] to 1-MCPℓ we obtain a solution [M100] for 1-MCPc. Moreover, [M101], where

$$\theta(e) = \begin{cases} 1, & \text{if } e \text{ is not in any cycle of the breakpoint graph } B \text{ of the partial genome and the circular reference genome},\\ 2, & \text{otherwise.} \end{cases}$$

Proof. First, suppose e is not in any cycle of the graph [M103], so that θ(e) = 1. Let [M133] be a solution to 1-MCPc, and let [M104] be the linear uni-chromosomal genome obtained from [M133] by removing an adjacency [M105] such that f and e lie in the same cycle of [M106]. Such an edge f exists: e is not in any cycle of [M107] but is in a cycle of [M108], so that cycle must use an adjacency added by the completion (see Figure 5). Both [M109] and [M110] are perfect matchings, since [M96] and [M133] are both circular. Removing the edges e and f from [M111] decreases the number of cycles by exactly one, since e and f lie in the same cycle of [M112]. Hence [M113], and we have

Figure 5. Relating 1-MCPc and 1-MCPℓ. (a) The breakpoint graph [M193]; black edges are [M194] and gray edges are [M2]. The edge e = {1t, 6h} is the only edge in [M195]. Since e is not in a cycle component of B, θ(e) = 1. (b) The breakpoint graph [M196], where [M133] is a completion of [M1] and a solution to 1-MCPc. The adjacency f is in [M197] and is shown as a gray dashed edge. B' has two cycles, and removing e and f decreases the number of cycles by one.

[M114]     (3)

where the last inequality follows from the definition of [M115] as the largest number of alternating cycles achievable by any linear uni-chromosomal completion of [M1].

Now suppose [M116] is a solution to 1-MCPℓ, so [M117]. Assume [M118]. Let [M119] be the circular uni-chromosomal genome obtained by adding [M120] to [M121]. Note that at least one path component of [M122] becomes a cycle after adding the edge f' to [M123] and e to [M124]. Hence [M125], and we have

[M126]     (4)

Thus, by (3) and (4), [M127], which implies [M128] and [M129]. This means that [M130] and [M131] are solutions to 1-MCPc and 1-MCPℓ obtained from [M132] and [M133], respectively, which completes the proof for the case θ(e) = 1.

Now suppose e is in a cycle of [M134], so that θ(e) = 2. By the same argument as above, [M135], since no such edge f exists and the number of cycles in [M136] decreases by two when we remove an edge from [M133] (to obtain a linear genome) and e from [M96] (to obtain the genome [M137]). Also, [M138], since adding the excluded edges of [M97] and [M116] increases the number of cycles by two. Thus, in this case, [M139].

Notice that θ depends only on the partial genome [M1] and the reference genome [M96], not on the completion [M133]. Moreover, θ is computable in linear time (in the number of genes), since it suffices to traverse the component of the breakpoint graph that contains e. We have the following corollary.
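One way to compute θ(e) in linear time is sketched below, under stated assumptions: the breakpoint graph B is represented by two dictionaries, `black` for the adjacencies of the circular reference (a perfect matching on extremities) and `gray` for the adjacencies of the partial genome (a partial matching), and extremities carry hypothetical string labels such as "1t" and "1h". Starting from e, the walk alternates gray and black edges until it either returns to e, in which case e lies on a cycle and θ(e) = 2, or reaches an extremity with no gray edge, in which case e's component is a path and θ(e) = 1. The function names are illustrative only.

```python
from typing import Dict, Hashable, Iterable, Tuple

Matching = Dict[Hashable, Hashable]  # extremity -> partner


def matching(pairs: Iterable[Tuple[Hashable, Hashable]]) -> Matching:
    m: Matching = {}
    for a, b in pairs:
        m[a], m[b] = b, a
    return m


def theta(e: Tuple[Hashable, Hashable], black: Matching, gray: Matching) -> int:
    """theta(e) = 2 if reference adjacency e lies on a cycle of B, else 1.

    black: adjacencies of the circular reference (a perfect matching).
    gray:  adjacencies of the partial genome (a partial matching).
    Runs in time linear in the size of e's component of B.
    """
    a, b = e
    if black.get(a) != b:
        raise ValueError("e must be an adjacency of the circular reference")
    x = b                       # we have just crossed e from a to b
    while True:
        if x not in gray:       # free extremity: e's component is a path
            return 1
        y = gray[x]
        if y == a:              # a gray edge closes the walk back at a
            return 2
        x = black[y]            # black is perfect, so this always exists


# Hypothetical 3-gene example; extremities are written "<gene><t|h>".
black = matching([("1h", "2t"), ("2h", "3t"), ("3h", "1t")])
gray = matching([("1h", "2t"), ("2h", "1t")])
print(theta(("1h", "2t"), black, gray))  # 2: black and gray {1h, 2t} form a cycle
print(theta(("3h", "1t"), black, gray))  # 1
```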

Corollary 3. The 1-MCPℓ is solvable in linear time.

Proof. Suppose <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M1">View MathML</a> is a partial genome and <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M97','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M97">View MathML</a> is a linear chromosomal reference genome. Since <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M97','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M97">View MathML</a> is linear and uni-chromosomal, <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M140','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M140">View MathML</a>. Assume that <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M141','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M141">View MathML</a>. Let <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M96','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M96">View MathML</a> be the circular uni-chromosomal genome obtained by adding e to <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M142','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M142">View MathML</a>. 
Using Algorithm 1 we obtain a solution <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M133','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M133">View MathML</a> for 1-MCPc with <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M96','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M96">View MathML</a> as the reference genome. Then by Theorem 3, we can transform the solution <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M133','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M133">View MathML</a> to a linear uni-chromosomal completion <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M99','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M99">View MathML</a> in linear time in the following way: If there exists an edge <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M143','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M143">View MathML</a> such that f and e are in the same cycle of the breakpoint graph <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M144','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M144">View MathML</a>, i.e. θ(e) = 1, remove f from <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M145','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M145">View MathML</a>. 
Otherwise θ(e) = 2, and we remove an arbitrary edge from <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M146','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M146">View MathML</a> to obtain a linear uni-chromosomal genome. Therefore, we obtain a solution to 1-MCP by viewing <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M1">View MathML</a> as a partial genome for a 1-MCPc, solving that problem, and converting its solution <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M133','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M133">View MathML</a> into a solution <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M99','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M99">View MathML</a> for 1-MCP. Since each of these steps takes time linear in the number of genes, the proof is complete. □

(3 ≤ k)-MCP

In the unrestricted case of the k-MCP, a completion of a partial genome always exists: since there is no restriction on the number and type of chromosomes in the resulting genome, we can add adjacencies and telomeres to the partial genome arbitrarily. The difficulty lies in deciding whether a k-completion exists, because a k-completion of a partial multi-genome yields a proper edge coloring of its genome graph with k colors.

Let G = (V, E) be a graph. A (proper) edge-coloring of G assigns a color to each edge so that no two edges sharing a vertex receive the same color. The edge-chromatic number of G, denoted χ'(G), is the minimum number of colors in an edge-coloring of G. Given an edge-coloring of G, a color class is the set of all edges with a specific color. Each color class is a matching in G, since no two edges of the same color share a vertex.
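These definitions can be checked mechanically. The following Python sketch (the function names are our own, not from the paper) verifies that a coloring of a multigraph's edges is proper, groups the edges into color classes, and confirms that each class is a matching.

```python
from collections import defaultdict


def is_proper_edge_coloring(edges, coloring):
    """edges: list of (u, v) pairs (parallel edges allowed, as in a multigraph);
    coloring[i] is the color assigned to edges[i]."""
    if len(edges) != len(coloring):
        raise ValueError("one color per edge is required")
    seen = set()  # (vertex, color) pairs already used at that vertex
    for (u, v), c in zip(edges, coloring):
        for x in (u, v):
            if (x, c) in seen:
                return False  # two edges of color c meet at vertex x
            seen.add((x, c))
    return True


def color_classes(edges, coloring):
    """Group edges by color: one color class per color."""
    classes = defaultdict(list)
    for e, c in zip(edges, coloring):
        classes[c].append(e)
    return dict(classes)


def is_matching(edge_set):
    """True if no two edges in edge_set share a vertex."""
    used = set()
    for u, v in edge_set:
        if u == v or u in used or v in used:
            return False
        used.update((u, v))
    return True


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
classes = color_classes(square, [0, 1, 0, 1])
print(classes)  # {0: [(0, 1), (2, 3)], 1: [(1, 2), (3, 0)]}
```

A 4-cycle thus needs only two colors, and each of its two color classes is a perfect matching.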

The following proposition shows the relation between the edge-coloring of a genome graph and the edge color classes of the corresponding breakpoint graph.

Proposition 1. If <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15">View MathML</a> is a multi-genome of k genomes then <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M147','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M147">View MathML</a>.

Proof. Suppose <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15">View MathML</a> is a mixture of k genomes <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M148','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M148">View MathML</a>. Then the breakpoint graph <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M149','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M149">View MathML</a> can be partitioned into the sets <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M150','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M150">View MathML</a> of adjacencies, and each <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M150','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M150">View MathML</a> can be considered as a color class. So the edges of B can be colored with k colors. Since B and <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22">View MathML</a> are isomorphic, we have <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M151','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M151">View MathML</a>. □

Using the same argument as in Proposition 1 we have:

Lemma 1. If <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15">View MathML</a> is a partial multi-genome of k partial genomes then <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M151','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M151">View MathML</a>.

Now, in the following theorem we show a relation between the edge-coloring of a genome graph and the k-completion of the corresponding partial multi-genomes.

Theorem 4. Let <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15">View MathML</a> be a partial multi-genome. Then <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15">View MathML</a> has an unrestricted k-completion if and only if <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M151','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M151">View MathML</a>, for any positive integer k.

Proof. (⇒) If <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15">View MathML</a> has a k-completion, then it can be considered as a partial multi-genome of k partial genomes. Then by Lemma 1 we have <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M151','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M151">View MathML</a>.

(⇐) Now assume that <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M151','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M151">View MathML</a>. Hence, we can color the edges of <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22">View MathML</a> with k colors. If C1, . . ., Ck are the color classes of G, we have <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M152','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M152">View MathML</a>. Each Ci is a matching in the graph <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22">View MathML</a>, and is a set of adjacencies among the gene extremities. So we can define a partial genome <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M153','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M153">View MathML</a> by adjacencies <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M154','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M154">View MathML</a>. 
The color classes partition the edges of <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22">View MathML</a> into k matchings, and we have <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M155','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M155">View MathML</a>. Since there is no restriction on the completions, taking any completion <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M156','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M156">View MathML</a> for each <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M157','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M157">View MathML</a> results in a k-completion <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M158','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M158">View MathML</a> for <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15">View MathML</a>, because <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M159','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M159">View MathML</a>. □
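The (⇐) direction of this proof is constructive. A minimal Python sketch, assuming colors are the integers 0..k−1 and that the given coloring is proper: each color class becomes the adjacency set of one partial genome, and the union of these sets recovers the edge multiset of the genome graph. The helper names are hypothetical.

```python
from collections import Counter


def partial_genomes_from_coloring(edges, coloring, k):
    """Genome i receives the edges of color i as its adjacencies."""
    genomes = [[] for _ in range(k)]
    for (u, v), c in zip(edges, coloring):
        if not 0 <= c < k:
            raise ValueError(f"color {c} is outside 0..{k - 1}")
        genomes[c].append(frozenset((u, v)))  # an adjacency is an unordered pair
    return genomes


def is_adjacency_set(adjacencies):
    """Each gene extremity may appear in at most one adjacency."""
    used = [x for a in adjacencies for x in a]
    return len(used) == len(set(used))


def union_equals_graph(edges, genomes):
    """The adjacencies of all genomes, with multiplicity, equal the graph's edges."""
    return Counter(frozenset(e) for e in edges) == Counter(a for g in genomes for a in g)


# K4 with a proper 3-edge-coloring (its three perfect matchings).
k4 = [(0, 1), (2, 3), (0, 2), (1, 3), (0, 3), (1, 2)]
genomes = partial_genomes_from_coloring(k4, [0, 0, 1, 1, 2, 2], 3)
print(all(is_adjacency_set(g) for g in genomes), union_equals_graph(k4, genomes))
# True True
```

In the unrestricted setting, completing each of these partial genomes is then unconstrained, as the proof notes.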

Now, by Theorem 4 and using the following two classic theorems, we show that deciding whether there exists a valid solution to a (k ≥ 3)-MCP is NP-complete. For a graph G let Δ(G) be the maximum degree of G.

Theorem 5 (Vizing [14]). If G is a simple graph, χ'(G) = Δ(G) or Δ(G) + 1.

Theorem 6 (Holyer [15]). For a graph G, deciding whether χ'(G) = Δ(G) or Δ(G) + 1 is NP-complete, if Δ(G) ≥ 3.
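To make the two cases of Theorems 5 and 6 concrete, the following brute-force Python sketch computes Δ(G) and χ'(G) for small simple graphs by backtracking. It takes exponential time in general (as expected for an NP-complete question unless P = NP) and is meant only as an illustration; the function names are our own.

```python
def max_degree(n, edges):
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return max(degree, default=0)


def k_edge_colorable(n, edges, k):
    """Backtracking search for a proper edge-coloring with colors 0..k-1."""
    at_vertex = [set() for _ in range(n)]  # colors already used at each vertex

    def extend(i):
        if i == len(edges):
            return True
        u, v = edges[i]
        for c in range(k):
            if c not in at_vertex[u] and c not in at_vertex[v]:
                at_vertex[u].add(c)
                at_vertex[v].add(c)
                if extend(i + 1):
                    return True
                at_vertex[u].discard(c)
                at_vertex[v].discard(c)
        return False

    return extend(0)


def chromatic_index(n, edges):
    """Smallest k >= Delta(G) admitting a proper k-edge-coloring."""
    k = max_degree(n, edges)
    while not k_edge_colorable(n, edges, k):
        k += 1
    return k


triangle = [(0, 1), (1, 2), (2, 0)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
petersen = ([(i, (i + 1) % 5) for i in range(5)]             # outer 5-cycle
            + [(i, i + 5) for i in range(5)]                 # spokes
            + [(5 + i, 5 + (i + 2) % 5) for i in range(5)])  # inner pentagram
for name, n, g in [("triangle", 3, triangle), ("K4", 4, k4), ("Petersen", 10, petersen)]:
    print(name, max_degree(n, g), chromatic_index(n, g))
# triangle 2 3
# K4 3 3
# Petersen 3 4
```

K4 attains χ'(G) = Δ(G), while the triangle and the Petersen graph need Δ(G) + 1 colors, the two possibilities allowed by Vizing's theorem.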

Corollary 4. If k ≥ 3, deciding whether there exists a valid solution to the unrestricted k-MCP is NP-complete.

Proof. In order to prove this corollary we reduce the edge-coloring problem to k-MCP. Suppose G = (V, E) is a simple graph and k = Δ(G) ≥ 3. If |V| is odd, add an isolated vertex so that the number of vertices in G is 2n for some positive integer n. Consider these 2n vertices as gene extremities of a set of n genes. Now, G defines a partial multi-genome <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15">View MathML</a> on these n genes, since the k-MCP is unrestricted and any graph can be considered as the genome graph of a partial multi-genome with no restriction on the chromosomal structure of its partial genomes. Suppose there is a polynomial-time algorithm for k-MCP, and give it <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15">View MathML</a> as the partial multi-genome, along with an arbitrary distance function ϕ and a healthy reference <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M29','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M29">View MathML</a>. If the algorithm returns a valid output, we obtain a k-completion for <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15">View MathML</a> in polynomial time, and by Theorem 4 we can find an edge-coloring of G with k colors in polynomial time; hence χ'(G) ≤ k. If the algorithm returns no valid output, then by Theorem 4 we have χ'(G) > k. 
Such an algorithm would therefore decide in polynomial time whether χ'(G) = Δ(G), which is NP-complete by Theorem 6. This implies that deciding whether the k-MCP has a valid solution is NP-complete, since the genome graph of a partial multi-genome is in general a multigraph and the class of simple graphs is a subset of the class of multigraphs. □
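The encoding step of this reduction, from a graph to a partial multi-genome, can be sketched as follows. This is an illustration under our own labeling assumption (vertex 2g becomes the tail and vertex 2g+1 the head of gene g, and every edge becomes an adjacency); the paper does not fix a particular labeling, and the function name is hypothetical.

```python
def graph_to_partial_multigenome(n_vertices, edges):
    """Encode a graph on vertices 0..n_vertices-1 as gene extremities.

    Returns (number_of_genes, adjacencies). When n_vertices is odd, one
    isolated vertex is added so that the 2n extremities form n genes.
    """
    n_extremities = n_vertices + (n_vertices % 2)  # pad to an even count
    n_genes = n_extremities // 2

    def extremity(v):
        gene, end = divmod(v, 2)
        return (gene, "t" if end == 0 else "h")  # tail or head of the gene

    adjacencies = [(extremity(u), extremity(v)) for u, v in edges]
    return n_genes, adjacencies


n_genes, adjacencies = graph_to_partial_multigenome(3, [(0, 1), (1, 2), (2, 0)])
print(n_genes, adjacencies)
```

With k = Δ(G), Theorem 4 says a k-completion of this partial multi-genome exists exactly when χ'(G) ≤ k.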

Note that in Corollary 4 we only considered the unrestricted version of k-MCP. This allows us to assume that for each (multi-)graph G there exists a partial multi-genome <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15">View MathML</a> such that G and <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22">View MathML</a> are isomorphic. Thus, if <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M160','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M160">View MathML</a> | for all partial multi-genomes <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M161','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M161">View MathML</a>} and if <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M162','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M162">View MathML</a> is the set of all multigraphs, then <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M163','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M163">View MathML</a>. However, one can restrict the k-MCP by imposing a set of chromosomal conditions. 
Consequently, for some condition sets we may have <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M204','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M204">View MathML</a>, in which case the restricted k-MCP may be solvable in polynomial time for all partial multi-genomes whose genome graph is in <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M164','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M164">View MathML</a>.

Corollary 5. If k ≥ 3, then the unrestricted k-MCP is NP-complete.

Proof. Since in solving a k-MCP we need to find a k-completion for its partial multi-genome, by Corollary 4 the proof is complete. □

2-MCP

In this section, we prove that the unrestricted 2-MCP, and the restricted 2-MCP where all chromosomes are circular (i.e., <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M165','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M165">View MathML</a>), are NP-complete for DCJ distance. The NP-completeness of the unrestricted 2-MCP is proved by a reduction from the MAX 3-AND problem. MAX 3-AND is a satisfiability problem in which, given a set of conjunctions, each with 3 literals, the goal is to find an assignment of Boolean values to the variables that maximizes the number of satisfied conjunctions. Note that in 2-MCP there are only two possible topologies for the mixture tree: the branch-tree and the path-tree (Figure 2-b, c).
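For small instances, MAX 3-AND can be solved by exhaustive search over all 2^n assignments. The following Python sketch (exponential time, for illustration only) makes the objective precise; by our own convention, a literal is encoded as a pair (variable, is_positive).

```python
from itertools import product


def conjunction_satisfied(conjunction, assignment):
    """A conjunction holds when every literal (var, positive) is true."""
    return all(assignment[var] == positive for var, positive in conjunction)


def max_3and_bruteforce(conjunctions):
    """Return (max number of satisfied conjunctions, an optimal assignment)."""
    variables = sorted({var for conj in conjunctions for var, _ in conj})
    best_count, best_assignment = -1, None
    for values in product((False, True), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        count = sum(conjunction_satisfied(c, assignment) for c in conjunctions)
        if count > best_count:
            best_count, best_assignment = count, assignment
    return best_count, best_assignment


instance = [
    (("x", True), ("y", True), ("z", True)),   # x ∧ y ∧ z
    (("x", True), ("y", False), ("w", True)),  # x ∧ ¬y ∧ w
    (("x", True), ("z", True), ("w", True)),   # x ∧ z ∧ w
]
best, assignment = max_3and_bruteforce(instance)
print(best)  # 2: the first two conjunctions disagree on y
```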

Theorem 7. The unrestricted 2-MCP with DCJ distance is NP-complete.

In order to provide the proof of this theorem, we need the following lemmas.

Lemma 2. Suppose <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M15">View MathML</a> is a partial multi-genome whose genome graph, <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22">View MathML</a>, consists of m cycles C1, . . ., Cm of even length, and <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M29','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M29">View MathML</a> is a reference genome which consists of ℓ edges (i.e., it has ℓ adjacencies). Assume that there are ℓ' cycles among the cycles in <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22">View MathML</a> such that no edge in A is connected to any of their vertices. If ℓ' > 2ℓ, then in every solution to the 2-MCP the optimal mixture tree is a path-tree.

Proof. Note that in 2-MCP there are only two possible topologies for the mixture tree: the branch-tree and path-tree (Figure 2-b, c). Since every vertex of <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22">View MathML</a> has degree two, the adjacencies of any 2-completion partition the edges of <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M22">View MathML</a> into two perfect matchings <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M166','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M166">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M167','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M167">View MathML</a>. That is, for any 2-completion <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M168','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M168">View MathML</a> we have <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M169','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M169">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M170','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M170">View MathML</a>, since G1 and G2 are maximal (and circular) and we cannot add any edge to them. 
Also, for each <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M171','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M171">View MathML</a> we have <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M172','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M172">View MathML</a>, where Mij is a perfect matching on vertices of Cj. Obviously, <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M173','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M173">View MathML</a>. We have <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M174','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M174">View MathML</a> for i = 1, 2, since <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M29','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M29">View MathML</a> has ℓ edges and each of them can be in at most one cycle in <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M175','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M175">View MathML</a>. Therefore,

<a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M176','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M176">View MathML</a>

which shows that the dDCJ-value of a path-tree is smaller than the dDCJ-value of a branch-tree, and completes the proof. □
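The argument above counts cycles in breakpoint graphs. As background, for two genomes on the same N genes whose chromosomes are all circular, every extremity has exactly one adjacency in each genome, the breakpoint graph is a disjoint union of alternating cycles, and the classical DCJ distance is N − C, where C is the number of cycles. The Python sketch below computes this quantity; the extremity labels are our own, and it does not reproduce the paper's dDCJ-value of a mixture tree.

```python
def partner_map(adjacencies):
    """Map each extremity to the extremity it is adjacent to."""
    partner = {}
    for u, v in adjacencies:
        partner[u] = v
        partner[v] = u
    return partner


def count_cycles(adj_a, adj_b):
    """Number of cycles in the breakpoint graph of two perfect matchings
    on the same set of extremities (edges alternate between A and B)."""
    a, b = partner_map(adj_a), partner_map(adj_b)
    seen, cycles = set(), 0
    for start in a:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:  # follow an A-edge, then a B-edge, until back at start
            seen.add(x)
            y = a[x]
            seen.add(y)
            x = b[y]
    return cycles


def dcj_distance_circular(n_genes, adj_a, adj_b):
    """DCJ distance N - C for genomes with only circular chromosomes."""
    return n_genes - count_cycles(adj_a, adj_b)


# Two genes with extremities 1t, 1h, 2t, 2h; A is the circular chromosome (1 2).
genome_a = [("1h", "2t"), ("2h", "1t")]
genome_b = [("1h", "2h"), ("2t", "1t")]  # circular chromosome (1 -2)
print(dcj_distance_circular(2, genome_a, genome_a))  # 0
print(dcj_distance_circular(2, genome_a, genome_b))  # 1
```

Shared adjacencies form cycles of length 2, so more cycles means a smaller distance, which is why the proofs in this section aim to maximize the number of cycles.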

Lemma 3. Any MAX 3-SAT instance is reducible to a MAX 3-AND instance. Moreover, MAX 3-AND is NP-complete.

Proof. Let Δ = ℓ1 ∨ ℓ2 ∨ ℓ3 be a clause (disjunction) of three literals. Define

<a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M177','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M177">View MathML</a>

By using basic Boolean rules we have $\Delta \Leftrightarrow \bigvee_{S \in \ell(\Delta)} S$.

Now, suppose <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178">View MathML</a> is a MAX 3-SAT instance which has m clauses Δ1, . . ., Δm. Let <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M179','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M179">View MathML</a> be an instance of MAX 3-AND which consists of all the conjunctions in <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M180','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M180">View MathML</a>. For every assignment to the variables, at most one conjunction in ℓ(Δj), 1 ≤ j ≤ m, is satisfied, and this happens if and only if Δj is satisfied. Hence, for every assignment, the number of satisfied conjunctions in <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M179','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M179">View MathML</a> equals the number of satisfied clauses in <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178">View MathML</a>, so an assignment is optimal for one instance if and only if it is optimal for the other. Therefore, MAX 3-SAT is reducible to MAX 3-AND, which implies that MAX 3-AND is NP-complete, as MAX 3-SAT is NP-complete [16]. □
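The construction in this proof can be sketched in Python. We assume here that the set of conjunctions associated with a clause Δ consists of the seven conjunctions obtained by keeping or negating each literal of Δ, omitting the one in which all three literals are negated; this has the property used in the proof (exactly one member holds when Δ holds, none otherwise), but the paper's own definition is the displayed formula above. The function names are hypothetical.

```python
from itertools import product


def clause_to_conjunctions(clause):
    """clause: three literals (var, positive). Return the 7 conjunctions formed by
    keeping or negating each literal, except the one negating all three."""
    conjunctions = []
    for flips in product((False, True), repeat=3):
        if all(flips):
            continue  # all literals negated: this pattern holds exactly when the clause is false
        conjunctions.append(tuple((var, positive != flip)
                                  for (var, positive), flip in zip(clause, flips)))
    return conjunctions


def max3sat_to_max3and(clauses):
    """Union of the conjunction sets of all clauses."""
    return [c for clause in clauses for c in clause_to_conjunctions(clause)]


def holds(literals, assignment, conjunctive):
    values = [assignment[var] == positive for var, positive in literals]
    return all(values) if conjunctive else any(values)


clauses = [(("x", True), ("y", False), ("z", True)), (("x", False), ("y", True), ("w", True))]
conjunctions = max3sat_to_max3and(clauses)
for bits in product((False, True), repeat=4):
    a = dict(zip("wxyz", bits))
    satisfied_clauses = sum(holds(c, a, conjunctive=False) for c in clauses)
    # Same count in both instances, for every assignment.
    assert satisfied_clauses == sum(holds(c, a, conjunctive=True) for c in conjunctions)
print(len(conjunctions))  # 14
```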

Now, consider an instance <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178">View MathML</a> of the MAX 3-AND problem. We show how to represent <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178">View MathML</a> by a genome graph and a reference genome, to obtain a reduction from MAX 3-AND to 2-MCP. We represent a variable x with a cycle C of even length, which we call a variable-cycle (see Figure 6-a). This cycle has exactly two perfect matchings. We label one of these the true matching, T(x), and the other the false matching, F(x) (see Figure 6-b, c). We represent an assignment to a variable by choosing one of the matchings T(x) and F(x) and removing the edges of the other matching (see Figure 7).
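The claim that an even cycle has exactly two perfect matchings can be checked by enumeration. The Python sketch below (helper names are ours) lists all perfect matchings of a cycle; for an even cycle it finds exactly two, the two sets of alternate edges, which can serve as T(x) and F(x), and for an odd cycle it finds none.

```python
def cycle_edges(length):
    """Edges of the cycle on vertices 0..length-1."""
    return [(i, (i + 1) % length) for i in range(length)]


def perfect_matchings(n_vertices, edges):
    """Enumerate perfect matchings by always matching the smallest free vertex."""
    found = []

    def extend(covered, chosen):
        free = [v for v in range(n_vertices) if v not in covered]
        if not free:
            found.append(sorted(chosen))
            return
        v = free[0]  # this vertex must be covered by one of its incident edges
        for u, w in edges:
            if v in (u, w) and u not in covered and w not in covered:
                extend(covered | {u, w}, chosen + [(u, w)])

    extend(frozenset(), [])
    return found


matchings = perfect_matchings(8, cycle_edges(8))
true_matching, false_matching = matchings  # which one is called "true" is a convention
print(true_matching)   # [(0, 1), (2, 3), (4, 5), (6, 7)]
print(false_matching)  # [(1, 2), (3, 4), (5, 6), (7, 0)]
```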

Figure 6. Representing variables with cycles. (a) A variable represented by a cycle, (b) a true matching, and (c) a false matching.

Figure 7. Representing conjunctions with cycles. (a) Three cycles representing the literals <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M198','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M198">View MathML</a>, <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M199','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M199">View MathML</a>, and z, and the conjunction edges (bold) for a conjunction <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M200','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M200">View MathML</a>. (b) For x = y = false and z = true we obtain the conjunction-cycle Δ of length 6. (c) Any other assignment (e.g., x = true) destroys the conjunction cycle.

Let ℓ(x1), ℓ(x2), ℓ(x3) be three literals of variables x1, x2, x3, and Δ = (ℓ(x1) ∧ ℓ(x2) ∧ ℓ(x3)) be a conjunction in <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178">View MathML</a>. A conjunction-cycle of Δ is a cycle which is obtained as follows:

1. For each i ∈ {1, 2, 3}, choose an edge of T(xi) if ℓ(xi) = xi, and an edge of F(xi) if <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M181','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M181">View MathML</a>.

2. Add three new edges, called conjunction-edges, that join the three edges chosen in the previous step into a cycle of length 6. This cycle is a conjunction-cycle of Δ.

It is easy to see that an assignment α to the xi's satisfies the conjunction Δ if and only if the matching assignment corresponding to α keeps all the edges of the conjunction-cycle of Δ. We call a representation of a MAX 3-AND instance <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178">View MathML</a> with variable-cycles and conjunction-cycles a graphical representation of <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178">View MathML</a>.
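This equivalence can be checked on a small example. In the Python sketch below, which is our own illustration rather than the paper's construction, each variable-cycle has vertices (x, 0) to (x, L−1) with L even; T(x) consists of the edges starting at even positions and F(x) of those starting at odd positions. The parameter `slots`, which picks which edge of each matching a conjunction-cycle uses, is hypothetical; the text only requires that different conjunction-cycles be placed so that they do not interfere.

```python
from itertools import product


def literal_edge(var, slot, positive, cycle_len):
    """Edge number `slot` of T(var) if positive, else of F(var)."""
    i = 2 * slot if positive else 2 * slot + 1
    return ((var, i), (var, (i + 1) % cycle_len))


def kept_edges(assignment, cycle_len):
    """Each variable keeps T(x) when assigned True and F(x) otherwise."""
    return {literal_edge(var, s, value, cycle_len)
            for var, value in assignment.items() for s in range(cycle_len // 2)}


def conjunction_cycle(conjunction, slots, cycle_len):
    """The three literal edges plus three conjunction-edges closing a 6-cycle."""
    lits = [literal_edge(v, s, p, cycle_len) for (v, p), s in zip(conjunction, slots)]
    conj_edges = [(lits[i][1], lits[(i + 1) % 3][0]) for i in range(3)]
    return lits, conj_edges


def cycle_preserved(conjunction, slots, assignment, cycle_len):
    """Conjunction-edges are never removed, so only the literal edges can be lost."""
    lits, _ = conjunction_cycle(conjunction, slots, cycle_len)
    kept = kept_edges(assignment, cycle_len)
    return all(e in kept for e in lits)


# The conjunction of Figure 7 (¬x ∧ ¬y ∧ z), on variable-cycles of length 8.
conj = (("x", False), ("y", False), ("z", True))
for bits in product((False, True), repeat=3):
    a = dict(zip("xyz", bits))
    satisfied = (not a["x"]) and (not a["y"]) and a["z"]
    assert cycle_preserved(conj, (0, 0, 0), a, 8) == satisfied
```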

If the literals of a variable appear in at most t conjunctions, and the variable-cycles have length at least 4t, then by choosing the edges of conjunction-cycles properly, we have a graphical representation of a MAX 3-AND instance, where no edge in a variable-cycle is incident with two conjunction edges from different conjunction-cycles. This implies the following lemma:

Lemma 4. For each MAX 3-AND instance <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178">View MathML</a> there exists a graphical representation <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M203','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M203">View MathML</a> such that any assignment to the variables in <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178">View MathML</a> which maximizes the number of satisfied conjunctions induces a matching assignment that maximizes the number of conjunction-cycles, and vice versa.

Combining Lemmas 2-4 gives the proof of Theorem 7.

Proof of Theorem 7. Since MAX 3-AND is NP-complete by Lemma 3, it suffices to reduce the MAX 3-AND problem to the 2-MCP. Suppose <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178">View MathML</a> is a MAX 3-AND instance, and assume <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178">View MathML</a> has m conjunctions. We add 3m + 1 new conjunctions δ1, . . ., δ3m+1, where each δi consists of a single new variable xδi; clearly, in any optimal assignment all the xδi's are true. Now by Lemma 4, there is a graphical representation <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M203','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M203">View MathML</a> such that finding an optimal assignment in <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M178">View MathML</a> is equivalent to finding a matching for each variable-cycle such that the number of preserved conjunction-cycles is maximized. Note that there are 3m conjunction-edges and 3m + 1 variable-cycles which are not connected to any conjunction-edge. Now, consider all the vertices in <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M203','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M203">View MathML</a> as gene extremities, and all the edges in the variable-cycles as the adjacencies of a partial multi-genome G. 
Also, consider all the conjunction-edges as the adjacencies of a reference healthy genome <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M29','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M29">View MathML</a>. In the 2-MCP with partial multi-genome G and reference healthy genome <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M29','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M29">View MathML</a>, the optimal tree is forced to be a path-tree by Lemma 2 (Figure 2). Therefore, in the optimal solution of this 2-MCP, <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M182','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M182">View MathML</a> should be a genome such that the number of cycles in the breakpoint graph <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M183','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M183">View MathML</a> is maximized, i.e., the number of conjunction-cycles is maximized. Since <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M182','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S9/mathml/M182">View MathML</a> is a union of perfect matchings of the variable-cycles (see the proof of Lemma 2), it induces an assignment to the variables which maximizes the number of satisfied conjunctions, and this completes the proof. □

We end this section by considering the restricted version of k-MCP in which the chromosomal condition set is {circular}, i.e., all genomes consist only of circular chromosomes. We denote this restricted version by k-MCPc, and continue to write k-MCP for the unrestricted version. Let opt(k-MCPc) and opt(k-MCP) denote the dDCJ-values of optimal solutions to k-MCPc and k-MCP, respectively. Then:

Theorem 8. For k-MCPc and k-MCP with DCJ distance, we have

$$\mathrm{opt}(k\text{-MCPc}) = \mathrm{opt}(k\text{-MCP}).$$

Proof. First, every solution to k-MCPc is also a solution to k-MCP, since k-MCP imposes no restriction on the chromosomes. Hence, opt(k-MCPc) ≥ opt(k-MCP). Second, given any solution to k-MCP whose genomes are not all circular, we can add new edges (adjacencies) to the genomes to make them circular. Adding these edges does not decrease the number of cycles in the breakpoint graph, which implies that the dDCJ-value of the newly obtained genomes is not larger than that of the original solution. Applying this to an optimal solution of k-MCP yields circular genomes that form a solution of k-MCPc with dDCJ-value at most opt(k-MCP). So opt(k-MCPc) ≤ opt(k-MCP), completing the proof. □
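The second half of the proof rests on the DCJ distance not increasing when cycles are added. As a hedged illustration, the sketch below computes the DCJ distance between two genomes on the same N genes using the well-known adjacency-graph formula of Bergeron, Mixtacki and Stoye, d = N − (C + I/2), where C is the number of cycles and I the number of odd paths. The adjacency graph is swapped in here for the breakpoint graph used in the text because it handles linear chromosomes directly. The sketch then checks, on one small hypothetical instance, that circularizing a linear chromosome does not increase its distance to a circular genome; a single instance is an illustration, not a proof. The extremity encoding (gene, "t") / (gene, "h") and the function names are assumptions of this sketch, and the input is assumed to be valid (each extremity used in at most one adjacency per genome).

```python
from collections import defaultdict


def extremities(n):
    """All gene extremities (tail "t", head "h") of genes 1..n."""
    return [(g, end) for g in range(1, n + 1) for end in ("t", "h")]


def dcj_distance(n, adjacencies_a, adjacencies_b):
    """DCJ distance N - (C + I/2) via the adjacency graph (sketch)."""

    def genome_vertices(adjacencies):
        # Adjacencies become degree-2 vertices; uncovered extremities are telomeres.
        verts = [frozenset(a) for a in adjacencies]
        covered = set().union(*verts)
        verts += [frozenset([x]) for x in extremities(n) if x not in covered]
        return verts

    owner_a = {x: ("A", i) for i, v in enumerate(genome_vertices(adjacencies_a)) for x in v}
    owner_b = {x: ("B", i) for i, v in enumerate(genome_vertices(adjacencies_b)) for x in v}
    graph = defaultdict(list)
    for x in extremities(n):  # one edge per shared extremity
        graph[owner_a[x]].append(owner_b[x])
        graph[owner_b[x]].append(owner_a[x])

    seen, cycles, odd_paths = set(), 0, 0
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # collect one connected component
            u = stack.pop()
            component.append(u)
            for w in graph[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        edges = sum(len(graph[u]) for u in component) // 2
        if edges == len(component):  # as many edges as vertices: a cycle
            cycles += 1
        elif edges % 2 == 1:  # otherwise a path; count the odd ones
            odd_paths += 1
    return n - (cycles + odd_paths // 2)  # the number of odd paths is always even


# Hypothetical genomes on genes 1, 2, 3.
circular = [((1, "h"), (2, "t")), ((2, "h"), (3, "t")), ((3, "h"), (1, "t"))]
inverted = [((1, "h"), (2, "h")), ((2, "t"), (3, "t")), ((3, "h"), (1, "t"))]  # 1 -2 3
linear = [((1, "h"), (2, "t")), ((2, "h"), (3, "t"))]  # linear chromosome 1 2 3
closed = linear + [((3, "h"), (1, "t"))]  # the same chromosome, circularized

print(dcj_distance(3, circular, inverted))  # 1
print(dcj_distance(3, linear, circular))    # 1
print(dcj_distance(3, closed, circular))    # 0
```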

Combining this theorem with Theorem 7, we have:

Corollary 6. If k ≥ 2, then k-MCPc with DCJ distance is NP-complete.

Discussion and conclusion

In this paper we introduced the k-Minimum Completion Problem (k-MCP), motivated by the type of data produced in current cancer genome sequencing studies. We obtained the following results: (1) a linear-time algorithm for the unrestricted 1-MCP; (2) a linear-time algorithm for the restricted versions of the 1-MCP in which the genome is a single circular or a single linear chromosome, i.e., the chromosomal condition set is {circular, uni-chromosomal} or {linear, uni-chromosomal}; (3) the unrestricted k-MCP is NP-complete for any distance when k ≥ 3; and (4) the 2-MCP with DCJ distance is NP-complete both in the unrestricted version and under the condition that all chromosomes are circular, i.e., when the chromosomal condition set is {circular}. These results lay the foundation for future algorithmic studies of the k-MCP and for the application of these algorithms to real cancer sequencing data.

There are numerous further directions to pursue. As noted in the introduction, the model described in this paper does not capture all the complexities of cancer genome sequencing; most importantly, copy number aberrations (duplications and deletions) and errors in the measured adjacencies are important features of cancer genome sequencing data and should be addressed.

To handle errors, one might consider weighted versions of the k-MCP where adjacencies have a weight corresponding to the confidence in the measurement. Regarding the current model, further work is needed on different chromosomal conditions, genomic distances, or other constraints on the relationships between the genomes in the mixture. For example, the case of linear chromosomes demands further attention, as human chromosomes are linear, although circular chromosomes have been observed in cancer [17]. Similarly, one may impose an upper bound on the number of chromosomes. One may also place restrictions on the structure of the mixture tree.

Another direction is to derive approximation algorithms. In the k-MCP we aim to minimize the distance over all possible k-completions and mixture trees simultaneously. However, by separating the completion and distance-optimization steps, one may employ techniques that have been developed for other problems. For example, one may first complete the partial multi-genomes using clustering techniques, such as those employed in metagenomic studies [18]. With complete genomes, one could then try to find optimal mixture trees rooted at the reference genome. Depending on the allowed structure of the mixture tree, techniques from genome rearrangement phylogeny problems may be employed. For example, in the case of the 2-MCP, if the complete genomes are the leaves of the mixture tree, then the problem becomes the median problem (with the reference genome as the third genome) [5,13]. Alternatively, if the genomes are the vertices of the mixture tree, then the tree construction problem becomes the problem of finding a minimum spanning tree, which is generally easier. Between these extremes, where some of the genomes in the mixture are leaves and some are intermediate nodes (ancestors), the problem becomes a Steiner tree problem. In the cancer application, any of these cases might provide useful approximations, as the process of clonal evolution of cancer [1] might mean that cells at intermediate stages of cancer progression remain present in the tumor.
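The minimum-spanning-tree case mentioned above can be sketched as follows, assuming that pairwise distances between the reference genome and the complete genomes have already been computed (for example, DCJ distances). This is a standard Prim's algorithm started at index 0, which is assumed here to be the reference genome, so the resulting tree can be read as rooted there. The function name and the distance matrix in the example are hypothetical; the sketch covers only the tree-construction step, not the completion step or the harder median and Steiner variants.

```python
import heapq


def minimum_spanning_tree(dist):
    """Prim's algorithm on a symmetric distance matrix.

    Vertex 0 is assumed to be the reference genome; returns the tree
    edges as (parent, child) pairs and the total tree weight.
    """
    n = len(dist)
    in_tree = [False] * n
    edges, total = [], 0
    heap = [(0, 0, -1)]  # (edge weight, vertex, parent); -1 marks the root
    while heap:
        weight, v, parent = heapq.heappop(heap)
        if in_tree[v]:
            continue  # stale entry: v was already reached more cheaply
        in_tree[v] = True
        total += weight
        if parent >= 0:
            edges.append((parent, v))
        for u in range(n):
            if not in_tree[u]:
                heapq.heappush(heap, (dist[v][u], u, v))
    return edges, total


# Hypothetical distances: index 0 is the reference, 1-3 are complete genomes.
dist = [
    [0, 2, 3, 4],
    [2, 0, 1, 3],
    [3, 1, 0, 2],
    [4, 3, 2, 0],
]
print(minimum_spanning_tree(dist))  # ([(0, 1), (1, 2), (2, 3)], 5)
```

Because every edge is oriented from a vertex already in the tree to a new one, the returned (parent, child) pairs directly give a tree rooted at the reference genome.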

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

All authors contributed equally to this work.

Acknowledgements

We thank the anonymous referees for helpful comments on an earlier version of this manuscript. This work was supported by a CAREER Award from the National Science Foundation (#1053753). In addition, BJR is supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and an Alfred P. Sloan Research Fellowship.

This article has been published as part of BMC Bioinformatics Volume 13 Supplement 19, 2012: Proceedings of the Tenth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S19

References

  1. Nowell PC: The clonal evolution of tumor cell populations. Science 1976, 194(4260):23-28.

  2. Raphael BJ, Volik S, Collins C, Pevzner PA: Reconstructing tumor genome architectures. Bioinformatics 2003, 19(Suppl 2):i162-171.

  3. Meyerson M, Gabriel S, Getz G: Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet 2010, 11(10):685-696.

  4. Yancopoulos S, Attie O, Friedberg R: Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 2005, 21(16):3340-3346.

  5. Tannier E, Zheng C, Sankoff D: Multichromosomal median and halving problems under different genomic distances. BMC Bioinformatics 2009, 10.

  6. Hannenhalli S, Pevzner PA: Transforming Men into Mice (Polynomial Algorithm for Genomic Distance Problem). In FOCS. IEEE Computer Society; 1995:581-592.

  7. Hannenhalli S, Pevzner PA: Transforming Cabbage into Turnip: Polynomial Algorithm for Sorting Signed Permutations by Reversals. J ACM 1999, 46:1-27.

  8. Bergeron A, Mixtacki J, Stoye J: A new linear time algorithm to compute the genomic distance via the double cut and join distance. Theor Comput Sci 2009, 410(51):5300-5316.

  9. El-Mabrouk N, Sankoff D: The Reconstruction of Doubled Genomes. SIAM J Comput 2003, 32(3):754-792.

  10. Warren R, Sankoff D: Genome Aliquoting Revisited. In RECOMB-CG, Volume 6398 of Lecture Notes in Computer Science. Edited by Tannier E. Springer; 2010:1-12.

  11. Zheng C, Lenert A, Sankoff D: Reversal distance for partially ordered genomes. ISMB (Supplement of Bioinformatics) 2005, 502-508.

  12. Gaul É, Blanchette M: Ordering Partially Assembled Genomes Using Gene Arrangements. In Comparative Genomics, Volume 4205 of Lecture Notes in Computer Science. Edited by Bourque G, El-Mabrouk N. Springer; 2006:113-128.

  13. Xu AW: A Fast and Exact Algorithm for the Median of Three Problem: A Graph Decomposition Approach. In RECOMB-CG, Volume 5267 of Lecture Notes in Computer Science. Edited by Nelson CE, Vialette S. Springer; 2008:184-197.

  14. Vizing VG: On an estimate of the chromatic class of a p-graph (in Russian). Diskret Analiz 1964, 3:25-30.

  15. Holyer I: The NP-Completeness of Edge-Coloring. SIAM J Comput 1981, 10(4):718-720.

  16. Cook SA: The Complexity of Theorem-Proving Procedures. In STOC. Edited by Harrison MA, Banerji RB, Ullman JD. ACM; 1971:151-158.

  17. Raphael BJ, Pevzner PA: Reconstructing tumor amplisomes. ISMB/ECCB (Supplement of Bioinformatics) 2004, 265-273.

  18. Wu YW, Ye Y: A Novel Abundance-Based Algorithm for Binning Metagenomic Sequences Using ℓ-Tuples. In RECOMB, Volume 6044 of Lecture Notes in Computer Science. Edited by Berger B. Springer; 2010:535-549.