
This article is part of the supplement: Proceedings of the Third Annual RECOMB Satellite Workshop on Massively Parallel Sequencing (RECOMB-seq 2013)

Open Access Proceedings

De Bruijn Superwalk with Multiplicities Problem is NP-hard

Evgeny Kapun and Fedor Tsarev*

Author Affiliations

St. Petersburg National Research University of Information Technologies, Mechanics and Optics, Genome Assembly Algorithms Laboratory, 197101, Kronverksky pr., 49, St. Petersburg, Russia


BMC Bioinformatics 2013, 14(Suppl 5):S7  doi:10.1186/1471-2105-14-S5-S7


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/14/S5/S7


Published: 10 April 2013

© 2013 Kapun and Tsarev; licensee BioMed Central Ltd.

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The De Bruijn Superwalk with Multiplicities Problem asks for a walk in the de Bruijn graph that contains several given walks as subwalks and passes through each edge exactly a predefined number of times (equal to the multiplicity of that edge). Paul Medvedev and Michael Brudno stated this problem in their talk at the first RECOMB Satellite Conference on Open Problems in Algorithmic Biology in August 2012. In this paper we show that this problem is NP-hard. Combined with the results of previous work, this means that all known models for genome assembly are NP-hard.

Introduction

Most current genome sequencing technologies are based on the shotgun method: the genome is split into many small fragments, which are read directly. The problem of reconstructing the initial genome from these small fragments (reads) is known as the genome assembly problem. It is one of the fundamental problems of bioinformatics. Several models for genome assembly have been studied. If reads are assumed to be error-free, all models assume that every input read must be a substring of the genome.

One model is based on the maximum parsimony principle: the original genome should be the shortest string containing all reads as substrings. This leads to the Shortest Common Superstring (SCS) problem, which is NP-hard [1]. Modeling genome assembly as the SCS problem has a significant drawback. Most genomes contain repeats (multiple similar or even identical fragments), and an SCS solution would under-represent them.

The de Bruijn graph model proposed in [2] handles repeats much better. It generates the set of all (k + 1)-character substrings (called (k + 1)-mers) of the reads and builds a de Bruijn graph whose vertices are k-mers and whose edges are (k + 1)-mers. Each read is represented by a walk in this graph, and any walk containing all the reads as subwalks represents a valid assembly. Consequently, the genome assembly problem becomes the problem of finding the shortest superwalk. This problem is closely related to the Eulerian tour problem, which is solvable in polynomial time and was previously used to solve the problem of sequencing by hybridization [3]. Despite this, the Shortest De Bruijn Superwalk problem (SDBS) was shown to be NP-hard [4]. Note also that SDBS has a special case solvable in polynomial time: if each subwalk contains only one edge, the problem can be reduced to the Chinese Postman Problem [5].
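To illustrate the model (this is a minimal sketch, not code from [2]), a de Bruijn graph can be stored as its edge set alone. Each (k + 1)-mer determines its endpoints: it runs from its k-character prefix to its k-character suffix. A read of length at least k + 1 then spells a walk given by its consecutive (k + 1)-mers. The function names are hypothetical.

```python
def de_bruijn_edges(reads, k):
    """Return the set of (k+1)-mers occurring in the reads.

    Each (k+1)-mer e is an edge from vertex e[:-1] to vertex e[1:],
    both of which are k-mers."""
    return {read[i:i + k + 1] for read in reads for i in range(len(read) - k)}


def read_as_walk(read, k):
    """Return the walk spelled by a read, as the list of its (k+1)-mer edges."""
    return [read[i:i + k + 1] for i in range(len(read) - k)]


def is_walk(edges):
    """Consecutive edges must share a vertex: the head of one is the tail of the next."""
    return all(a[1:] == b[:-1] for a, b in zip(edges, edges[1:]))


reads = ["ACGT", "CGTA"]
print(sorted(de_bruijn_edges(reads, 2)))  # ['ACG', 'CGT', 'GTA']
print(read_as_walk("ACGT", 2))            # ['ACG', 'CGT']
```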

In [6], an algorithm based on the maximum likelihood principle was proposed for estimating the copy counts of reads. A similar algorithm can be applied to find the multiplicities of edges in the de Bruijn graph. This motivated the following problem, formulated in the talk [7]: given a de Bruijn graph with vertices of size k constructed from a set of reads, together with the multiplicities (in unary notation) of all edges of this graph, find a superwalk that is consistent with the edge multiplicities and contains all reads as subwalks. This problem is called the De Bruijn Superwalk with Multiplicities problem (DBSM), and its computational complexity has so far remained unknown.
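Checking a proposed DBSM solution is straightforward. The sketch below uses hypothetical names and assumes every read has at least k + 1 characters. It identifies a walk with the string it spells. Because the multiplicities are given in unary, the candidate walk has as many edges as the sum of the multiplicities, so this check runs in time polynomial in the input size.

```python
from collections import Counter


def is_dbsm_solution(candidate, reads, multiplicity, k):
    """Check whether the walk spelled by `candidate` solves a DBSM instance.

    The edges of the walk are the (k+1)-character windows of `candidate`.
    Assumes every read has at least k + 1 characters, so a read is a
    subwalk exactly when it is a substring of `candidate`.
    `multiplicity` maps each edge ((k+1)-mer) to its required count."""
    used = Counter(candidate[i:i + k + 1] for i in range(len(candidate) - k))
    # Drop zero entries so the comparison does not depend on how Counter
    # equality treats missing keys in different Python versions.
    required = {edge: m for edge, m in multiplicity.items() if m > 0}
    return dict(used) == required and all(read in candidate for read in reads)
```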

In this paper we prove that this problem is NP-hard.

NP-hardness proof

The proof has the following structure. First, we formulate the Common Superstring with Multiplicities (CSM) problem and show that it is NP-hard by reducing SCS to it. Then we reduce CSM to the de Bruijn Superwalk with Multiplicities problem.

Let $S$ be a string over an alphabet $\Sigma$, and let $L_c(S)$ denote the number of occurrences of the character $c$ in $S$. The Common Superstring with Multiplicities problem is the following: given strings $S_1, S_2, \ldots, S_n$ and nonnegative integers $l_c$ for all $c \in \Sigma$ (given in unary notation), determine whether there exists a string $S$ such that:

- all strings $S_1, S_2, \ldots, S_n$ are substrings of $S$,

- $L_c(S) = l_c$ for each $c \in \Sigma$.
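For intuition only, a tiny binary CSM instance can be decided by exhaustive search. Any solution has exactly $l_0 + l_1$ characters, so it is enough to try every placement of the ones. The function name is hypothetical, and the search takes exponential time.

```python
from itertools import combinations


def csm_brute_force(strings, l0, l1):
    """Decide a binary CSM instance by exhaustive search.

    Any solution S has exactly l0 zeroes and l1 ones, hence length l0 + l1,
    so it suffices to try every choice of the l1 positions holding a one.
    Returns a witness S, or None if no solution exists.
    Exponential time: intended only for tiny instances."""
    n = l0 + l1
    for ones in combinations(range(n), l1):
        chars = ["0"] * n
        for p in ones:
            chars[p] = "1"
        s = "".join(chars)
        if all(t in s for t in strings):
            return s
    return None
```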

Theorem 1. The Common Superstring with Multiplicities problem is NP-hard for $|\Sigma| = 2$.

Proof. We take an instance of the Shortest Common Superstring problem with $\Sigma = \{0, 1\}$, which is NP-hard [8], and transform it into an instance of the Common Superstring with Multiplicities problem with the same answer. Let the original instance of the SCS problem be $S'_1, S'_2, \ldots, S'_n$, $l'$. This instance asks whether there exists a superstring of $S'_1, S'_2, \ldots, S'_n$ of length at most $l'$.

Let us define $T_0 = 000111$ and $T_1 = 001011$. These strings are chosen so that each contains the same number of zeroes and ones and they do not overlap: no proper suffix of $T_0$ or $T_1$ is equal to a proper prefix of $T_0$ or $T_1$.
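Both properties of $T_0$ and $T_1$ can be confirmed mechanically. The sketch below, with hypothetical function names, compares every proper suffix of each block with every proper prefix of each block, including the case where both come from the same block.

```python
T_BLOCKS = {"0": "000111", "1": "001011"}


def is_balanced(block):
    """True if the block has the same number of zeroes and ones."""
    return block.count("0") == block.count("1")


def have_proper_overlap(blocks):
    """True if some proper suffix of a block equals a proper prefix of a block
    (the two blocks may be the same one)."""
    return any(
        a[-i:] == b[:i]
        for a in blocks
        for b in blocks
        for i in range(1, min(len(a), len(b)))
    )


print(all(is_balanced(t) for t in T_BLOCKS.values()))  # True
print(have_proper_overlap(list(T_BLOCKS.values())))    # False
```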

Next, let $S_i = T(S'_i)$ for $i = 1, 2, \ldots, n$ and $l_0 = l_1 = 3l'$. Here $T(S') = T_{S'[1]} T_{S'[2]} \cdots T_{S'[|S'|]}$, and $S'[j]$ denotes the $j$-th character of $S'$. In other words, $T$ replaces every character $c$ with the block $T_c$. The following lemmas establish several properties of these instances of the SCS and CSM problems. The equivalence of the two instances is shown in Lemmas 3 and 7.
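The transformation is easy to write down. In this sketch (hypothetical names), the SCS instance is a list of binary strings plus $l'$, and the CSM instance is returned as the encoded strings together with $l_0$ and $l_1$.

```python
T_BLOCKS = {"0": "000111", "1": "001011"}


def T(s):
    """Replace every character c of a binary string by the block T_c."""
    return "".join(T_BLOCKS[c] for c in s)


def scs_to_csm(strings, l_prime):
    """Map an SCS instance (S'_1..S'_n, l') to the CSM instance (S_1..S_n, l0, l1)."""
    return [T(s) for s in strings], 3 * l_prime, 3 * l_prime


print(T("01"))                     # 000111001011
print(scs_to_csm(["0", "10"], 2))  # (['000111', '001011000111'], 6, 6)
```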

Lemma 1. $L_0(T(S')) = L_1(T(S')) = 3|S'|$.

Proof. It follows directly from the definition of T.
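Spelled out, write $S'[j]$ for the $j$-th character of $S'$. Then $T(S')$ is the concatenation of the blocks $T_{S'[j]}$, and each block contains three zeroes and three ones, so

$$
L_0\bigl(T(S')\bigr) = \sum_{j=1}^{|S'|} L_0\bigl(T_{S'[j]}\bigr) = \sum_{j=1}^{|S'|} 3 = 3|S'|.
$$

The same computation with $L_1$, using $L_1(T_0) = L_1(T_1) = 3$, gives $L_1(T(S')) = 3|S'|$.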

Lemma 2. If $A'$ is a substring of $B'$, then $T(A')$ is a substring of $T(B')$.

Proof. It follows directly from the definition of T.

Lemma 3. If the answer for the original instance of the SCS problem is positive, then the answer for the instance of the CSM problem is also positive.

Proof. If the answer for the instance of the SCS problem is positive, then some string $S'$ of length $l'' \le l'$ is a superstring of $S'_1, S'_2, \ldots, S'_n$. Let $S = T(S' 0^{l'-l''})$, where $0^m$ denotes a run of $m$ zeroes. Because $|S' 0^{l'-l''}| = |S'| + |0^{l'-l''}| = l'' + (l' - l'') = l'$, we have $L_0(S) = L_1(S) = 3l'$ by Lemma 1. Moreover, by Lemma 2, every $S_i = T(S'_i)$ is a substring of $T(S')$ and hence of $S$. Thus the answer to the instance of CSM is indeed positive.
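The padding step of this proof can be sketched as follows, with $T_0$ and $T_1$ as defined above and a hypothetical function name. For instance, $S' = 010$ is a superstring of 01 and 10 of length $l'' = 3$. With $l' = 4$ it is padded to 0100 before encoding.

```python
T_BLOCKS = {"0": "000111", "1": "001011"}


def T(s):
    """Replace every character c of a binary string by the block T_c."""
    return "".join(T_BLOCKS[c] for c in s)


def csm_witness_from_scs(s_prime, l_prime):
    """Build S = T(S' 0^(l' - l'')) from an SCS witness S' with |S'| = l'' <= l'."""
    if len(s_prime) > l_prime:
        raise ValueError("the SCS witness must have length at most l'")
    return T(s_prime + "0" * (l_prime - len(s_prime)))


S = csm_witness_from_scs("010", 4)
print(len(S), S.count("0"), S.count("1"))  # 24 12 12
```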

Lemma 4. Let $T(A')$ and $T(B')$ be two strings such that some suffix of $T(A')$ is equal to a prefix of $T(B')$. Then the following holds:

- the length of that suffix is a multiple of 6,

- if the length of the suffix is $6k$, then the suffix of length $k$ of $A'$ is equal to the prefix of length $k$ of $B'$.

Proof. Suppose that the length of the suffix is $6k + i$, where $0 < i < 6$. Let $c_1$ be the last character of $A'$, and let $c_2$ be the character at the $(k + 1)$-th position of $B'$ (positions are numbered starting from one). Since $|T(A')|$ is a multiple of 6, the last $i$ characters of $T(A')$ form a suffix of its last block $T_{c_1}$. The characters at positions $6k + 1, \ldots, 6k + i$ of $T(B')$ form a prefix of its $(k + 1)$-th block $T_{c_2}$. Hence the suffix of $T_{c_1}$ of length $i$ would be equal to the prefix of $T_{c_2}$ of the same length.

As noted above, no proper suffix of $T_0$ or $T_1$ is equal to a proper prefix of $T_0$ or $T_1$. Therefore, the length of the suffix is a multiple of 6. The second statement follows because $T_0$ and $T_1$ both have length 6 and $T_0 \ne T_1$. The overlap consists of the last $k$ blocks of $T(A')$ and the first $k$ blocks of $T(B')$, and equal blocks encode equal characters.

Lemma 5. Let $T(A')$ and $T(B')$ be two strings such that $T(A')$ is a substring of $T(B')$.

Then the following statements hold:

- each occurrence of $T(A')$ in $T(B')$ starts at a position congruent to 1 modulo 6,

- if $T(A')$ occurs at position $6k + 1$ in $T(B')$, then $A'$ occurs as a substring of $B'$ at position $k + 1$.

Proof. The proof is analogous to that of Lemma 4.
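Lemmas 4 and 5 can also be tested exhaustively on short inputs. The sketch below uses hypothetical names and is no substitute for the proofs. It enumerates all nonempty binary strings $A'$ and $B'$ up to a given length and checks both statements. Python indices are 0-based, so the lemmas' "position $6k + 1$" corresponds to index $6k$.

```python
from itertools import product

T_BLOCKS = {"0": "000111", "1": "001011"}


def T(s):
    """Replace every character c of a binary string by the block T_c."""
    return "".join(T_BLOCKS[c] for c in s)


def check_lemmas_4_and_5(max_len):
    """Exhaustively test Lemmas 4 and 5 for all nonempty binary A', B'
    of length at most max_len (0-based indices: index 6k is position 6k + 1)."""
    words = ["".join(w) for n in range(1, max_len + 1) for w in product("01", repeat=n)]
    for a in words:
        ta = T(a)
        for b in words:
            tb = T(b)
            # Lemma 4: every suffix of T(A') that equals a prefix of T(B').
            for length in range(1, min(len(ta), len(tb)) + 1):
                if ta[-length:] == tb[:length]:
                    k = length // 6
                    if length % 6 != 0 or a[-k:] != b[:k]:
                        return False
            # Lemma 5: every occurrence of T(A') inside T(B').
            start = tb.find(ta)
            while start != -1:
                k = start // 6
                if start % 6 != 0 or b[k:k + len(a)] != a:
                    return False
                start = tb.find(ta, start + 1)
    return True


print(check_lemmas_4_and_5(4))  # True
```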

Lemma 6. Let $\{S'_1, S'_2, \ldots, S'_n\}$ be a set of strings, and let $S$ be a superstring of $T(S'_1), T(S'_2), \ldots, T(S'_n)$ satisfying two conditions. First, $T(S'_1), T(S'_2), \ldots, T(S'_n)$ occur in $S$ at positions $i_1, i_2, \ldots, i_n$ respectively (if some $T(S'_k)$ occurs in $S$ at multiple positions, only one position is recorded). Second, every character of $S$ is covered by at least one of those occurrences. Then the following statements hold:

- $i_1, i_2, \ldots, i_n$ are all congruent to 1 modulo 6,

- the length of $S$ is a multiple of 6,

- there exists a string $S'$ such that $S = T(S')$, and the strings $S'_1, S'_2, \ldots, S'_n$ occur in $S'$ at positions $i'_1, i'_2, \ldots, i'_n$, where $i'_k = (i_k - 1)/6 + 1$ for $k = 1, 2, \ldots, n$.

Proof. We begin with the first statement. Suppose the contrary, and let $i_k$ be the smallest of $i_1, i_2, \ldots, i_n$ that is not congruent to 1 modulo 6; in particular, $i_k > 1$. Suppose first that the $i_k$-th character of $S$ is covered by some $T(S'_{k'})$ with $i_{k'} < i_k$. By the choice of $i_k$, $i_{k'}$ is congruent to 1 modulo 6, so it is not congruent to $i_k$. However, either $T(S'_{k'})$ and $T(S'_k)$ overlap or $T(S'_k)$ is a substring of $T(S'_{k'})$. The first case violates Lemma 4 and the second violates Lemma 5, a contradiction.

Now suppose that the $i_k$-th character of $S$ is not covered by any $T(S'_{k'})$ with $i_{k'} < i_k$. Then the $(i_k - 1)$-th character of $S$ must be covered by the last character of some $T(S'_{k'})$ with $i_{k'} < i_k$. Since the length of $T(S'_{k'})$ is a multiple of 6, $i_k$ must be congruent to $i_{k'}$, and hence to 1, modulo 6, which is a contradiction.

For the second statement, note that the last character of $S$ is covered by some occurrence, which must end there. Thus it is the last character of some $T(S'_k)$. Since $i_k$ is congruent to 1 modulo 6 and the length of $T(S'_k)$ is a multiple of 6, the length of $S$, which equals $i_k + |T(S'_k)| - 1$, is also a multiple of 6.

For the last statement, it is enough to show that for $j = 1, 7, \ldots, |S| - 5$, the substring of $S$ of length 6 starting at position $j$ is either $T_0$ or $T_1$. The $j$-th character of $S$ is covered by an occurrence of $T(S'_k)$ for some $k$. Because the length of $T(S'_k)$ and the value of $i_k$ are constrained as shown above, that occurrence covers the whole substring of length 6. Moreover, this substring starts in $T(S'_k)$ at a position congruent to 1 modulo 6, so by the definition of $T$ it is either $T_0$ or $T_1$. Finally, each $T(S'_k)$ occurs in $S = T(S')$ at position $i_k$, so by the second statement of Lemma 5, $S'_k$ occurs in $S'$ at position $(i_k - 1)/6 + 1$.
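The block structure established in this proof also gives a simple way to recover $S'$ from $S$. This sketch, with hypothetical names, reads $S$ in consecutive blocks of length 6 and maps a 1-based position $6m + 1$ in $S$ to position $m + 1$ in $S'$, as in Lemma 5.

```python
T_BLOCKS = {"0": "000111", "1": "001011"}
DECODE = {block: c for c, block in T_BLOCKS.items()}


def decode(s):
    """Return S' with T(S') == s, or None if s is not a concatenation of
    the blocks T_0 and T_1."""
    if len(s) % 6 != 0:
        return None
    chars = []
    for j in range(0, len(s), 6):
        c = DECODE.get(s[j:j + 6])
        if c is None:
            return None
        chars.append(c)
    return "".join(chars)


def position_in_preimage(i):
    """Map a 1-based position i = 6m + 1 in S to position m + 1 in S'."""
    if i % 6 != 1:
        raise ValueError("position must be congruent to 1 modulo 6")
    return (i - 1) // 6 + 1


print(decode("000111001011"))   # 01
print(position_in_preimage(13))  # 3
```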

Lemma 7. If the answer for the instance of the CSM problem is positive, then the answer for the original instance of the SCS problem is also positive.

Proof. If the answer for the instance of the CSM problem is positive, then some string $S$ of length $L_0(S) + L_1(S) = 6l'$ is a superstring of $S_1, S_2, \ldots, S_n$. Let $S''$ be a shortest common superstring of these strings. Then $|S''| \le 6l'$. Moreover, each character of $S''$ is covered by an occurrence of one of $S_1, S_2, \ldots, S_n$, since otherwise deleting an uncovered character would give a shorter common superstring. Recall that $S_i = T(S'_i)$ for $i = 1, 2, \ldots, n$. By Lemma 6, there exists a string $S'$ such that $S'' = T(S')$ and $S'_1, S'_2, \ldots, S'_n$ are substrings of $S'$. Also, $|S'| = |S''|/6 \le l'$. Therefore, the answer for the original instance of the SCS problem is also positive.
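Together, Lemmas 3 and 7 say that the two instances always have the same answer. The sketch below spot-checks this by brute force on every pair of short binary strings and small $l'$. The names are hypothetical, and the check is exponential, so it only covers tiny cases.

```python
from itertools import combinations, combinations_with_replacement, product

T_BLOCKS = {"0": "000111", "1": "001011"}


def T(s):
    """Replace every character c of a binary string by the block T_c."""
    return "".join(T_BLOCKS[c] for c in s)


def scs_decision(strings, l_prime):
    """Is there a binary superstring of all strings with length at most l'?"""
    for n in range(l_prime + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            if all(t in s for t in strings):
                return True
    return False


def csm_decision(strings, l0, l1):
    """Is there a binary superstring with exactly l0 zeroes and l1 ones?"""
    n = l0 + l1
    for ones in combinations(range(n), l1):
        chars = ["0"] * n
        for p in ones:
            chars[p] = "1"
        s = "".join(chars)
        if all(t in s for t in strings):
            return True
    return False


def reduction_agrees(max_str_len, max_l):
    """Compare both answers on every pair of nonempty binary strings of length
    at most max_str_len and every l' from 1 to max_l."""
    words = ["".join(w) for n in range(1, max_str_len + 1) for w in product("01", repeat=n)]
    for pair in combinations_with_replacement(words, 2):
        for l_prime in range(1, max_l + 1):
            strings = list(pair)
            scs = scs_decision(strings, l_prime)
            csm = csm_decision([T(s) for s in strings], 3 * l_prime, 3 * l_prime)
            if scs != csm:
                return False
    return True


print(reduction_agrees(3, 2))  # True
```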

Theorem 2. The de Bruijn Superwalk with Multiplicities Problem is NP-hard for any fixed $|\Sigma| \ge 2$ and any positive integer $k$.

Proof. Consider the graph with one vertex and two loops (see Figure 1). An instance of the Common Superstring with Multiplicities problem with $\Sigma = \{0, 1\}$ can be translated into an instance of the Superwalk with Multiplicities problem on this graph as follows:

Figure 1. A graph on which the Superwalk with Multiplicities problem is NP-hard.

- each $S_k$ is translated directly into a walk by representing 0 as an occurrence of edge 0 and 1 as an occurrence of edge 1 in the walk,

- the multiplicity of edge 0 is set to $l_0$, and the multiplicity of edge 1 is set to $l_1$.

To complete the proof, we need to embed this graph into a de Bruijn graph with the given $k$.

This can be done in a straightforward manner (see Figure 2): edge 0 is mapped to a loop, and edge 1 is mapped to a cycle of length $k + 1$.
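Figure 2 is not reproduced here, so the sketch below assumes one concrete embedding. The loop is the edge $0^{k+1}$ at the vertex $0^k$. The cycle starts at $0^k$, appends a 1, and then appends $k$ zeroes to return to $0^k$, which uses $k + 1$ distinct edges. Under this assumption, a walk on the graph of Figure 1 that uses edge 0 $l_0$ times and edge 1 $l_1$ times becomes a de Bruijn walk that uses the loop $l_0$ times and each cycle edge $l_1$ times. The function names are hypothetical.

```python
from collections import Counter


def embed(word, k):
    """Spell, as a string, the de Bruijn walk for a binary word.

    Assumed embedding: start at vertex 0^k; edge 0 of Figure 1 becomes the
    loop 0^(k+1) (append '0'), and edge 1 becomes the cycle through
    0^(k-1)1, ..., 10^(k-1) back to 0^k (append '1' and then k zeroes)."""
    parts = ["0" * k]
    for c in word:
        parts.append("0" if c == "0" else "1" + "0" * k)
    return "".join(parts)


def edge_counts(spelled, k):
    """Count how often the walk uses each edge ((k+1)-character window)."""
    return Counter(spelled[i:i + k + 1] for i in range(len(spelled) - k))


print(embed("0110", 2))  # 0001001000
print(sorted(edge_counts(embed("0110", 2), 2).items()))
# [('000', 2), ('001', 2), ('010', 2), ('100', 2)]
```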

Figure 2. Embedding of the graph from Figure 1 into a de Bruijn graph.

Conclusion

We have proved that the de Bruijn Superwalk with Multiplicities Problem is NP-hard. Combined with [4], this result shows that all known models for genome assembly are NP-hard.

However, both the Shortest De Bruijn Superwalk problem and the de Bruijn Superwalk with Multiplicities problem have a special case, in which every subwalk consists of a single edge, that is solvable in polynomial time. A reasonable direction for future research is to determine whether these problems have other special cases that are solvable in polynomial time.

Authors' contributions

The work presented here was carried out in collaboration between all authors. All authors have contributed to, seen and approved the manuscript.

Acknowledgements

Research was supported by the Ministry of Education and Science of the Russian Federation in the framework of the federal program "Scientific and scientific-pedagogical personnel of innovative Russia in 2009-2013" (contract 16.740.11.0495, agreement 14.B37.21.0562).

Declarations

Publication of this article was supported by the Ministry of Education and Science of the Russian Federation in the framework of the federal program "Scientific and scientific-pedagogical personnel of innovative Russia in 2009-2013" and by the University ITMO.

This article has been published as part of BMC Bioinformatics Volume 14 Supplement 5, 2013: Proceedings of the Third Annual RECOMB Satellite Workshop on Massively Parallel Sequencing (RECOMB-seq 2013). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S5.

References

  1. Gallant J, Maier D, Storer J: On finding minimal length superstrings.

    J Comput Syst Sci 1980, 20(1):50-58.

  2. Pevzner P, Tang H, Waterman M: An Eulerian path approach to DNA fragment assembly.

    Proceedings of the National Academy of Sciences 2001, 98:9748-9753.

  3. Pevzner P: 1-Tuple DNA sequencing: computer analysis.

    J Biomol Struct Dyn 1989, 7(1):63-73.

  4. Medvedev P, Georgiou K, Myers G, Brudno M: Computability of Models for Sequence Assembly.

    In Algorithms in Bioinformatics, 7th International Workshop, WABI 2007. LNCS 4645:289-301.

  5. Edmonds J, Johnson E: Matching, Euler tours and the Chinese postman.

    Mathematical Programming 1973, 5:88-124.

  6. Medvedev P, Brudno M: Maximum Likelihood Genome Assembly.

    Journal of Computational Biology 2009, 16(8):1101-1116.

  7. Medvedev P, Brudno M: De Bruijn Superwalk with Multiplicities Problem.

    Talk at RECOMB Satellite Conference on Open Problems in Algorithmic Biology. St. Petersburg, Russia; 2012.

  8. Garey M, Johnson D: Computers and Intractability: A Guide to the Theory of NP-Completeness.

    W. H. Freeman; 1979.