Figure 1.

Algorithm overview. Overview of the algorithm steps with reads of length 7, a minimal coverage of 2, and k-mers of length k=3. a) Representation of the sub-starter generation step. A set of reads is mapped to the starter s. First, the reads are error-corrected by a voting procedure (see, for instance, the lower right read). Then, each sub-starter (s1 and s2) is computed from a perfect multiple read alignment. The Hamming distance between each sub-starter and s is required to be below a certain threshold. b) Representation of an extension. Three reads have a prefix of length at least k that maps perfectly to the suffix of an extension s. All fragments of these reads that extend beyond extension s are used to generate the extension of s. As the minimal coverage is 2, the last character of the first extending read (T) is not stored when generating the extension of s. The generated extension of s (ACT) is stored in a new node linked to extension s. Note that the suffix of length k−1 of extension s (TC) is stored as the prefix of the extension of s (which is then called an enriched extension). This avoids missing k-mers that overlap two extensions, such as TCA or CAC, when mapping reads to the extension of s.
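The extension step in panel b can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `extend` and `enrich`, the sequence `AGGTTC` and the three reads are hypothetical and were chosen only to reproduce the values named in the caption (extension ACT, stored prefix TC). The sketch also assumes that a majority base is taken at each position and that no branching occurs.

```python
from collections import Counter


def extend(s, reads, k=3, min_coverage=2):
    """Sketch: build the extension of s from reads whose prefix
    (length >= k) matches a suffix of s exactly."""
    overhangs = []
    for read in reads:
        # Try the longest possible overlap first, down to length k.
        for olen in range(min(len(read), len(s)), k - 1, -1):
            if s.endswith(read[:olen]):
                if olen < len(read):
                    # Keep only the part of the read that goes beyond s.
                    overhangs.append(read[olen:])
                break

    extension = []
    pos = 0
    while True:
        # Bases proposed at this position by the reads that reach it.
        column = Counter(o[pos] for o in overhangs if len(o) > pos)
        if not column:
            break
        base, count = column.most_common(1)[0]
        # Assumption: stop as soon as support drops below min_coverage.
        if count < min_coverage:
            break
        extension.append(base)
        pos += 1
    return "".join(extension)


def enrich(s, extension, k=3):
    """Prefix the extension with the last k-1 characters of s."""
    return s[-(k - 1):] + extension


# Hypothetical data: reads of length 7, k = 3, minimal coverage 2.
s = "AGGTTC"
reads = ["TTCACTT", "GTTCACT", "GGTTCAC"]
ext = extend(s, reads, k=3, min_coverage=2)
print(ext)                  # ACT
print(enrich(s, ext, k=3))  # TCACT
```

In this toy input the final T of the first read is supported by a single read only, so it is dropped, and the enriched extension TCACT contains the k-mers TCA and CAC that span the junction between the two extensions.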

Peterlongo and Chikhi BMC Bioinformatics 2012 13:48   doi:10.1186/1471-2105-13-48