GapFiller extension phase (an example with L = 5, Δ = 4, δ = 2, m = 2, T1 = 0.3, T2 = 0.5). (a) The putative overlapping reads, selected by their fingerprint values, are checked for mismatches and discarded if the mismatches exceed δ. For each remaining read (here r1, r2, r3, and r4), the number of mismatches (highlighted in red) with the suffix of Si does not exceed δ = 2. (b) The consensus string is computed for every position j such that either j ≤ F(C) or at least m = 2 reads are available. Characters circled in gray and red mark low-represented and non-represented positions, respectively. In the presence of ambiguities (i.e., positions at which more than one character occurs with the same representation rate), GapFiller chooses the character belonging to the first read encountered, from left to right. (c) Reads with mismatches at the low-represented positions are discarded (here r1 and r2), so they do not count toward the threshold m when a new consensus string is computed. In our example, the tail of read r4 is cut at the non-represented position, regardless of whether it matches the consensus string. (d) The reads still alive after Step 3 are used to compute the final consensus string Cnew. Since 2 ≥ m available reads extend past the tail of Si, Cnew is computed and attached to Si, yielding the extended contig Si+1.
Nadalin et al. BMC Bioinformatics 2012 13(Suppl 14):S8 doi:10.1186/1471-2105-13-S14-S8
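Steps (a) and (b) of the caption can be illustrated with a minimal Python sketch. It is not GapFiller's implementation: the function names, the toy sequences, and the simplifications are all assumptions made for illustration. In particular, every read is assumed to be already aligned so that its index 0 matches the first character of the contig suffix, the fingerprint-based selection is skipped, and only the "at least m reads" rule is modeled (the j ≤ F(C) condition and the thresholds T1, T2 are left out).

```python
from collections import Counter


def count_mismatches(suffix, read):
    """Count mismatching characters over the overlap of suffix and read."""
    return sum(a != b for a, b in zip(suffix, read))


def filter_reads(suffix, reads, delta):
    """Step (a): keep reads with at most `delta` mismatches against the suffix."""
    return [r for r in reads if count_mismatches(suffix, r) <= delta]


def consensus(reads, m):
    """Step (b), simplified: per-column majority vote.

    Stops at the first column covered by fewer than `m` reads.
    Ties are resolved in favor of the first read encountered.
    """
    out = []
    longest = max((len(r) for r in reads), default=0)
    for j in range(longest):
        column = [r[j] for r in reads if j < len(r)]
        if len(column) < m:
            break
        counts = Counter(column)
        best = max(counts.values())
        # The first character in read order that reaches the top count wins.
        out.append(next(c for c in column if counts[c] == best))
    return "".join(out)


# Hypothetical data, not taken from the figure.
suffix = "ACGTA"
reads = ["ACGTAGG", "ACGAAGC", "TTTTAGG", "ACGTAGC"]
kept = filter_reads(suffix, reads, delta=2)  # drops "TTTTAGG" (3 mismatches)
extended = consensus(kept, m=2)              # "ACGTAGC"
```

Resolving ties by scanning the column in read order mirrors the "first read encountered" rule stated in (b); a plain `Counter.most_common` call would also work, but the explicit scan makes the ordering visible.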