Figure 1.

Index file A and data file L for sequence s1: CAATTACGAGCTCTGCCTACAATGAT. The formats of A and L are discussed in the text. To demonstrate how different regions map to different genes, the first 13 bases map to the gene with PID = 1234 and the last 13 bases map to the gene with PID = 5678. We add leading zeroes to each location so that every stored number is four bytes wide, and this width is recorded as numbersize in each line. Keys in this example are made from two bases of sequence, so there are 4^2 = 16 lines in A, ranging from m(AA) = 5 through m(CC) = 20. The keys GT (m(GT) = 11) and GG (m(GG) = 15) do not occur in the sequence. For clarity, each offset in A is repeated in the correct position above the line in L, and each PID is underlined. Two arrows map two different lines from A into L by pointing to two bubbles that show the contents of two hash bins.
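The key counts quoted in the caption can be checked with a short Python sketch. It enumerates the two-base keys of s1 and collects the positions where each key starts. The function m below is not taken from the paper: it is one hypothetical mapping (A = 0, T = 1, G = 2, C = 3, first base as the low-order base-4 digit, first key on line 5) chosen only because it reproduces the four values given in the caption. The 0-based positions and the names key_positions, DIGIT and FIRST_LINE are likewise assumptions made for illustration.

```python
from itertools import product

SEQ = "CAATTACGAGCTCTGCCTACAATGAT"  # sequence s1 from the caption
K = 2  # key length in bases

# Hypothetical digit assignment, inferred only from the caption's four values.
DIGIT = {"A": 0, "T": 1, "G": 2, "C": 3}
FIRST_LINE = 5  # assumed line number of the first key, since m(AA) = 5


def m(key):
    """Assumed key-to-line mapping: first base is the low-order base-4 digit."""
    return FIRST_LINE + sum(DIGIT[base] * 4**i for i, base in enumerate(key))


def key_positions(seq, k=K):
    """Map each k-base key to the 0-based positions where it starts in seq."""
    bins = {}
    for i in range(len(seq) - k + 1):
        bins.setdefault(seq[i:i + k], []).append(i)
    return bins


all_keys = ["".join(p) for p in product("ATGC", repeat=K)]
bins = key_positions(SEQ)
absent = sorted(key for key in all_keys if key not in bins)

print(len(all_keys))                              # 16 possible keys
print(absent)                                     # ['GG', 'GT']
print([m(k) for k in ("AA", "GT", "GG", "CC")])   # [5, 11, 15, 20]
```

Each entry of bins plays the role of one hash bin in the figure: for example, bins["CT"] lists every start position of CT in s1. The PID attached to each location and the fixed-width number formatting are left out of this sketch.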

Reneker and Shyu BMC Bioinformatics 2005 6:111   doi:10.1186/1471-2105-6-111