Schematic diagram of each step of our heuristic method. (a) The process of enumerating the subsequences from the candidate-genes. The green sequence represents a candidate-gene, and the blue sequences represent the subsequences derived from this candidate-gene by the sliding scan. (b) Diagram of the relationships between the subsequences. The blue lines C1, C2,…, C8 represent the candidate-genes, and S1, S2,…, S11 are the subsequences enumerated from these candidate-genes. A solid line between two subsequences indicates that they are neighbors, and a dotted line indicates that they are far_neighbors. Each subsequence is marked as a candidate for a qualified sequence. (c) Diagram of the relationships between the marked subsequences after the far_neighbor examination. The subsequences S5, S9 and S10 are unmarked because they all contain a far_neighbor. However, S2 remains marked because one of its neighbors is located in C4, and its far_neighbor S5 is also located in C4. In this situation, we are not concerned about whether the siRNA designed from S2 will recognize S5, because C4 is already a target gene of this siRNA. Therefore, S2 is still marked as a candidate for the qualified sequence. (d) Diagram of the relationships between the marked subsequences after the powerful subsequence examination. S1, S4 and S6 are unmarked because they are not powerful subsequences and are all dominated by S2. (e) Diagram of the relationships between the marked subsequences after the excluded-gene hit examination. Ei is one of the excluded-genes, and the dot-dashed line indicates that the Hamming distance between a marked subsequence and a substring located in an excluded-gene is less than dN; that is, the marked subsequence contains an excluded-gene hit. Because any subsequence that contains an excluded-gene hit is unmarked, S11 is unmarked.
Chang et al. BMC Genomics 2012 13:491 doi:10.1186/1471-2164-13-491
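The sliding scan in panel (a) and the excluded-gene hit examination in panel (e) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the window length `length`, the step of one base, and the representation of genes as plain strings are all assumptions made for illustration. Only the hit criterion (Hamming distance less than dN to a substring of an excluded-gene) is taken from the caption.

```python
def enumerate_subsequences(gene: str, length: int) -> list[str]:
    """Sliding scan: every window of `length` bases, shifted one base at a time.

    The window length and the step of 1 are assumptions for this sketch.
    """
    return [gene[i:i + length] for i in range(len(gene) - length + 1)]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def has_excluded_gene_hit(subseq: str, excluded_genes: list[str], d_n: int) -> bool:
    """True if some same-length substring of an excluded-gene lies at
    Hamming distance strictly less than d_n from `subseq`."""
    k = len(subseq)
    return any(
        hamming(subseq, window) < d_n
        for gene in excluded_genes
        for window in enumerate_subsequences(gene, k)
    )


def unmark_excluded_hits(marked: list[str], excluded_genes: list[str], d_n: int) -> list[str]:
    """Keep only the marked subsequences that contain no excluded-gene hit."""
    return [s for s in marked if not has_excluded_gene_hit(s, excluded_genes, d_n)]


if __name__ == "__main__":
    # Toy sequences, far shorter than real siRNA targets.
    marked = enumerate_subsequences("ACGGG", 3)  # ACG, CGG, GGG
    print(unmark_excluded_hits(marked, ["TTACTTT"], d_n=2))
```

In this toy run, `ACG` is unmarked because it differs from the substring `ACT` of the excluded sequence at only one position, which is below the assumed threshold of 2. The brute-force scan over all windows is chosen for clarity, not speed.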