Classifying pairs of consecutive HSPs. In this hypothetical example, a TBLASTN search using a 100-amino-acid query protein produces two HSPs that are close together in the genome. We classify the relationship between two consecutive HSPs into one of four categories: (i) Frameshift. The two HSPs are in different frames, but the distance between them is similar in the query and in the subject. (ii) Region of low similarity. The two HSPs are in the same frame and are separated by a similar distance in the query and in the subject, with no stop codon between them. (iii) Intron. The query and subject coordinates of the two HSPs are dissimilar. This possibility is only considered if an existing gene from the same pillar and species group contains an intron. (iv) Duplication. If all other possibilities have been excluded, the two HSPs suggest a probable local gene duplication.
Proux-Wéra et al. BMC Bioinformatics 2012 13:237 doi:10.1186/1471-2105-13-237
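The four categories in the caption can be read as a short decision procedure. The following is a minimal sketch of that reading, not the authors' implementation. The names `HSP`, `classify_pair`, `stop_between`, `pillar_has_intron` and `tolerance_nt` are hypothetical, and the 30-nucleotide tolerance for calling two distances "similar" is an assumed placeholder, since the caption gives no threshold. The sketch also assumes both HSPs lie on the forward strand and that the caller has already checked for a stop codon between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    """One TBLASTN high-scoring segment pair (hypothetical container)."""
    q_start: int  # query start, in amino acids (1-based, inclusive)
    q_end: int    # query end, in amino acids
    s_start: int  # subject start, in nucleotides (1-based, inclusive)
    s_end: int    # subject end, in nucleotides
    frame: int    # reading frame of the subject hit: 1, 2 or 3


def classify_pair(a, b, *, stop_between=False, pillar_has_intron=False,
                  tolerance_nt=30):
    """Classify two consecutive HSPs (a upstream of b) into one category.

    tolerance_nt is an assumed cutoff, not a value from the source.
    """
    # Express both gaps in nucleotides so they can be compared directly.
    q_gap_nt = 3 * (b.q_start - a.q_end - 1)
    s_gap_nt = b.s_start - a.s_end - 1
    similar = abs(q_gap_nt - s_gap_nt) <= tolerance_nt

    # (i) Different frames, similar spacing in query and subject.
    if similar and a.frame != b.frame:
        return "frameshift"
    # (ii) Same frame, similar spacing, no intervening stop codon.
    if similar and a.frame == b.frame and not stop_between:
        return "low_similarity"
    # (iii) Dissimilar coordinates, only if an intron is known in the pillar.
    if not similar and pillar_has_intron:
        return "intron"
    # (iv) Everything else has been excluded.
    return "duplication"
```

For example, `classify_pair(HSP(1, 50, 1000, 1149, 1), HSP(51, 100, 1151, 1300, 2))` falls into the frameshift category under these assumptions, because the two gaps differ by a single nucleotide and the frames differ. In this sketch, a same-frame pair with similar spacing but an intervening stop codon falls through to the duplication category, which follows the caption's "all other possibilities excluded" wording literally.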