Single linkage clustering with alignment coverage constraints. The four proteins (A, B, C, D) contain some homologous domains (represented by colored boxes). To avoid the clustering in the same family of proteins that do not share any homology (e.g. A and D), pairwise sequence alignments are considered for the clustering only if they cover a minimum threshold of the length of each of the two proteins. This threshold has to be high enough to exclude cases like the alignment (B, C), which would lead to the clustering of A and D.
Miele et al. BMC Bioinformatics 2011 12:116 doi:10.1186/1471-2105-12-116