Comparison of the GeMMA and FunFam family identification protocols. (a) Part of the GeMMA clustering dendrogram of a sequence superfamily. Colours indicate the protein functions associated with the clusters; grey indicates a lack of annotations. (b) The corresponding part of the dendrogram obtained with the FunFam protocol. Note that unannotated starting clusters (grey) are removed here prior to clustering. The COMPASS E-values at the bottom of both subfigures reflect the maximum sequence profile similarity observed between any two clusters at a given point; this similarity decreases in the course of clustering, that is, the E-value of the best-matching cluster pair increases. The number of clusters (within the part of the dendrogram shown) that still exist when clustering is stopped at a given granularity level is stated at the top. Arrows indicate which clusters are eventually selected to represent functional families.
Rentzsch and Orengo BMC Bioinformatics 2013 14(Suppl 3):S5 doi:10.1186/1471-2105-14-S3-S5
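
The relationship between the stopping E-value and the number of remaining clusters described in the caption can be illustrated with a small toy sketch. This is not the GeMMA or FunFam implementation: the starting clusters, their annotation flags, and the pairwise E-values in `START` and `EVALUE` are invented for illustration, and the single-linkage rule in `linkage` is an assumed stand-in for re-comparing cluster profiles with COMPASS after each merge. The hypothetical `drop_unannotated` flag mimics only the removal of unannotated starting clusters mentioned for panel (b).

```python
from itertools import combinations

# Toy starting clusters: name -> has functional annotation (invented data).
START = {"A": True, "B": True, "C": True, "D": False}

# Toy pairwise E-values between starting clusters (invented data).
# Smaller E-value means higher profile similarity.
EVALUE = {
    frozenset(("A", "B")): 1e-40,
    frozenset(("A", "C")): 1e-20,
    frozenset(("B", "C")): 1e-25,
    frozenset(("A", "D")): 1e-30,
    frozenset(("B", "D")): 1e-35,
    frozenset(("C", "D")): 1e-10,
}


def cluster(start, evalue, threshold, drop_unannotated=False):
    """Greedily merge the most similar cluster pair until the best
    remaining pair has an E-value above `threshold`."""
    clusters = [
        frozenset([name])
        for name, annotated in start.items()
        if annotated or not drop_unannotated
    ]

    def linkage(a, b):
        # Assumption: single linkage, i.e. the smallest E-value between
        # any member of `a` and any member of `b`.
        return min(evalue[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(a, b) > threshold:
            break  # the best pair is no longer similar enough: stop here
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


if __name__ == "__main__":
    for threshold in (1e-38, 1e-30, 1e-20):
        all_clusters = cluster(START, EVALUE, threshold)
        annotated_only = cluster(START, EVALUE, threshold, drop_unannotated=True)
        print(threshold, len(all_clusters), len(annotated_only))
```

A more permissive (larger) stopping E-value lets merging continue further, so fewer clusters remain, which corresponds to reading the cluster counts at the top of the dendrogram from left to right.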