Sequence and cluster term sets involved in assessing cluster functional coherence. The sequences in a given domain sequence cluster are associated with different GO term sets via their parent proteins. Each sequence term set can be split into molecular function (MF), biological process (BP) and cellular component (CC) term subsets. If at least one sequence MF term set is non-empty, a set of core MF terms for the cluster is compiled from all sequence MF term sets and, via further intermediate steps, a filter term set is derived that contains the terms to be ignored in cluster assessment. If none of the sequence term sets contains MF terms, the filter term set is empty. Next, the initial cluster term set is prepared as the union of all representative sequence term sets. Any terms in this set that also occur in the filter term set are removed (filtered), yielding the final cluster term set. Like the sequence term sets, the cluster term set can be split into MF, BP and CC subsets. To assess the functional coherence of the sequences in the cluster, the MF, BP and CC subsets of each representative sequence are compared with the corresponding subsets of the cluster as a whole. Key term sets in the described process are highlighted in bold.
Rentzsch and Orengo BMC Bioinformatics 2013 14(Suppl 3):S5 doi:10.1186/1471-2105-14-S3-S5
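The set operations described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The core MF set is simply pooled as a union, which is an assumption. The intermediate steps that turn core MF terms into the filter term set are not given in the caption, so they are passed in as a hypothetical callable `derive_filter_terms`. The Jaccard index used for the final comparison is likewise only a stand-in for whatever coherence measure is actually applied. All names and the toy term IDs are illustrative.

```python
from typing import Callable, Dict, List, Set

ASPECTS = ("MF", "BP", "CC")


def split_by_aspect(terms: Set[str], aspect_of: Dict[str, str]) -> Dict[str, Set[str]]:
    """Split a term set into its MF, BP and CC subsets."""
    subsets: Dict[str, Set[str]] = {a: set() for a in ASPECTS}
    for term in terms:
        subsets[aspect_of[term]].add(term)
    return subsets


def final_cluster_term_set(
    seq_term_sets: List[Set[str]],
    aspect_of: Dict[str, str],
    derive_filter_terms: Callable[[Set[str]], Set[str]],
) -> Set[str]:
    """Build the filtered cluster term set from the sequence term sets."""
    # Core MF terms: pooled from all sequence MF subsets (assumed to be a union).
    core_mf: Set[str] = set()
    for terms in seq_term_sets:
        core_mf |= split_by_aspect(terms, aspect_of)["MF"]
    # The filter term set stays empty when no sequence carries MF terms.
    # derive_filter_terms is a hypothetical placeholder for the
    # intermediate steps that the caption does not spell out.
    filter_terms = set(derive_filter_terms(core_mf)) if core_mf else set()
    # Initial cluster term set: union of all sequence term sets.
    initial: Set[str] = set().union(*seq_term_sets)
    # Final cluster term set: initial set minus the filter terms.
    return initial - filter_terms


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity measure (an assumption, not from the source)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data: term IDs and the filter rule below are invented for illustration.
aspect_of = {"mf1": "MF", "mf2": "MF", "bp1": "BP", "bp2": "BP", "cc1": "CC"}
seqs = [{"mf1", "bp1"}, {"mf1", "mf2", "cc1"}, {"bp1", "bp2"}]
ignored_given = {"mf2": {"bp2"}}  # hypothetical: core term -> terms to ignore


def toy_filter(core_mf: Set[str]) -> Set[str]:
    return set().union(*(ignored_given.get(t, set()) for t in core_mf))


cluster = final_cluster_term_set(seqs, aspect_of, toy_filter)
cluster_subsets = split_by_aspect(cluster, aspect_of)
# Compare each sequence's per-aspect subsets with those of the cluster.
scores = [
    {a: jaccard(split_by_aspect(s, aspect_of)[a], cluster_subsets[a]) for a in ASPECTS}
    for s in seqs
]
```

Passing the filter derivation in as a function keeps the sketch honest about what the caption leaves unspecified: only the union, the set difference and the per-aspect split are taken directly from the description.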