Protein families versus protein domain families in protein function prediction. (a) Six evolutionarily related families of proteins with assigned domains. The proteins are coloured by function; mixed colouring indicates multi-functionality. The domain dashing patterns indicate different superfamilies. A protein family resource would build one model per protein family (middle; coloured squares) and scan target proteins with these models to assign them to families. (b) The domains from the proteins in (a), grouped by domain superfamily and coloured by the function of their respective parent protein. Each superfamily is subdivided into functional families (dashed lines) using the protocol described in the main text. Note that domains from functionally very similar proteins (red, orange, yellow) can be assigned to the same functional family. The domain-based protein function prediction protocol first identifies the domains in the target protein (bottom) and then scans each domain sequence with the functional family models available for its domain superfamily (middle). Each functional family is associated probabilistically with different whole-protein functions. Based on the family assignments of the individual domains, a combined function prediction is made for the whole protein. The best-scoring protein and domain family models are highlighted with a bold border in (a) and (b), respectively.
Rentzsch and Orengo BMC Bioinformatics 2013 14(Suppl 3):S5 doi:10.1186/1471-2105-14-S3-S5
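The domain-based protocol in (b) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: domain identification and HMM scanning are not reproduced, and each domain is instead given as a precomputed map of functional-family scores (higher is better) within its superfamily. The names `FunFam`, `best_funfam` and `predict_protein_functions` are hypothetical, and the caption does not say how the per-domain predictions are combined, so a noisy-OR rule (a function is absent only if every domain's best family fails to confer it) is used here purely as an assumed combination scheme.

```python
from dataclasses import dataclass


@dataclass
class FunFam:
    """Hypothetical functional-family model: a name plus P(function | family)."""
    name: str
    function_probs: dict


def best_funfam(scores):
    """Return the name of the highest-scoring functional family for one domain."""
    return max(scores, key=scores.get)


def predict_protein_functions(domain_scores, funfams):
    """Combine per-domain family assignments into whole-protein function probabilities.

    domain_scores: one dict per domain found in the target protein, mapping
        FunFam name (within that domain's superfamily) -> model score.
    funfams: dict mapping FunFam name -> FunFam.
    Assumption: noisy-OR combination, P(f) = 1 - prod_d (1 - P_d(f)).
    """
    prob_absent = {}
    for scores in domain_scores:
        fam = funfams[best_funfam(scores)]  # best-scoring model for this domain
        for func, p in fam.function_probs.items():
            # Probability that no domain seen so far confers this function.
            prob_absent[func] = prob_absent.get(func, 1.0) * (1.0 - p)
    return {func: 1.0 - q for func, q in prob_absent.items()}


# Toy example with invented values: two domains, each scored against the
# functional families of its own superfamily.
funfams = {
    "FF1": FunFam("FF1", {"kinase": 0.8, "binding": 0.3}),
    "FF2": FunFam("FF2", {"kinase": 0.1}),
    "FF3": FunFam("FF3", {"binding": 0.5}),
}
domain_scores = [{"FF1": 42.0, "FF2": 10.0}, {"FF3": 25.0}]
print(predict_protein_functions(domain_scores, funfams))
```

Here the first domain is assigned to FF1 and the second to FF3, giving roughly 0.8 for "kinase" and 0.65 for "binding". Noisy-OR is only one plausible choice; a real pipeline might weight domains by score or use a different combination rule.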