Figure 4.

Protein similarity network clustering indicates possible family membership for uncharacterized proteins. (A) The distribution of edge weights (binned -log(BLAST E-values)) for the VOC superfamily, with the cutoff value of 5.5 indicated by a red vertical line. The cutoff was determined by the heuristic described in [53] and was used for all subsequent clustering. (B) MCL clusters for the VOC superfamily, with nodes colored by family assignment. Red nodes represent proteins of unknown function. (See Additional File 6 for the TransClust clusters.) (C) Four clusters from the MCL results contain only proteins from a single family together with proteins of unknown function. (Three of these four clusters also appear in the TransClust results.) Based on this analysis, we hypothesize that each protein of unknown function shares the function of the characterized proteins in its cluster. The protein highlighted in blue is BH2212, which was randomly selected for further analysis.
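
The thresholding and cluster-based inference summarized in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the logarithm is assumed to be base 10, plain connected components stand in for MCL and TransClust, and every protein name, family label, and E-value below is hypothetical toy data.

```python
import math
from collections import defaultdict

CUTOFF = 5.5  # cutoff from the caption; base-10 log is an assumption


def edge_weight(evalue):
    """Convert a BLAST E-value to a -log10 edge weight."""
    if evalue <= 0:
        return math.inf  # E-values reported as 0 are treated as maximal similarity
    return -math.log10(evalue)


def filter_edges(edges, cutoff=CUTOFF):
    """Keep only edges whose weight is at or above the cutoff."""
    return [(a, b) for a, b, evalue in edges if edge_weight(evalue) >= cutoff]


def connected_components(nodes, edges):
    """Group nodes into components (a simple stand-in for MCL, not MCL itself)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(component)
    return clusters


def propose_families(clusters, family_of):
    """Propose a family for unknowns in clusters that contain exactly one known family."""
    proposals = {}
    for cluster in clusters:
        families = {family_of[p] for p in cluster if family_of.get(p)}
        unknowns = [p for p in cluster if not family_of.get(p)]
        if len(families) == 1 and unknowns:
            family = next(iter(families))
            for protein in unknowns:
                proposals[protein] = family
    return proposals


# Hypothetical toy data: None marks a protein of unknown function.
family_of = {"A1": "FamA", "A2": "FamA", "B1": "FamB", "B2": "FamB",
             "U1": None, "U2": None}
edges = [("A1", "A2", 1e-20), ("A2", "U1", 1e-8), ("B1", "B2", 1e-30),
         ("B2", "U2", 1e-3), ("A1", "B1", 1e-2)]

kept = filter_edges(edges)
clusters = connected_components(list(family_of), kept)
proposals = propose_families(clusters, family_of)
print(proposals)
```

In this toy network the two weakest edges fall below the cutoff, so U1 remains attached only to FamA proteins and receives a FamA proposal, while U2 becomes a singleton and receives none. Such a proposal is a hypothesis for follow-up analysis, as in the caption, not an annotation.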

Morris et al. BMC Bioinformatics 2011 12:436   doi:10.1186/1471-2105-12-436
