Figure S1. Number of positives at different thresholds. The number of positive genes obtained at each threshold t, shown for all species. A threshold of t = 0.3 means that members of a gene cluster differ from one another by no more than roughly 30%; the 'center' gene (medoid) of each cluster is used as a positive. If a species has more than 400 sequences, a sample of 400 sequences is taken as positives. A small threshold (close to 0) gives tighter clusters, each containing fewer genes.
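The selection procedure in the caption can be illustrated with a minimal sketch. It rests on several assumptions that the caption does not state. Pairwise distances in [0, 1] are taken as precomputed (for example, 1 minus sequence identity). Clusters are formed greedily, so every pair of members is within distance t. The 400 cap is applied as a uniform random sample over the medoids. The function names are hypothetical, and the greedy clustering is a stand-in, not necessarily the method used to produce the figure.

```python
import random
from typing import List, Sequence


def threshold_clusters(dist: Sequence[Sequence[float]], t: float) -> List[List[int]]:
    """Hypothetical greedy clustering: gene i joins the first existing cluster
    whose members are ALL within distance t of it, otherwise it starts a new
    cluster. Every pair inside a cluster therefore differs by at most t.
    The result depends on input order (an assumption of this sketch)."""
    clusters: List[List[int]] = []
    for i in range(len(dist)):
        for cluster in clusters:
            if all(dist[i][j] <= t for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def medoid(cluster: List[int], dist: Sequence[Sequence[float]]) -> int:
    """The 'center' gene: the member with the smallest total distance to the
    other members (ties go to the earliest index)."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


def select_positives(dist: Sequence[Sequence[float]], t: float,
                     cap: int = 400, seed: int = 0) -> List[int]:
    """Return the medoid indices used as positives, down-sampled to `cap`
    (assumed uniform random sampling with a fixed seed)."""
    medoids = [medoid(c, dist) for c in threshold_clusters(dist, t)]
    if len(medoids) > cap:
        medoids = random.Random(seed).sample(medoids, cap)
    return sorted(medoids)


# Toy example: genes 0/1 and 2/3 form two close pairs.
dist = [
    [0.0, 0.1, 0.8, 0.8],
    [0.1, 0.0, 0.8, 0.8],
    [0.8, 0.8, 0.0, 0.2],
    [0.8, 0.8, 0.2, 0.0],
]
print(select_positives(dist, 0.3))  # [0, 2]
print(select_positives(dist, 0.0))  # [0, 1, 2, 3]
```

In this toy setting, lowering t from 0.3 to 0 splits each pair into singleton clusters. Each cluster becomes tighter and contains fewer genes.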

