Hierarchical clustering of secreted salivary protein sequences from Anopheles. Three clustering steps were performed sequentially at different similarity thresholds (≥ 90%, ≥ 70% and ≥ 40% identity), producing a hierarchical structure. The distribution of proteins from the Anopheles species among clusters of more than 2 protein sequences is represented proportionally by stacked bars, and the distribution of non-redundant (NR) protein sequences (i.e., sequences that were not clustered with other sequences above a specified similarity threshold) by pie charts. The cluster numbers indicated on the left side of the stacked bars correspond to the protein clusters listed in Additional file 2. A total of 71, 5, 44, 30, 5 and 117 secreted salivary protein sequences were retrieved from the NCBInr online database for An. gambiae, An. arabiensis, An. stephensi, An. funestus, An. albimanus and An. darlingi, respectively. The correspondence between the number of proteins in a cluster and the length of the stacked bars is indicated, as is the correspondence between the colours and the Anopheles species.
Fontaine et al. BMC Genomics 2012 13:614 doi:10.1186/1471-2164-13-614
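The sequential clustering described in the caption can be illustrated with a minimal sketch. The caption does not name the clustering tool or the identity measure, so everything here is an assumption made for illustration: the position-wise `identity` function (no alignment), the greedy longest-first scheme, and the names `greedy_cluster` and `hierarchical_cluster` are hypothetical and are not the authors' method. The sketch also assumes the thresholds are applied in the order listed (90%, then 70%, then 40%), with the representatives of each step passed to the next.

```python
def identity(a, b):
    """Toy identity (assumption): matching positions over the longer length, no alignment."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs, threshold):
    """Greedy clustering: each sequence joins the first representative it matches
    at or above the threshold, otherwise it becomes a new representative."""
    clusters = {}  # representative -> list of members (including itself)
    for s in sorted(seqs, key=len, reverse=True):  # longest sequences first
        for rep in clusters:
            if identity(rep, s) >= threshold:
                clusters[rep].append(s)
                break
        else:
            clusters[s] = [s]
    return clusters


def hierarchical_cluster(seqs, thresholds=(0.9, 0.7, 0.4)):
    """Apply the clustering steps in sequence; representatives feed the next step."""
    levels = []
    current = list(seqs)
    for t in thresholds:
        clusters = greedy_cluster(current, t)
        levels.append(clusters)
        current = list(clusters)  # keys are the representatives
    return levels


# Made-up toy sequences, not real salivary proteins.
toy = ["AAAAAAAAAA", "AAAAAAAAAC", "AAAAAAACCC", "AAAACCCCCC", "GGGGGGGGGG"]
levels = hierarchical_cluster(toy)

# Sequences left alone in their cluster at the last step play the role of NR sequences.
non_redundant = [rep for rep, members in levels[-1].items() if len(members) == 1]
```

In this sketch a sequence that ends up as the only member of its cluster corresponds to a non-redundant (NR) sequence at that threshold. A real analysis would replace `identity` with an alignment-based measure.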