Var diversity within a local population is captured by homology blocks. (A) Frequency of each homology block (HB) in the dataset of genomic var tags. (B-C) Pairwise similarity among sequence types, where types are defined by HB composition. The similarity of two sequences is the number of HBs they share divided by the mean number of HBs per sequence for those two sequences. (B) Frequency distribution of pairwise HB similarities between sequences in the genomic dataset. This approximately normal distribution contrasts with the bimodal distribution observed in other data when pairwise similarity is defined by amino acid identity. (C) Sequences hierarchically ordered by pairwise HB similarity, using the average-linkage method as implemented in SciPy. The separation between sequence tags containing two cysteines (cys2) and those containing four (cys4) is very clear, reflecting that recombination occurs at a faster rate within each group than between the two groups.
Rorick et al. BMC Microbiology 2013 13:244 doi:10.1186/1471-2180-13-244
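The HB similarity defined for panels B-C and the average-linkage ordering in panel C can be sketched in Python with NumPy and SciPy. This is a minimal illustration, not the authors' pipeline. It assumes each tag is represented as a set of HB identifiers, so each HB is counted at most once per tag. It also assumes similarity is converted to a distance as 1 - similarity before clustering. The function names (`hb_similarity`, `similarity_matrix`, `cluster_order`) and the example HB sets are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def hb_similarity(a, b):
    """Shared HBs divided by the mean number of HBs in the two sequences.

    Assumption: each sequence is a collection of HB identifiers and each
    HB is counted at most once per sequence (hence the set conversion).
    """
    sa, sb = set(a), set(b)
    shared = len(sa & sb)
    mean_size = (len(sa) + len(sb)) / 2
    return shared / mean_size  # always in [0, 1], since shared <= min size


def similarity_matrix(seqs):
    """Symmetric matrix of pairwise HB similarities (diagonal = 1)."""
    n = len(seqs)
    sim = np.eye(n)
    for i, j in combinations(range(n), 2):
        sim[i, j] = sim[j, i] = hb_similarity(seqs[i], seqs[j])
    return sim


def cluster_order(seqs):
    """Leaf order from average-linkage clustering on 1 - similarity.

    The 1 - similarity distance is an assumption made for this sketch.
    """
    dist = 1.0 - similarity_matrix(seqs)
    condensed = squareform(dist)  # square matrix -> condensed vector
    Z = linkage(condensed, method="average")
    return [int(i) for i in leaves_list(Z)]


# Hypothetical tags: two pairs that share most of their HBs.
seqs = [
    {"HB1", "HB2", "HB5", "HB14"},
    {"HB1", "HB2", "HB5", "HB36"},
    {"HB3", "HB7", "HB9", "HB14"},
    {"HB3", "HB7", "HB9", "HB25"},
]
print(cluster_order(seqs))
```

Here tags 0 and 1 share three of their four HBs (similarity 0.75), as do tags 2 and 3, so each pair forms its own cluster before the two clusters merge. The values in the upper triangle of `similarity_matrix(seqs)` are the quantities whose frequency distribution panel B plots.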