Comparing sequence and semantic similarity. A BLAST sequence analysis was performed to calculate a sequence similarity score for each gene pair in the 100k set for which sequence data were available. Of those gene pairs, we considered only the 53,264 that obtained a score greater than zero. Pairs were binned by bit score, and the average of each bin was plotted against (A) TO scores and (B) Resnik-max scores. Thick horizontal lines indicate medians, boxes indicate interquartile ranges, and whiskers extend 1.5 times the interquartile range beyond the quartiles or to the most extreme value (whichever is closer to the median).
Mistry and Pavlidis BMC Bioinformatics 2008 9:327 doi:10.1186/1471-2105-9-327
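The binning and box-plot summary described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The function names `box_stats` and `bin_by_bit_score` and the fixed `bin_width` are hypothetical, because the caption does not give the bin boundaries. The whisker rule follows the caption's literal wording: 1.5 × IQR beyond each quartile, clipped to the data extreme. Note that R's `boxplot` and matplotlib instead end each whisker at the last data point inside that fence.

```python
import statistics
from collections import defaultdict


def box_stats(values):
    """Quartiles and whiskers for one bin, following the caption's whisker rule (assumed reading)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Whisker = 1.5 * IQR beyond the quartile, or the data extreme,
    # whichever is closer to the median.
    lo = max(q1 - 1.5 * iqr, min(values))
    hi = min(q3 + 1.5 * iqr, max(values))
    return {"q1": q1, "median": med, "q3": q3, "whiskers": (lo, hi)}


def bin_by_bit_score(pairs, bin_width=50.0):
    """Group (bit_score, similarity) pairs into fixed-width bit-score bins.

    bin_width is a hypothetical choice; the caption does not state the bins used.
    Pairs with a bit score <= 0 are dropped, mirroring the caption's filter.
    Returns (mean bit score of bin, box statistics of similarity scores) per bin.
    """
    bins = defaultdict(list)
    for bit_score, sim in pairs:
        if bit_score > 0:
            bins[int(bit_score // bin_width)].append((bit_score, sim))

    summary = []
    for key in sorted(bins):
        members = bins[key]
        mean_bits = statistics.fmean(b for b, _ in members)
        sims = [s for _, s in members]
        # Quartiles need at least two points on older Python versions.
        stats = box_stats(sims) if len(sims) >= 2 else None
        summary.append((mean_bits, stats))
    return summary
```

Each tuple in the result pairs a bin's average bit score (the x position) with the quartiles and whiskers of a semantic similarity measure such as TO or Resnik-max (the box).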