A comparative study of conservation and variation scores
1 Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
2 CBRC, AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
BMC Bioinformatics 2010, 11:388 doi:10.1186/1471-2105-11-388Published: 21 July 2010
Conservation and variation scores are used when evaluating sites in a multiple sequence alignment, in order to identify residues critical for structure or function. A variety of scores are available today but it is not clear how different scores relate to each other.
We applied 25 conservation and variation scores to alignments from the Catalytic Site Atlas (CSA). We calculated distances among scores based on correlation coefficients, and constructed a dendrogram of the scores by average linking cluster analysis. The cluster analysis showed that most scores fall into one of two groups--substitution matrix based group and frequency based group respectively. We also evaluated the scores' performance in predicting catalytic sites and found that frequency based scores generally perform best.
Conservation and variation scores can be classified into mainly two large groups. When using a score to predict catalytic sites, frequency based scores that also consider a background distribution are most successful.