This article is part of the supplement: Eleventh International Conference on Bioinformatics (InCoB2012): Computational Biology
The parasite specific substitution matrices improve the annotation of apicomplexan proteins
Laboratory of Computational and Functional Genomics, Centre for DNA Fingerprinting and Diagnostics (CDFD), A Sun Centre of Excellence in Medical Bioinformatics, Tuljaiguda, Nampally, Hyderabad 500001, India
BMC Genomics 2012, 13(Suppl 7):S19 doi:10.1186/1471-2164-13-S7-S19
Published: 13 December 2012
A number of apicomplexan genomes have been sequenced in recent years, which should help in understanding the biology of apicomplexan parasites. The phylum Apicomplexa includes important protozoan parasites (Plasmodium, Toxoplasma, Cryptosporidium, etc.) that cause deadly diseases in humans and animals. In our earlier studies, we showed that the standard BLOSUM matrices are not suitable for compositionally biased apicomplexan proteins. We therefore developed a novel series of substitution matrices (SMAT and PfFSmat60) that performed better than the standard BLOSUM matrices, and developed ApicoAlign, a sequence search and alignment tool for apicomplexan proteins. In this study, we demonstrate the higher specificity of these matrices and attempt to improve the annotation of apicomplexan kinases and proteases.
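To illustrate why amino acid composition matters for a substitution matrix, the following is a minimal sketch of the log-odds scoring used by BLOSUM-type matrices, where each score compares the observed frequency of an aligned residue pair with the frequency expected from the background composition. The two-letter alphabet, the pair frequencies and the function names are hypothetical and chosen only for illustration; this is not the procedure or data used to build SMAT or PfFSmat60.

```python
import math

def background_from_pairs(pair_freq):
    """Derive residue background frequencies from unordered pair frequencies."""
    background = {}
    for (a, b), q_ab in pair_freq.items():
        if a == b:
            background[a] = background.get(a, 0.0) + q_ab
        else:
            # Each residue of a mismatched pair contributes half of the pair frequency.
            background[a] = background.get(a, 0.0) + q_ab / 2
            background[b] = background.get(b, 0.0) + q_ab / 2
    return background

def log_odds_matrix(pair_freq, background, scale=2.0):
    """Return rounded scores s(a, b) = scale * log2(observed / expected).

    scale=2.0 gives half-bit units, the convention used for BLOSUM62.
    """
    scores = {}
    for (a, b), q_ab in pair_freq.items():
        # Expected frequency of the unordered pair if residues paired at random.
        e_ab = background[a] * background[b] * (1 if a == b else 2)
        scores[(a, b)] = round(scale * math.log2(q_ab / e_ab))
    return scores

# Hypothetical observed pair frequencies for a toy alphabet {K, N}.
toy_pairs = {("K", "K"): 0.3, ("K", "N"): 0.2, ("N", "N"): 0.5}
toy_background = background_from_pairs(toy_pairs)
toy_scores = log_odds_matrix(toy_pairs, toy_background)
print(toy_scores)
```

Because the expected term depends on the background frequencies, a matrix derived from proteins of average composition can assign misleading scores to proteins with a strongly skewed composition, which is the motivation for deriving matrices from the parasite proteins themselves.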
The ROC curves showed that SMAT80 performs best for apicomplexan proteins in terms of detecting true positives, followed by compositionally adjusted BLOSUM62 (PSI-BLAST searches), BLOSUM90 and BLOSUM62. The poor E-values and/or bit scores that the SMAT80 matrix gave for the experimentally identified coccidia-specific oocyst wall proteins when searched against hematozoan (non-coccidian) parasites further supported its higher specificity. Against the SwissProt database, SMAT80 uniquely detected orthologs (missed by BLOSUM) for 1374 apicomplexan hypothetical proteins and predicted 70 kinases and 17 proteases. Further analysis confirmed the conservation of functional residues of the kinase domain in one of the SMAT80-detected kinases. Similarly, one of the SMAT80-detected proteases was predicted to be a rhomboid protease.
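As a rough illustration of how a ROC comparison of this kind can be computed, the sketch below ranks search hits by score, walks down the ranking to collect (false positive rate, true positive rate) points, and integrates the curve with the trapezoidal rule. The hit scores and true/false labels are invented for the example and are not results from this study; the function names are likewise hypothetical.

```python
def roc_points(hits):
    """Return ROC points for hits given as (score, is_true_positive) pairs.

    Hits are ranked by descending score. Tied scores are not treated
    specially in this sketch.
    """
    ranked = sorted(hits, key=lambda hit: hit[0], reverse=True)
    n_pos = sum(1 for _, is_true in ranked if is_true)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true and one false hit")

    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, is_true in ranked:
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical bit scores for five hits, labelled true (True) or false (False).
example_hits = [(95, True), (80, True), (70, False), (60, True), (40, False)]
example_points = roc_points(example_hits)
example_auc = auc(example_points)
print(example_points)
print(round(example_auc, 3))
```

Running the same calculation on the hit lists produced with each matrix, against a common set of known true positives, yields curves that can be compared directly; a matrix that ranks more true positives ahead of false positives gives a larger area.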
The parasite-specific substitution matrices have higher specificity for apicomplexan proteins and help detect orthologs missed by BLOSUM matrices, thereby improving the annotation of apicomplexan proteins that are hypothetical or of unknown function.