Research article (Open Access)

Prediction and classification of aminoacyl tRNA synthetases using PROSITE domains

Bharat Panwar and Gajendra PS Raghava*

Author Affiliations

Bioinformatics Centre, Institute of Microbial Technology, Chandigarh, India

BMC Genomics 2010, 11:507  doi:10.1186/1471-2164-11-507

Published: 22 September 2010

Abstract

Background

Aminoacyl tRNA synthetases (aaRSs) catalyse the first step of protein synthesis in all organisms. They are responsible for the precise attachment of amino acids to their cognate transfer RNAs. There are twenty different types of aaRSs, one for each amino acid. These aaRSs have been divided into two classes, each comprising ten enzymes. It is important to predict and classify aaRSs in order to understand protein synthesis.

Results

In this study, all models were developed on a non-redundant dataset containing 117 aaRSs and an equal number of non-aaRSs, in which no two sequences share more than 30% similarity. First, we applied the similarity search technique BLAST and achieved a maximum accuracy of 67.52%. We observed that 62% of tRNA synthetases contain one or more of the following four PROSITE domains: PS50862, PS00178, PS50860 and PS50861. A support vector machine (SVM) based model was developed to discriminate between aaRSs and non-aaRSs; using selected dipeptide composition, it achieved a maximum Matthews correlation coefficient (MCC) of 0.68 with an accuracy of 83.73%. We then developed a hybrid approach, in which the SVM model was built using selected dipeptide composition together with information on the four PROSITE domains, and achieved a maximum MCC of 0.72 with an accuracy of 85.49%. We further developed an SVM-based model for classifying aaRSs into class-1 and class-2 using selected dipeptide composition, which achieved an MCC of 0.79. We also observed that two domains (PS00178, PS50889) were preferred in class-1 and three domains (PS50862, PS50860, PS50861) in class-2. A hybrid method using these domains as descriptors, along with selected dipeptide composition, achieved an MCC of 0.87 with a sensitivity of 94.55% and an accuracy of 93.19%. All models were evaluated using five-fold cross-validation.
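The features and evaluation measures named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the dipeptides were selected or how the domain information was encoded, so the sketch assumes the full set of 400 dipeptides (as percentages of all overlapping residue pairs) and one binary presence flag per PROSITE domain. The function names and the `domains_found` argument are hypothetical; detecting the domains in a sequence is assumed to be done by a separate tool.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids (assumption: no selection step).
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
# The four PROSITE domains reported for the aaRS vs non-aaRS task.
AARS_DOMAINS = ("PS50862", "PS00178", "PS50860", "PS50861")


def dipeptide_composition(sequence):
    """Percentage of each dipeptide among all overlapping standard-residue pairs."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


def hybrid_features(sequence, domains_found):
    """Dipeptide composition plus one 0/1 flag per domain (hypothetical encoding)."""
    flags = [1.0 if d in domains_found else 0.0 for d in AARS_DOMAINS]
    return dipeptide_composition(sequence) + flags


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def sensitivity(tp, fn):
    """Percentage of true positives among all actual positives."""
    return 100.0 * tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Percentage of correct predictions among all predictions."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)
```

Each vector returned by `hybrid_features` has 404 entries and could be passed to any SVM implementation. In a five-fold cross-validation, the confusion-matrix counts from the held-out folds would be fed to `mcc`, `sensitivity` and `accuracy`.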

Conclusions

We have analyzed protein sequences of aaRSs (class-1 and class-2) and non-aaRSs and identified interesting patterns. The high accuracy achieved by our SVM models using selected dipeptide composition demonstrates that certain types of dipeptide are preferred in aaRSs. We were able to identify PROSITE domains that are preferred in aaRSs and their classes, providing interesting insights into tRNA synthetases. The method developed in this study will be useful for researchers studying aaRS enzymes and tRNA biology. The web server based on this study is available at http://www.imtech.res.in/raghava/icaars/.