Prediction and classification of aminoacyl tRNA synthetases using PROSITE domains
Bioinformatics Centre, Institute of Microbial Technology, Chandigarh, India
BMC Genomics 2010, 11:507 doi:10.1186/1471-2164-11-507Published: 22 September 2010
Aminoacyl tRNA synthetases (aaRSs) catalyse the first step of protein synthesis in all organisms. They are responsible for the precise attachment of amino acids to their cognate transfer RNAs. There are twenty different types of aaRSs, unique for each amino acid. These aaRSs have been divided into two classes, each comprising ten enzymes. It is important to predict and classify aaRSs in order to understand protein synthesis.
In this study, all models were developed on a non-redundant dataset containing 117 aaRSs and an equal number of non-aaRSs, in which no two sequences have more than 30% similarity. First, we applied the similarity search technique, BLAST, and achieved a maximum accuracy of 67.52%. We observed that 62% of tRNA synthetases contain one or more domains from amongst the following four PROSITE domains: PS50862, PS00178, PS50860 and PS50861. An SVM-based model was developed to discriminate between aaRSs, and non-aaRSs, and achieved a maximum MCC of 0.68 with accuracy of 83.73%, using selective dipeptide composition. We developed a hybrid approach and achieved a maximum MCC of 0.72 with accuracy of 85.49%, where SVM model developed using selected dipeptide composition and information of four PROSITE domains. We further developed an SVM-based model for classifying the aaRSs into class-1 and class-2, using selective dipeptide composition and achieved an MCC of 0.79. We also observed that two domains (PS00178, PS50889) in class-1 and three domains (PS50862, PS50860, PS50861) in class-2 were preferred. A hybrid method was developed using these domains as descriptor, along with selected dipeptide composition, and achieved an MCC of 0.87 with a sensitivity of 94.55% and an accuracy of 93.19%. All models were evaluated using a five-fold cross-validation technique.
We have analyzed protein sequences of aaRSs (class-1 and class-2) and non-aaRSs and identified interesting patterns. The high accuracy achieved by our SVM models using selected dipeptide composition demonstrates that certain types of dipeptide are preferred in aaRSs. We were able to identify PROSITE domains that are preferred in aaRSs and their classes, providing interesting insights into tRNA synthetases. The method developed in this study will be useful for researchers studying aaRS enzymes and tRNA biology. The web-server based on the above study, is available at http://www.imtech.res.in/raghava/icaars/ webcite.