MFSPSSMpred: identifying short disorder-to-order binding regions in disordered proteins based on contextual local evolutionary conservation
1 Department of Computer Science and Engineering, Waseda University, Tokyo, Japan
2 Computational Biology Research Center (CBRC), Tokyo, Japan
3 Meiji Pharmaceutical University, Tokyo, Japan
BMC Bioinformatics 2013, 14:300 doi:10.1186/1471-2105-14-300Published: 4 October 2013
Molecular recognition features (MoRFs) are short binding regions located in longer intrinsically disordered protein regions. Although these short regions lack a stable structure in the natural state, they readily undergo disorder-to-order transitions upon binding to their partner molecules. MoRFs play critical roles in the molecular interaction network of a cell, and are associated with many human genetic diseases. Therefore, identification of MoRFs is an important step in understanding functional aspects of these proteins and in finding applications in drug design.
Here, we propose a novel method for identifying MoRFs, named as MFSPSSMpred (Masked, Filtered and Smoothed Position-Specific Scoring Matrix-based Predictor). Firstly, a masking method is used to calculate the average local conservation scores of residues within a masking-window length in the position-specific scoring matrix (PSSM). Then, the scores below the average are filtered out. Finally, a smoothing method is used to incorporate the features of flanking regions for each residue to prepare the feature sets for prediction. Our method employs no predicted results from other classifiers as input, i.e., all features used in this method are extracted from the PSSM of sequence only. Experimental results show that, comparing with other methods tested on the same datasets, our method achieves the best performance: achieving 0.004~0.079 higher AUC than other methods when tested on TEST419, and achieving 0.045~0.212 higher AUC than other methods when tested on TEST2012. In addition, when tested on an independent membrane proteins-related dataset, MFSPSSMpred significantly outperformed the existing predictor MoRFpred.
This study suggests that: 1) amino acid composition and physicochemical properties in the flanking regions of MoRFs are very different from those in the general non-MoRF regions; 2) MoRFs contain both highly conserved residues and highly variable residues and, on the whole, are highly locally conserved; and 3) combining contextual information with local conservation information of residues facilitates the prediction of MoRFs.