Functional discrimination of membrane proteins using machine learning techniques
Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
BMC Bioinformatics 2008, 9:135 doi:10.1186/1471-2105-9-135
Published: 3 March 2008
Discriminating membrane proteins based on their functions is an important task in genome annotation. In this work, we have analyzed the characteristic features of amino acid residues in membrane proteins that perform major functions, such as channels/pores, electrochemical potential-driven transporters and primary active transporters.
We observed that the residues Asp, Asn and Tyr are dominant in channels/pores, whereas the composition of the hydrophobic residues Phe, Gly, Ile, Leu and Val is high in electrochemical potential-driven transporters. The composition of all the amino acids in primary active transporters lies between that of the other two classes of proteins. We utilized different machine learning algorithms, such as Bayes rule, logistic function, neural network, support vector machine and decision tree, for discriminating these classes of proteins, and observed that most of the algorithms discriminated them with similar accuracy. The neural network method discriminated channels/pores, electrochemical potential-driven transporters and active transporters with a 5-fold cross-validation accuracy of 64% in a data set of 1718 membrane proteins. Using amino acid occurrence improved the overall accuracy to 68%. In addition, we discriminated transporters from other α-helical and β-barrel membrane proteins with an accuracy of 85% using the k-nearest neighbor method. The classification of transporters and all other proteins (globular and membrane) showed an accuracy of 82%.
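To illustrate how a 5-fold cross-validation accuracy for a k-nearest neighbor classifier can be estimated, the following is a minimal sketch in plain Python. It is not the implementation used in this work: the function names, the choice of k = 3, the Euclidean distance and the two-dimensional toy feature vectors are all assumptions made for illustration, and the toy data are synthetic rather than taken from the data set described above.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to x and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def cross_val_accuracy(X, y, n_folds=5, k=3, seed=0):
    """Fraction of samples classified correctly when each fold is held out once."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds disjoint test folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train_X = [X[i] for i in indices if i not in held]
        train_y = [y[i] for i in indices if i not in held]
        correct += sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out
        )
    return correct / len(X)


# Synthetic, well-separated toy vectors (hypothetical, not real protein features).
toy_X = [(0.1 * i, 0.0) for i in range(10)] + [(10.0 + 0.1 * i, 10.0) for i in range(10)]
toy_y = ["transporter"] * 10 + ["other"] * 10

accuracy = cross_val_accuracy(toy_X, toy_y)
```

In practice each protein would be represented by a 20-dimensional vector of residue features instead of the toy points, and the same held-out procedure would be repeated for each classifier being compared.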
The performance of discrimination with amino acid occurrence is better than that with amino acid composition. We suggest that this method could be effectively used to discriminate transporters from all other globular and membrane proteins, and to classify them into channels/pores, electrochemical potential-driven transporters and active transporters.
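The difference between the two feature types can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: it assumes that "occurrence" means the raw count of each of the 20 standard residues in a sequence and that "composition" means that count expressed as a percentage of the counted residues. The abstract does not spell out either definition, and the function names and the example sequence are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def occurrence(sequence):
    """Raw count of each standard residue (assumed meaning of 'occurrence')."""
    counts = Counter(sequence.upper())
    return [counts[aa] for aa in AMINO_ACIDS]


def composition(sequence):
    """Percentage of each standard residue (assumed meaning of 'composition')."""
    occ = occurrence(sequence)
    total = sum(occ)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * n / total for n in occ]


# Hypothetical short peptide, used only to show the two feature vectors.
example = "ACCD"
example_occurrence = occurrence(example)
example_composition = composition(example)
```

Under these assumptions, occurrence retains information about sequence length that composition normalizes away, which is one plausible reason the two feature sets could perform differently.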