Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study
1 Computational Bioinformatics Laboratory, Department of Computer Science, Imperial College London, London, SW7 2BZ, UK
2 Department of Computer Sciences, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI-53706, USA
3 Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
BMC Bioinformatics 2012, 13:162 doi:10.1186/1471-2105-13-162
Published: 11 July 2012
There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to making predictions. Developing ILP systems that can learn rules of the complexity required for studies of protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions.
The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules reveal a previously unreported dependency for residues CYS and LEU, and they specify interactions involving aromatic and hydrogen-bonding residues. This paper shows that Inductive Logic Programming, as implemented in ProGolem, can derive rules that describe structural features of protein-ligand interactions. Several of these rules are consistent with descriptions in the literature.
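To make the form of such a rule concrete, the following is a minimal Python sketch of how one rule of this general kind could be checked against a set of binding sites. It is not ProGolem output and not a rule from this study: the representation of a site as (residue name, distance) pairs, the 6.0 Å cutoff, the toy sites, and the names `has_aromatic_near` and `coverage` are all assumptions made for illustration.

```python
# Hypothetical illustration only: not a rule learned by ProGolem.
# Standard aromatic amino acids (HIS is sometimes counted as well).
AROMATIC = {"PHE", "TYR", "TRP"}


def has_aromatic_near(site, cutoff=6.0):
    """Return True if any aromatic residue lies within `cutoff`
    angstroms of the ligand centre.

    `site` is an assumed representation: a list of
    (residue_name, distance_to_ligand_centre) pairs.
    """
    return any(name in AROMATIC and dist <= cutoff for name, dist in site)


def coverage(rule, sites):
    """Count the sites on which the rule fires."""
    return sum(1 for site in sites if rule(site))


# Toy sites invented for this example; they are not data from the paper.
toy_sites = [
    [("TRP", 4.2), ("ASN", 3.1)],  # aromatic residue inside the cutoff
    [("LEU", 5.0), ("CYS", 7.5)],  # no aromatic residue
    [("TYR", 8.0)],                # aromatic residue outside the cutoff
]

print(coverage(has_aromatic_near, toy_sites))
```

An ILP system searches over many candidate rules of this shape and keeps those that cover many positive examples and few negative ones; the sketch shows only the evaluation of a single fixed rule.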
In addition to confirming literature results, ProGolem’s model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to that of another ILP system previously used to study protein/hexose interactions, and is comparable to that of state-of-the-art statistical learners.
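A comparison of this kind, two learners evaluated on the same 10 folds and tested at the 95% confidence level, can be sketched as follows. The abstract does not say which statistical test was used, so the paired t-test here is an assumption, as are the function names `paired_t_statistic` and `significantly_different`. The sketch uses only the Python standard library.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom,
# i.e. the value appropriate for 10 paired folds.
T_CRIT_DF9_95 = 2.262


def paired_t_statistic(acc_a, acc_b):
    """Paired t statistic for per-fold accuracies of two learners.

    Both lists must come from the same folds, in the same order.
    Raises ZeroDivisionError if all per-fold differences are identical.
    """
    if len(acc_a) != len(acc_b):
        raise ValueError("accuracy lists must have the same length")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_err


def significantly_different(acc_a, acc_b, t_crit=T_CRIT_DF9_95):
    """True if the two learners differ at the level implied by `t_crit`.

    The default critical value is valid only for exactly 10 folds.
    """
    return abs(paired_t_statistic(acc_a, acc_b)) > t_crit
```

Given two lists of ten per-fold accuracies, `significantly_different(acc_a, acc_b)` returns `True` when the mean difference is large relative to its fold-to-fold variation. The sign of `paired_t_statistic` indicates which learner is ahead.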