Research article

Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study

Jose C A Santos1*, Houssam Nassif2, David Page2, Stephen H Muggleton1 and Michael J E Sternberg3

Author Affiliations

1 Computational Bioinformatics Laboratory, Department of Computer Science, Imperial College London, London, SW7 2BZ, UK

2 Department of Computer Sciences, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI-53706, USA

3 Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK

BMC Bioinformatics 2012, 13:162  doi:10.1186/1471-2105-13-162

Published: 11 July 2012



There is a need for automated methods that learn general features of the interactions between a ligand class and its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to predictions. Developing ILP systems that can learn rules of the complexity required for studies of protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions.


The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules also reveal a previously unreported dependency for residues CYS and LEU, and they specify interactions involving aromatic and hydrogen-bonding residues. This paper shows that Inductive Logic Programming, as implemented in ProGolem, can derive rules that describe structural features of protein-ligand interactions. Several of these rules are consistent with descriptions in the literature.


In addition to confirming literature results, ProGolem’s model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to that of another ILP system previously used to study protein-hexose interactions, and is comparable to that of state-of-the-art statistical learners.