This article is part of the supplement: Proceedings of the Tenth Annual MCBIOS Conference
Drug activity prediction using multiple-instance learning via joint instance and feature selection
1 Department of Computer and Information Science, School of Engineering, University of Mississippi, University, 38677, USA
2 Department of Medicinal Chemistry, School of Pharmacy, University of Mississippi, University, 38677, USA
3 Research Institute of Pharmaceutical Sciences, School of Pharmacy, University of Mississippi, University, 38677, USA
BMC Bioinformatics 2013, 14(Suppl 14):S16 doi:10.1186/1471-2105-14-S14-S16
Published: 9 October 2013
In drug discovery and development, it is crucial to determine which conformers (instances) of a given molecule are responsible for its observed biological activity and, at the same time, to identify the most representative subset of features (molecular descriptors). Because bioactive conformers are difficult to obtain experimentally, computational approaches such as machine learning techniques are needed. Multiple Instance Learning (MIL) is a machine learning method capable of tackling this type of problem. In the MIL framework, each instance is represented as a feature vector, which usually resides in a high-dimensional feature space. The high dimensionality may provide significant information for learning tasks, but it may also introduce a large number of irrelevant or redundant features that degrade learning performance. Reducing the dimensionality of the data will hence facilitate the classification task and improve the interpretability of the model.
In this work we propose a novel approach, named multiple instance learning via joint instance and feature selection. The iterative joint instance and feature selection is achieved using an instance-based feature mapping and 1-norm regularized optimization. The proposed approach was tested on four biological activity datasets.
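As a rough illustration of how an instance-based feature mapping combined with 1-norm regularization can select instances, the sketch below maps each bag (molecule) to a vector of similarities to candidate prototype instances and then fits a sparse linear classifier, so that prototypes with non-zero weight are the selected ones. Several details are assumptions not given in the text above: the Gaussian max-similarity mapping, the use of 1-norm regularized logistic regression solved by proximal gradient descent as a stand-in for the 1-norm regularized optimization, the toy 2-D data, and all names and parameter values (`bag_embedding`, `fit_l1_logistic`, `sigma`, `lam`, `step`). It covers only the instance-selection half and does not reproduce the iterative joint feature selection.

```python
import math

def bag_embedding(bag, prototypes, sigma=1.0):
    """Map one bag to a vector with one similarity value per prototype instance."""
    def sim(x, p):
        d2 = sum((a - b) ** 2 for a, b in zip(x, p))
        return math.exp(-d2 / sigma ** 2)
    # Assumption: a bag is as similar to a prototype as its closest instance is.
    return [max(sim(x, p) for x in bag) for p in prototypes]

def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink toward zero, clipping small values to 0."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_l1_logistic(X, y, lam=0.1, step=0.3, iters=1000):
    """1-norm regularized logistic regression via proximal gradient descent.

    X: list of embedded bags, y: labels in {-1, +1}. The bias b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = -yi / (1.0 + math.exp(margin))  # derivative of log(1 + exp(-margin))
            for j in range(d):
                gw[j] += g * xi[j] / n
            gb += g / n
        # Gradient step on the smooth loss, then shrinkage for the 1-norm penalty.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, gw)]
        b -= step * gb
    return w, b

# Toy data (hypothetical): positive bags contain one instance near (3, 3).
bags = [
    [[0.0, 0.1], [3.0, 3.0]],    # positive
    [[0.2, -0.1], [2.9, 3.1]],   # positive
    [[0.1, 0.0], [-0.2, 0.1]],   # negative
    [[0.0, -0.2], [0.1, 0.2]],   # negative
]
labels = [1, 1, -1, -1]

# Every instance in the training bags is a candidate prototype.
prototypes = [x for bag in bags for x in bag]
X = [bag_embedding(bag, prototypes) for bag in bags]

w, b = fit_l1_logistic(X, labels)
selected = [k for k, wk in enumerate(w) if abs(wk) > 1e-8]
scores = [sum(wk * xk for wk, xk in zip(w, x)) + b for x in X]
predicted = [1 if s > 0 else -1 for s in scores]
print("selected prototype indices:", selected)
```

In this toy setting the prototypes that carry the largest weights are the instances near (3, 3), the only ones that distinguish positive from negative bags. In the drug activity setting the analogous prototypes would be candidate bioactive conformers.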
The empirical results demonstrate that the selected instances (prototype conformers) and features (pharmacophore fingerprints) have competitive discriminative power and that the selection process converges quickly.