Open Access Research article

Classification of drug molecules considering their IC50 values using mixed-integer linear programming based hyper-boxes method

Pelin Armutlu1, Muhittin E Ozdemir2, Fadime Uney-Yuksektepe1, I Halil Kavakli2,3 and Metin Turkay1,2*

Author Affiliations

1 Department of Industrial Engineering, Koç University, Rumelifeneri Yolu, Sariyer, Istanbul 34450, Turkey

2 Center for Computational Biology and Bioinformatics, Koç University, Rumelifeneri Yolu, Sariyer, Istanbul 34450, Turkey

3 Department of Chemical and Biological Engineering, Koç University, Rumelifeneri Yolu, Sariyer, Istanbul 34450, Turkey


BMC Bioinformatics 2008, 9:411  doi:10.1186/1471-2105-9-411

Published: 3 October 2008



A priori analysis of the activity of drugs on a target protein by computational approaches can be useful in narrowing down drug candidates for further experimental tests. Currently, there are a large number of computational methods that predict the activity of drugs on proteins. In this study, we treat activity prediction as a classification problem, in which drug molecules are classified as low active or high active according to their binding activity (IC50 values) on target proteins. We aim to improve the classification accuracy by introducing an algorithm that combines partial least squares regression with a mixed-integer programming based hyper-boxes classification method. We also aim to determine the most significant molecular descriptors for the drug molecules.
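To make the classification setting concrete, the following is a minimal Python sketch and not the method of this study. It assumes a hypothetical IC50 threshold for the low/high split, uses made-up descriptor values, and replaces the mixed-integer programming formulation with the simplest possible stand-in: one axis-aligned bounding box per class, with a new molecule assigned to the nearest box. The partial least squares step is omitted, and all function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# A box is (lower bounds, upper bounds), one entry per descriptor.
Box = Tuple[List[float], List[float]]


def label_by_ic50(ic50: float, threshold: float) -> str:
    """Lower IC50 means stronger inhibition, so values below the
    (hypothetical) threshold are labeled 'high' active."""
    return "high" if ic50 < threshold else "low"


def fit_boxes(X: Sequence[Sequence[float]], y: Sequence[str]) -> Dict[str, Box]:
    """Build one bounding box per class (a simplification: the MILP
    approach may use several boxes per class and optimizes their bounds)."""
    boxes: Dict[str, Box] = {}
    for row, label in zip(X, y):
        if label not in boxes:
            boxes[label] = (list(row), list(row))
        else:
            lo, hi = boxes[label]
            for j, v in enumerate(row):
                lo[j] = min(lo[j], v)
                hi[j] = max(hi[j], v)
    return boxes


def distance_to_box(point: Sequence[float], box: Box) -> float:
    """Euclidean distance from a point to a box (0 if the point is inside)."""
    lo, hi = box
    return sum(max(l - v, 0.0, v - h) ** 2 for v, l, h in zip(point, lo, hi)) ** 0.5


def predict(point: Sequence[float], boxes: Dict[str, Box]) -> str:
    """Assign the class whose box is closest to the point."""
    return min(boxes, key=lambda label: distance_to_box(point, boxes[label]))


# Hypothetical data: two descriptors per molecule, IC50 in arbitrary units.
descriptors = [[0.2, 1.0], [0.3, 1.2], [0.9, 3.0], [1.0, 3.4]]
ic50_values = [5.0, 8.0, 120.0, 300.0]
labels = [label_by_ic50(v, threshold=50.0) for v in ic50_values]
boxes = fit_boxes(descriptors, labels)
print(predict([0.25, 1.1], boxes))
```

With a single box per class, overlapping classes cannot be separated. Handling that case by choosing the number and bounds of the boxes is what the optimization-based formulation addresses.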


We first applied our approach to widely known inhibitor datasets with known IC50 values: Acetylcholinesterase (ACHE), Benzodiazepine Receptor (BZR), Dihydrofolate Reductase (DHFR), and Cyclooxygenase-2 (COX-2). The results at this stage showed that our approach consistently gives better classification accuracies than 63 other reported classification methods, such as SVM and Naïve Bayes; we were able to predict the experimentally determined IC50 values with a worst-case accuracy of 96%. To further test the applicability of this approach, we created a dataset for Cytochrome P450 C17 inhibitors and then predicted their activities with 100% accuracy.


Our results indicate that this approach can be used to predict the inhibitory effects of inhibitors based on their molecular descriptors. This approach will not only enhance the drug discovery process, but also save the time and resources committed to it.