

Open Access Research article

Discrimination of approved drugs from experimental drugs by learning methods

Kailin Tang1, Ruixin Zhu2, Yixue Li1,3* and Zhiwei Cao1,2*

Author Affiliations

1 Shanghai Center for Bioinformation and Technology, 100 Qinzhou Road, Shanghai, 200235, China

2 College of Life Science and Biotechnology, Tongji University, 1239 Siping Road, Shanghai, 200092, China

3 Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; Graduate School of the Chinese Academy of Sciences, 320 YueYang Road, Shanghai 200031, China


BMC Bioinformatics 2011, 12:157  doi:10.1186/1471-2105-12-157

Published: 14 May 2011



Assessing as early as possible whether a compound is drug-like is critical in the drug discovery process. Many efforts have been made to create sets of 'rules' or 'filters' which, it is hoped, will help chemists distinguish 'drug-like' molecules from 'non-drug' molecules. However, only a minority of the molecules in drug-like chemical space will become approved drugs. Distinguishing approved drugs from experimental drugs may therefore be more useful for identifying future approved drugs. In this paper, we discriminate approved drugs from experimental ones by analyzing compounds in terms of the features of existing drugs, using machine learning methods.
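As an illustration of the kind of 'rule' or 'filter' mentioned above, the sketch below implements Lipinski's rule of five, a widely used drug-likeness filter. It is not the method of this paper. It assumes the four descriptors have already been computed elsewhere; the `Descriptors` container and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed molecular descriptors."""
    mol_weight: float  # molecular weight in daltons
    logp: float        # octanol-water partition coefficient
    h_donors: int      # number of hydrogen-bond donors
    h_acceptors: int   # number of hydrogen-bond acceptors


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria the compound violates."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat a compound as 'drug-like' if it violates at most one criterion."""
    return rule_of_five_violations(d) <= max_violations


# Illustrative descriptor values, not taken from the paper's data set.
small = Descriptors(mol_weight=180.2, logp=1.2, h_donors=1, h_acceptors=4)
large = Descriptors(mol_weight=900.0, logp=7.0, h_donors=6, h_acceptors=12)
print(passes_rule_of_five(small), passes_rule_of_five(large))
```

Such filters separate drug-like from non-drug-like molecules, but both approved and experimental drugs usually pass them. This is why the paper instead trains classifiers to separate those two groups.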


Four methods were compared for their performance in classifying approved drugs versus experimental ones. The best results were obtained with a support vector machine (SVM), which achieved an accuracy of 0.7911, a sensitivity of 0.5929 and a specificity of 0.8743. Based on these results, a consensus model was developed to discriminate drugs more effectively. It further raised the correct classification rate to 0.8517, the sensitivity to 0.7242 and the specificity to 0.9352. The methods were also tested by applying them to the Traditional Chinese Medicine Ingredients Database (TCM-ID). These results indicate that the model is a potent tool for identifying drug molecules.
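The sketch below shows how accuracy, sensitivity and specificity are computed from a binary confusion matrix. It also shows one simple way to combine several classifiers into a consensus. This abstract does not describe the paper's consensus scheme. The majority vote used here is an assumed, common choice, and all function names are hypothetical. The code assumes labels are coded as 1 = approved drug (positive class) and 0 = experimental drug.

```python
from collections import Counter
from typing import Dict, List, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1):
    """Return (tp, tn, fp, fn), treating `positive` (assumed: approved drug) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy = (TP+TN)/N, sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # fraction of approved drugs recovered
        "specificity": tn / (tn + fp),  # fraction of experimental drugs rejected
    }


def majority_vote(predictions_per_model: List[List[int]]) -> List[int]:
    """Hypothetical consensus: the most frequent label across models for each compound.

    Counter.most_common orders equal counts by first occurrence, so a tie
    goes to the label predicted by the first model in the list.
    """
    consensus = []
    for votes in zip(*predictions_per_model):
        label, _ = Counter(votes).most_common(1)[0]
        consensus.append(label)
    return consensus


# Toy labels for illustration only; they are not the paper's data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
print(classification_metrics(y_true, y_pred))
```

With an even number of models (the paper compared four), ties are possible. An odd number of voters, or a tie-breaking rule based on the best single model, avoids relying on list order.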


These studies have potential applications in combinatorial library design and virtual high-throughput screening for drug discovery.