This article is part of the supplement: Symposium of Computations in Bioinformatics and Bioscience (SCBB07)
Asymmetric bagging and feature selection for activities prediction of drug molecules
1 Institute of Systems Biology, Shanghai University, Shanghai 200444, China
2 School of Computer Engineering & Science, Shanghai University, Shanghai 200072, China
3 Department of Chemistry, School of Science, Shanghai University, Shanghai 200444, China
4 Harvard Medical School, Harvard University, Cambridge, Massachusetts 02140-0888 USA
5 National Human Genome Research Institute, National Institutes of Health (NIH), U.S. Department of Health and Human Services, Bethesda, MD 20852, USA
Citation and License
BMC Bioinformatics 2008, 9(Suppl 6):S7 doi:10.1186/1471-2105-9-S6-S7
Published: 28 May 2008
Activities of drug molecules can be predicted by QSAR (quantitative structure-activity relationship) models, which avoid the high cost and long cycle of traditional experimental methods. Because drug molecules with positive activity are far fewer than those with negative activity, it is important to predict molecular activities in a way that accounts for this imbalance.
Here, asymmetric bagging and feature selection are applied to this problem. Asymmetric bagging of support vector machines (asBagging) is proposed for predicting drug activities under class imbalance. The features extracted from the structures of drug molecules also affect the prediction accuracy of QSAR models. Therefore, a novel algorithm named PRIFEAB is proposed, which applies an embedded feature selection method to remove redundant and irrelevant features for asBagging. Numerical experiments on a data set of molecular activities show that asBagging improves the AUC and sensitivity of activity prediction, and that PRIFEAB, with feature selection, further improves prediction ability.
Asymmetric bagging can help improve the accuracy of predicting the activities of drug molecules, and this accuracy can be further improved by performing feature selection to retain the relevant features in drug molecule data sets.
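
As a rough illustration of the asymmetric bagging idea summarized above, the following Python sketch builds each bag from all positive samples plus an equally sized bootstrap sample of negatives, then averages the decision scores of the per-bag models. This is one common reading of asymmetric bagging, not necessarily the exact procedure used in the paper. The names `asymmetric_bags`, `CentroidScorer`, `fit_ensemble` and `predict`, the 1:1 sampling ratio, and the toy data are all assumptions made for illustration. In particular, `CentroidScorer` is a hypothetical stand-in for a support vector machine, used only so that the example runs with the standard library.

```python
import random
from statistics import mean


def asymmetric_bags(labels, n_bags, seed=0):
    """Return index lists: every positive plus an equal-sized bootstrap of negatives.

    Assumption: a 1:1 positive-to-negative ratio per bag.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    bags = []
    for _ in range(n_bags):
        # Sample negatives with replacement, one per positive sample.
        sampled_neg = [rng.choice(neg) for _ in pos]
        bags.append(pos + sampled_neg)
    return bags


class CentroidScorer:
    """Hypothetical stand-in for an SVM: scores a point by class-centroid distance."""

    def fit(self, X, y):
        self.pos_c = self._centroid([x for x, t in zip(X, y) if t == 1])
        self.neg_c = self._centroid([x for x, t in zip(X, y) if t == 0])
        return self

    @staticmethod
    def _centroid(rows):
        return [mean(col) for col in zip(*rows)]

    @staticmethod
    def _sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    def decision(self, x):
        # A positive score means x is closer to the positive centroid.
        return self._sqdist(x, self.neg_c) - self._sqdist(x, self.pos_c)


def fit_ensemble(X, y, n_bags=5, seed=0, make_learner=CentroidScorer):
    """Train one base learner per asymmetric bag."""
    models = []
    for bag in asymmetric_bags(y, n_bags, seed):
        learner = make_learner()
        models.append(learner.fit([X[i] for i in bag], [y[i] for i in bag]))
    return models


def predict(models, x):
    """Average the decision scores of all models and threshold at zero."""
    score = mean(m.decision(x) for m in models)
    return 1 if score > 0 else 0


# Toy, made-up data: 2 active molecules (label 1) and 8 inactive ones (label 0).
X = [[5.0, 5.0], [6.0, 5.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
     [1.0, 1.0], [0.5, 0.5], [0.0, 2.0], [2.0, 0.0], [1.0, 2.0]]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
models = fit_ensemble(X, y, n_bags=5, seed=0)
```

Because every bag is balanced, no single base learner is dominated by the negative class, while averaging over several bags reduces the variance introduced by discarding most negatives in each one. A real SVM could be substituted through the `make_learner` argument, provided it exposes the same `fit` and `decision` methods assumed here.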