Open Access Highly Accessed Methodology article

Predictive modeling of anti-malarial molecules inhibiting apicoplast formation

Salma Jamal1, Vinita Periwal2, Open Source Drug Discovery Consortium1 and Vinod Scaria2*

Author Affiliations

1 CSIR Open Source Drug Discovery Unit, Anusandhan Bhavan, Delhi 110001, India

2 GN Ramachandran Knowledge Centre for Genome Informatics, CSIR Institute of Genomics and Integrative Biology, Mall Road, Delhi 110007, India

BMC Bioinformatics 2013, 14:55  doi:10.1186/1471-2105-14-55

Published: 15 February 2013



Malaria is a major healthcare problem worldwide, resulting in an estimated 0.65 million deaths every year. It is caused by members of the parasite genus Plasmodium. The current therapeutic options for malaria are limited to a few classes of molecules and are shrinking fast as drug resistance spreads in the pathogen. The recent availability of high-throughput phenotypic screen datasets for antimalarial activity makes it possible to build computational models of bioactivity from chemical descriptors of molecules, with the potential to accelerate drug discovery for malaria.


In the present study, we used high-throughput screen datasets for the discovery of apicoplast inhibitors of the malarial pathogen, as assayed from the delayed death response. We employed a machine learning approach and developed computational models to predict the biological activity of new antimalarial compounds. The molecules were further evaluated for common substructures using a Maximum Common Substructure (MCS) based approach.


We created computational models using state-of-the-art machine learning algorithms and evaluated them on multiple statistical criteria. We found that the Random Forest based approach provided better accuracy, as assessed by ROC curve analysis. We further evaluated the active molecules using a substructure based approach to identify common substructures enriched in the active set. We argue that the computational models generated could be used effectively to screen large molecular datasets and prioritize molecules for phenotypic screens, drastically reducing cost while improving the hit rate.
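
To make the ROC-based comparison concrete, the following is a minimal sketch of how the area under the ROC curve (AUC) can be computed for a classifier that scores molecules as active or inactive. It is not the authors' implementation: the function name `roc_auc`, the labels, and the two sets of model scores are all hypothetical, and the AUC is obtained from the rank-sum (Mann-Whitney U) identity rather than from any particular toolkit.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for an active molecule, 0 for an inactive one.
    scores: model output, where a higher value means "more likely active".
    Tied scores receive the average of the ranks they span.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one inactive molecule")

    # Rank all scores in ascending order, averaging ranks within tie groups.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U counts the (active, inactive) pairs in which the active scores higher,
    # with tied pairs counted as one half.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical scores from two models on the same six molecules
# (illustrative values only, not data from the study).
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
model_b = [0.6, 0.3, 0.7, 0.8, 0.4, 0.2]
auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

An AUC of 1.0 means every active molecule is scored above every inactive one, while 0.5 corresponds to random ranking. Under this criterion, the model with the higher AUC would be preferred for prioritizing molecules ahead of a phenotypic screen.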