Open Access Research article

Comprehensive data-driven analysis of the impact of chemoinformatic structure on the genome-wide biological response profiles of cancer cells to 1159 drugs

Suleiman A Khan1*, Ali Faisal1, John Patrick Mpindi2, Juuso A Parkkinen1, Tuomo Kalliokoski3, Antti Poso4, Olli P Kallioniemi2, Krister Wennerberg2 and Samuel Kaski1,5*

Author affiliations

1 Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, PO Box 15400, Espoo, 00076, Finland

2 Institute for Molecular Medicine Finland FIMM, University of Helsinki, PO Box 20, Helsinki, 00014, Finland

3 CADD, Global Discovery Chemistry, Novartis Institute for Biomedical Research, Basel, CH4002, Switzerland

4 School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, PO Box 1627, Kuopio, 70211, Finland

5 Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, PO Box 68, Helsinki, 00014, Finland


Citation and License

BMC Bioinformatics 2012, 13:112 doi:10.1186/1471-2105-13-112

Published: 30 May 2012



Detailed and systematic understanding of the biological effects of millions of available compounds on living cells is a significant challenge. Because most compounds affect multiple targets and pathways, traditional methods for analyzing structure-function relationships are not comprehensive enough. Therefore, more advanced integrative models are needed to predict the biological effects elicited by specific chemical features. As a step towards creating such computational links, we developed a data-driven chemical systems biology approach to comprehensively study the relationship between 76 structural 3D VolSurf descriptors (chemical space) of 1159 drugs and the microarray gene expression responses (biological space) they elicited in three cancer cell lines. The analysis, covering 11350 genes, was based on data from the Connectivity Map. We decomposed the biological response profiles into components, each linked to a characteristic chemical descriptor profile.
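The abstract does not name the decomposition model, so the following is only a minimal sketch of the general idea of a component that pairs a chemical descriptor profile with a gene-expression profile. It swaps in a simpler technique, the leading pair of singular vectors of the descriptor-by-gene cross-covariance matrix (a PLS-SVD-style decomposition), found by power iteration. This is not the authors' model; the function name `first_linked_component` and the data layout (drugs as rows in both matrices, in the same order) are hypothetical choices for illustration.

```python
import math
import random


def _matvec(m, v):
    # Multiply matrix m (list of rows) by vector v.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _transpose(m):
    return [list(col) for col in zip(*m)]


def _center(m):
    # Subtract each column's mean so that products measure covariance.
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[x - mu for x, mu in zip(row, means)] for row in m]


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def first_linked_component(chem, bio, iters=200, seed=0):
    """Hypothetical helper: return (descriptor_weights, gene_weights), the
    leading singular vector pair of C = X^T Y, where X (drugs x descriptors)
    and Y (drugs x genes) are column-centred. Assumes C is not all zeros."""
    x, y = _center(chem), _center(bio)
    xt, yt = _transpose(x), _transpose(y)
    # Cross-covariance (up to a 1/n factor): rows = descriptors, cols = genes.
    cross = [[sum(a * b for a, b in zip(xc, yc)) for yc in yt] for xc in xt]
    cross_t = _transpose(cross)
    rng = random.Random(seed)
    v = _normalize([rng.gauss(0.0, 1.0) for _ in range(len(cross_t))])
    u = []
    for _ in range(iters):
        # Alternate: gene weights -> descriptor weights -> gene weights.
        u = _normalize(_matvec(cross, v))
        v = _normalize(_matvec(cross_t, u))
    return u, v
```

The two returned unit vectors play the roles of a paired "chemical descriptor profile" and "biological response profile"; further components could be obtained by deflating the matrices, which this sketch omits. At the scale in the text (76 descriptors by 11350 genes) a numerical library would be preferable to pure Python loops.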


Integrated analysis of both the chemical and biological spaces was more informative than either dataset alone in predicting drug similarity, as measured by shared protein targets. We identified ten major components that link distinct VolSurf chemical features across multiple compounds to specific cellular responses. For example, component 2 (hydrophobic properties) was strongly linked to the DNA damage response, while component 3 (hydrogen bonding) was associated with metabolic stress. Individual structural and biological features were often linked to only one cell line; for instance, leukemia cells (HL-60) responded specifically to cardiac glycosides.
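Evaluating "drug similarity as measured by shared protein targets" can be framed as a retrieval test: for each drug, rank all other drugs by similarity in a given feature space and count how many of the top k share at least one target with the query. The sketch below assumes this framing; it is not necessarily the paper's exact protocol. The names `precision_at_k` and `concatenate`, the use of cosine similarity, and naive concatenation as the "integrated" space are all hypothetical choices for illustration.

```python
import math


def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def precision_at_k(features, targets, k):
    """Mean fraction of each drug's k most similar drugs (by cosine
    similarity of feature vectors) that share at least one target.

    features: list of per-drug vectors; targets: list of per-drug sets."""
    scores = []
    for i, fi in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        others.sort(key=lambda j: cosine(fi, features[j]), reverse=True)
        hits = sum(1 for j in others[:k] if targets[i] & targets[j])
        scores.append(hits / k)
    return sum(scores) / len(scores)


def concatenate(chem, bio):
    # Naive integration (an assumption): join each drug's two vectors.
    return [list(c) + list(b) for c, b in zip(chem, bio)]
```

Computing `precision_at_k` separately on the chemical descriptors, the expression profiles, and a combined representation yields one comparable score per space. With 76 descriptors against 11350 genes, the blocks would need to be standardized or reweighted before concatenation so that the gene features do not dominate the similarity.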


In summary, our approach identified several novel links between specific chemical structure properties and distinct biological responses in cells incubated with these drugs. Importantly, the analysis focused on chemical-biological properties that emerge across multiple drugs. Decoding such systematic relationships is necessary for building better models of drug effects, including models that capture unanticipated types of molecular properties with strong biological effects.