Biomedical Data Analyses Facilitated by Open Cheminformatics Workflows

Edited by Eva Nittinger, Alex Clark, Anna Gaulton, Barbara Zdrazil

Modern data science approaches aim to properly interconnect information in order to generate new knowledge and reveal hidden relationships in the data. The open data revolution has given users explicit rights, via open licenses, to download, curate, and reshare results, leading to a democratization of data. Research environments can make use of large publicly available data sets in the domain of life sciences. For example, the CC-BY-SA licensing of ChEMBL, funded by the Wellcome Trust, has boosted compound-target interaction modelling [1].

Data sets - from small molecules to new modalities, such as peptides and oligonucleotides among others - require careful data curation (including data integration, annotation, filtering, and standardization) before they can be used for purposes such as data analysis, visualization, or predictive modeling. It is recommended to generate reusable workflows to avoid tedious repetitive data curation tasks in the future. Furthermore, making scripts for data curation and analyses freely available to the scientific community is a fundamental component for comparability of research and of reproducible Open Science, a core vision of the Journal of Cheminformatics.

Studies analyzing biomedical data have gained interest through the ever-increasing amount and diversity of publicly available life science data sets. Such studies can provide valuable insights into the quality and composition of data sets, as well as into biases and trends in the data.

For this Special Collection in J. Cheminform., contributions focus on - but are not limited to - cheminformatics workflows (such as Jupyter notebooks, RMarkdown, Common Workflow Language, Galaxy, KNIME workflows etc.) licensed with an OSI-approved [2] or Creative Commons license (CCZero, CC-BY, CC-BY-SA, but not ND and NC) [3], serving the curation and analysis of diverse life science data sets. The collection highlights the need for automation, transparency, and re-usability of cheminformatics workflows in drug discovery and related fields, aims to lower the barrier for effective usage of reproducible workflows, and should lay a basis for community-wide standards in the domain of data curation and analysis.


  1. Numerous ligand-based drug discovery projects are based on structure-activity relationship (SAR) analysis, such as Free-Wilson (FW) or matched molecular pair (MMP) analysis. Intrinsically they assume linearity...

    Authors: D. Gogishvili, E. Nittinger, C. Margreitter and C. Tyrchan

    Citation: Journal of Cheminformatics 2021 13:47

    Content type: Research article

    Published on: