
Biomedical Data Analyses Facilitated by Open Cheminformatics Workflows

Edited by Eva Nittinger, Alex Clark, Anna Gaulton, Barbara Zdrazil

Modern data science approaches aim to interconnect information properly in order to generate new knowledge and reveal hidden relationships in the data. The open data revolution has given users explicit rights, via open licenses, to download, curate, and reshare results, leading to a democratization of data. Researchers can draw on large, publicly available data sets in the life sciences. For example, the CC-BY-SA licensing of ChEMBL, funded by the Wellcome Trust, has boosted compound-target interaction modelling [1].

Data sets, from small molecules to new modalities such as peptides and oligonucleotides, require careful curation (including data integration, annotation, filtering, and standardization) before they can be used for data analysis, visualization, or predictive modeling. Building reusable workflows is recommended, as it spares researchers from repeating tedious curation tasks in the future. Furthermore, making curation and analysis scripts freely available to the scientific community is fundamental to the comparability of research and to reproducible Open Science, a core vision of the Journal of Cheminformatics.

Studies analyzing biomedical data have gained interest owing to the ever-increasing amount and diversity of publicly available life science data sets. Such studies can provide valuable insights into the quality and composition of data sets, as well as into biases and trends in the data.

For this Special Collection in J. Cheminform., contributions focus on, but are not limited to, cheminformatics workflows (such as Jupyter notebooks, RMarkdown, Common Workflow Language, Galaxy, or KNIME workflows) licensed under an OSI-approved [2] or Creative Commons license (CC0, CC-BY, or CC-BY-SA, but not ND or NC variants) [3] that serve the curation and analysis of diverse life science data sets. The collection highlights the need for automation, transparency, and reusability of cheminformatics workflows in drug discovery and related fields. It aims to lower the barrier to the effective use of reproducible workflows and to lay a basis for community-wide standards in data curation and analysis.

  1. https://wellcome.org/press-release/open-access-drug-discovery-database-launches-half-million-compounds
  2. https://opensource.org/about
  3. https://creativecommons.org/


  1. Machine learning (ML) models require an extensive, user-driven selection of molecular descriptors in order to learn from chemical structures to predict actives and inactives with a high reliability. In additio...

    Authors: Aljoša Smajić, Melanie Grandits and Gerhard F. Ecker
    Citation: Journal of Cheminformatics 2022 14:54
  2. As an alternative to one drug-one target approaches, systems biology methods can provide a deeper insight into the holistic effects of drugs. Network-based approaches are tools of systems biology, that can rep...

    Authors: Barbara Füzi, Rahuman S. Malik-Sheriff, Emma J. Manners, Henning Hermjakob and Gerhard F. Ecker
    Citation: Journal of Cheminformatics 2022 14:37
  3. Unpredicted drug safety issues constitute the majority of failures in the pharmaceutical industry according to several studies. Some of these preclinical safety issues could be attributed to the non-selective ...

    Authors: Doha Naga, Wolfgang Muster, Eunice Musvasva and Gerhard F. Ecker
    Citation: Journal of Cheminformatics 2022 14:27
  4. We present several workflows for protein-ligand docking and free energy calculation for use in the workflow management system Galaxy. The workflows are composed of several widely used open-source tools, includ...

    Authors: Simon Bray, Tim Dudgeon, Rachael Skyner, Rolf Backofen, Björn Grüning and Frank von Delft
    Citation: Journal of Cheminformatics 2022 14:22
  5. The thermal shift assay (TSA)—also known as differential scanning fluorimetry (DSF), thermofluor, and Tm shift—is one of the most popular biophysical screening techniques used in fragment-based ligand discovery (...

    Authors: Errol L. G. Samuel, Secondra L. Holmes and Damian W. Young
    Citation: Journal of Cheminformatics 2021 13:99
  6. In the process of drug discovery, the optimization of lead compounds has always been a challenge faced by pharmaceutical chemists. Matched molecular pair analysis (MMPA), a promising tool to efficiently extrac...

    Authors: Zi-Yi Yang, Li Fu, Ai-Ping Lu, Shao Liu, Ting-Jun Hou and Dong-Sheng Cao
    Citation: Journal of Cheminformatics 2021 13:86
  7. Numerous ligand-based drug discovery projects are based on structure-activity relationship (SAR) analysis, such as Free-Wilson (FW) or matched molecular pair (MMP) analysis. Intrinsically they assume linearity...

    Authors: D. Gogishvili, E. Nittinger, C. Margreitter and C. Tyrchan
    Citation: Journal of Cheminformatics 2021 13:47