Biomedical Data Analyses Facilitated by Open Cheminformatics Workflows

Edited by Eva Nittinger, Alex Clark, Anna Gaulton, Barbara Zdrazil

Modern data science approaches aim to interconnect information in order to generate new knowledge and reveal hidden relationships in the data. The open data revolution has given users explicit rights, via open licenses, to download, curate, and reshare results, leading to a democratization of data. Research environments can therefore make use of large, publicly available data sets in the life sciences. For example, the CC-BY-SA licensing of ChEMBL, funded by the Wellcome Trust, has boosted compound-target interaction modelling [1].

Data sets - from small molecules to new modalities, such as peptides and oligonucleotides among others - require careful data curation (including data integration, annotation, filtering, and standardization) before they can be used for purposes such as data analysis, visualization, or predictive modeling. Generating reusable workflows is recommended, as it avoids repeating tedious data curation tasks in the future. Furthermore, making scripts for data curation and analysis freely available to the scientific community is fundamental to the comparability of research and to reproducible Open Science, a core vision of the Journal of Cheminformatics.
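As a rough illustration of what a reusable curation step might look like, the following Python sketch filters, standardizes, and integrates a handful of activity records. It is only a minimal example under stated assumptions: the record keys (`compound`, `target`, `value`, `unit`), the unit table, and the choice of the median for merging replicates are all hypothetical and are not taken from any workflow in this collection.

```python
from statistics import median

# Conversion factors to nanomolar; this unit set is an illustrative assumption.
UNIT_TO_NM = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6}


def curate(records):
    """Filter, standardize, and aggregate raw activity records.

    Each record is a dict with the hypothetical keys
    'compound', 'target', 'value', and 'unit'.
    """
    grouped = {}
    for rec in records:
        value, unit = rec.get("value"), rec.get("unit")
        # Filtering: drop records with missing, non-positive, or unknown-unit values.
        if value is None or value <= 0 or unit not in UNIT_TO_NM:
            continue
        # Standardization: express every value in nM and normalize identifiers.
        value_nm = value * UNIT_TO_NM[unit]
        key = (rec["compound"].strip().upper(), rec["target"].strip().upper())
        grouped.setdefault(key, []).append(value_nm)
    # Integration: merge replicate measurements into one median value per pair.
    return [
        {"compound": c, "target": t, "value_nm": median(vals), "n": len(vals)}
        for (c, t), vals in sorted(grouped.items())
    ]


# Made-up example records, including two that should be filtered out.
raw = [
    {"compound": "cpd-1", "target": "EGFR", "value": 0.5, "unit": "uM"},
    {"compound": "CPD-1 ", "target": "egfr", "value": 300.0, "unit": "nM"},
    {"compound": "cpd-2", "target": "EGFR", "value": None, "unit": "nM"},
    {"compound": "cpd-3", "target": "ABL1", "value": 2.0, "unit": "mg/L"},
]
curated = curate(raw)
print(curated)
```

Wrapping the steps in a single function means the same rules can be rerun unchanged on a new data release, which is the kind of reuse the paragraph above recommends. A real workflow would typically add chemical structure standardization with a cheminformatics toolkit, which is omitted here to keep the sketch dependency-free.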

Studies analyzing biomedical data have gained interest owing to the ever-increasing amount and diversity of publicly available life science data sets. Such studies can provide valuable insights into the quality and composition of data sets, as well as into biases and trends in the data.

For this Special Collection in J. Cheminform., contributions focus on - but are not limited to - cheminformatics workflows (such as Jupyter notebooks, RMarkdown, Common Workflow Language, Galaxy, or KNIME workflows) licensed with an OSI-approved [2] or Creative Commons license (CCZero, CC-BY, CC-BY-SA, but not ND and NC) [3], serving the curation and analysis of diverse life science data sets. The collection highlights the need for automation, transparency, and re-usability of cheminformatics workflows in drug discovery and related fields, aims to lower the barrier to effective use of reproducible workflows, and should lay a basis for community-wide standards in the domain of data curation and analysis.

Biomedical data analyses facilitated by open cheminformatics workflows

Authors: Eva Nittinger, Alex Clark, Anna Gaulton and Barbara Zdrazil

Citation: Journal of Cheminformatics 2023 15:46

Content type: Editorial Published on: 17 April 2023

Using Jupyter Notebooks for re-training machine learning models

Machine learning (ML) models require an extensive, user-driven selection of molecular descriptors in order to learn from chemical structures to predict actives and inactives with a high reliability. In additio...

Authors: Aljoša Smajić, Melanie Grandits and Gerhard F. Ecker

Citation: Journal of Cheminformatics 2022 14:54

Content type: Educational Published on: 13 August 2022

KNIME workflow for retrieving causal drug and protein interactions, building networks, and performing topological enrichment analysis demonstrated by a DILI case study

As an alternative to one drug-one target approaches, systems biology methods can provide a deeper insight into the holistic effects of drugs. Network-based approaches are tools of systems biology, that can rep...

Authors: Barbara Füzi, Rahuman S. Malik-Sheriff, Emma J. Manners, Henning Hermjakob and Gerhard F. Ecker

Citation: Journal of Cheminformatics 2022 14:37

Content type: Research article Published on: 13 June 2022

Off-targetP ML: an open source machine learning framework for off-target panel safety assessment of small molecules

Unpredicted drug safety issues constitute the majority of failures in the pharmaceutical industry according to several studies. Some of these preclinical safety issues could be attributed to the non-selective ...

Authors: Doha Naga, Wolfgang Muster, Eunice Musvasva and Gerhard F. Ecker

Citation: Journal of Cheminformatics 2022 14:27

Content type: Research article Published on: 7 May 2022

Galaxy workflows for fragment-based virtual screening: a case study on the SARS-CoV-2 main protease

We present several workflows for protein-ligand docking and free energy calculation for use in the workflow management system Galaxy. The workflows are composed of several widely used open-source tools, includ...

Authors: Simon Bray, Tim Dudgeon, Rachael Skyner, Rolf Backofen, Björn Grüning and Frank von Delft

Citation: Journal of Cheminformatics 2022 14:22

Content type: Research article Published on: 12 April 2022

Reproducible untargeted metabolomics workflow for exhaustive MS2 data acquisition of MS1 features

Unknown features in untargeted metabolomics and non-targeted analysis (NTA) are identified using fragment ions from MS/MS spectra to predict the structures of the unknown compounds. The precursor ion selected ...

Authors: Miao Yu, Georgia Dolios and Lauren Petrick

Citation: Journal of Cheminformatics 2022 14:6

Content type: Research article Published on: 16 February 2022

Processing binding data using an open-source workflow

The thermal shift assay (TSA)—also known as differential scanning fluorimetry (DSF), thermofluor, and T_m shift—is one of the most popular biophysical screening techniques used in fragment-based ligand discovery (...

Authors: Errol L. G. Samuel, Secondra L. Holmes and Damian W. Young

Citation: Journal of Cheminformatics 2021 13:99

Content type: Educational Published on: 11 December 2021

Semi-automated workflow for molecular pair analysis and QSAR-assisted transformation space expansion

In the process of drug discovery, the optimization of lead compounds has always been a challenge faced by pharmaceutical chemists. Matched molecular pair analysis (MMPA), a promising tool to efficiently extrac...

Authors: Zi-Yi Yang, Li Fu, Ai-Ping Lu, Shao Liu, Ting-Jun Hou and Dong-Sheng Cao

Citation: Journal of Cheminformatics 2021 13:86

Content type: Research article Published on: 13 November 2021

Nonadditivity in public and inhouse data: implications for drug design

Numerous ligand-based drug discovery projects are based on structure-activity relationship (SAR) analysis, such as Free-Wilson (FW) or matched molecular pair (MMP) analysis. Intrinsically they assume linearity...

Authors: D. Gogishvili, E. Nittinger, C. Margreitter and C. Tyrchan

Citation: Journal of Cheminformatics 2021 13:47

Content type: Research article Published on: 2 July 2021