Ycasd – a tool for capturing and scaling data from graphical representations
1 Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Haertelstrasse 16-18, 04107 Leipzig, Germany
2 LIFE – Leipzig Research Center for Civilization Diseases, University of Leipzig, Philipp-Rosenthal-Strasse 27, 04103 Leipzig, Germany
BMC Bioinformatics 2014, 15:219 doi:10.1186/1471-2105-15-219
Published: 25 June 2014
Mathematical modelling of biological processes often requires a wide variety of data sets for parameter estimation and validation. Clinical data are commonly not available in raw formats but are provided only as graphical representations. Hence, to include these data in environments used for model simulations and statistical analyses, they must be extracted from their presentations in the literature. For this purpose, we developed the freely available open source tool ycasd. Once a coordinate system has been established by simple axis definitions, the tool supports convenient retrieval of data points from arbitrary figures.
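To illustrate what establishing a coordinate system from axis definitions can involve, the following minimal Python sketch maps pixel positions in a figure to data values. It is not taken from ycasd. It assumes linear axes that are aligned with the image (no rotation), and the helper name `make_axis_mapper` as well as all calibration numbers are hypothetical.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value.

    Assumption: the axis is linear, so two reference points
    (pixel position, known axis value) define the scaling.
    """
    if pixel_a == pixel_b:
        raise ValueError("the two reference pixels must differ")
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        return value_a + (pixel - pixel_a) * scale

    return to_value


# Hypothetical calibration: pixel columns 100 and 500 mark x = 0 and x = 10;
# pixel rows 400 and 50 mark y = 0 and y = 7 (image rows grow downwards).
x_of = make_axis_mapper(100, 0.0, 500, 10.0)
y_of = make_axis_mapper(400, 0.0, 50, 7.0)

# Pixel positions of "clicked" data points, converted to data coordinates.
clicked = [(100, 400), (300, 225), (500, 50)]
points = [(x_of(px), y_of(py)) for px, py in clicked]
```

Because each axis is calibrated separately, the inverted vertical direction of image coordinates is handled by the sign of the scale factor. Logarithmic or rotated axes would need a different mapping.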
After describing the general functionality and providing an overview of the programme interface, we demonstrate with an example how to use ycasd. A major advantage of ycasd is that it does not require a specific input file format to open and process figures. All options of ycasd are accessible through a single window, which eases handling and speeds up data extraction. For subsequent processing, extracted data points can be formatted as a Matlab or an R matrix. We extensively compare the functionality and other features of ycasd with those of other publicly available tools. Finally, we provide a short summary of our experiences with ycasd in the context of modelling.
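As a rough illustration of formatting extracted points as a Matlab or an R matrix, the Python sketch below turns a list of (x, y) pairs into literals that both environments accept. The exact output format of ycasd is not described here, so the function names `to_matlab` and `to_r`, the variable name `data`, and the six-significant-digit rounding are assumptions made for this example.

```python
def _fmt(value):
    # Assumption: six significant digits are enough for illustration.
    return f"{value:.6g}"


def to_matlab(points, name="data"):
    """Format (x, y) pairs as a Matlab matrix literal, one row per point."""
    rows = "; ".join(f"{_fmt(x)} {_fmt(y)}" for x, y in points)
    return f"{name} = [{rows}];"


def to_r(points, name="data"):
    """Format (x, y) pairs as an R matrix() call filled row by row."""
    values = ", ".join(_fmt(v) for point in points for v in point)
    return f"{name} <- matrix(c({values}), ncol = 2, byrow = TRUE)"


# Hypothetical extracted points.
extracted = [(0.0, 0.0), (5.0, 3.5), (10.0, 7.0)]
matlab_text = to_matlab(extracted)
r_text = to_r(extracted)
```

Either string can be pasted into the respective environment to obtain a matrix with one row per point and columns for x and y.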
We conclude that our tool is suitable for convenient and accurate data retrieval from graphical representations, such as figures in papers. A comparison of tools reveals that ycasd offers a good compromise between complexity and easy, quick capturing of scientific data from publications. Our tool is routinely applied in biological modelling, where numerous time series data are required to develop models. The software can also be useful for other kinds of analyses, such as systematic reviews and meta-analyses, for which published data are required but are not available in raw formats.