Institute of Informatics, University of Warsaw, ul. Banacha 2, 02-097, Warsaw, Poland

Mossakowski Medical Research Centre, Polish Academy of Sciences, ul. Pawińskiego 5, 02-106, Warsaw, Poland

Abstract

Background

Progress in the modeling of biological systems strongly relies on the availability of specialized computer-aided tools. To that end, the Taverna Workbench eases integration of software tools for life science research and provides a common workflow-based framework for computational experiments in Biology.

Results

The Taverna services for Systems Biology (Tav4SB) project provides a set of new Web service operations, which extend the functionality of the Taverna Workbench in the domain of systems biology. Tav4SB operations allow you to perform numerical simulations or model checking of, respectively, deterministic or stochastic semantics of biological models. On top of this functionality, Tav4SB enables the construction of high-level experiments. As an illustration of the possibilities offered by our project, we apply multi-parameter sensitivity analysis. A flexible plotting operation is also provided to visualize the results of model analysis. Tav4SB operations are executed in a simple grid environment, integrating heterogeneous software such as Mathematica, PRISM and SBML ODE Solver. The user guide, contact information, full documentation of available Web service operations, workflows and other additional resources can be found at the Tav4SB project’s Web page:

Conclusions

The Tav4SB Web service provides a set of integrated tools in a domain for which Web-based applications are still not as widely available as for other areas of computational biology. Moreover, we extend the dedicated hardware base for the computationally expensive task of simulating cellular models. Finally, we promote the standardization of models and experiments as well as the accessibility and usability of remote services.

Background

The Taverna Workbench

Taverna services come from a diverse set of life science domains. In the field of computational biology, the Taverna Workbench provides access to services which are mainly related to sequence annotation and analysis. Here, we present remote processors that extend Taverna’s functionality in the domain of systems biology, specifically, in the analysis of kinetic models of biological systems. Our hardware base offers resources sufficient for computationally demanding experiments, such as multiple invocations of the model-checking procedure. Essentially, the Taverna Workbench provides a convenient user interface for our WS operations. Without programming their own WS client, users can analyze the behavior of cellular systems under various conditions.

Features

For a given biochemical network model, the underlying mathematical model is determined by the chosen semantics. The most common representations are ordinary differential equations (ODEs) for the deterministic framework and continuous-time Markov chains (CTMCs) for the stochastic framework.

Operations provided by our Web server allow for:

1. numerical simulations for the deterministic formulation of a biochemical network model, using the SBML ODE Solver library (SOSlib)

2. probabilistic model checking of Continuous Stochastic Logic (CSL)

3. visualization of data series, such as ODE trajectories or values of parametrized CSL properties, and probability distribution sampling, using Mathematica

4. high-level analysis, such as multi-parameter sensitivity analysis (MPSA)

The SBML ODE Solver library enables numerical analysis of models encoded directly in Systems Biology Markup Language (SBML)

PRISM is one of the leading tools implementing probabilistic model checking, a technique of formal verification of systems that exhibit a stochastic behavior. A system to be analyzed is modeled as a Markov chain, and an examined property is expressed in a suitable probabilistic temporal logic. Some recent works, see e.g.

PRISM handles models defined in the PRISM input language. Currently, a prototype translator from SBML is not integrated into the application itself. Therefore, we also provided a separate operation to automatically translate from SBML to the PRISM language, using the prototype translator.

Finally, Wolfram’s Mathematica is a tool with one of the most advanced graphics engines among plotting software. Tav4SB provides Mathematica’s two- and three-dimensional list plots together with a versatile set of options for customizing their display. Additionally, Tav4SB allows sampling from the extensive collection of parametric probability distributions available in Mathematica.

Context

The aim of the Tav4SB project is to support the orchestration of physically scattered tools for execution of repeatable scientific experiments. To understand the place of Tav4SB among the plethora of similar software, consider the following mundane technical problem. You have a set of scripts, command line tools or any other form of legacy code, installed on one or more computational servers, not necessarily in the same local area network. For instance, you might have a Mathematica script which can only be executed on a server which has Mathematica installed on it; and simultaneously you might need to use PRISM, installed on a remote server with the large amount of memory that it requires. You want to connect these tools in an

The Tav4SB project is a realization of a minimalist approach to a platform-independent solution, based on the workflow management system and a service-oriented architecture built around the Web service standard and a straightforward queue of computational tasks.

The Tav4SB project consists of two parts. The client part of the project (Tav4SB client) is a library of sample workflows and helper scripts for analysis of kinetic models of biological systems, using the features described earlier. The server part of the project (Tav4SB server) is a simple grid environment which wraps the aforementioned computational tools. Those tools are intended to be run in a multi-threaded manner, on one or more, possibly remote, computational servers.

As a utility for wrapping scientific software in Web services, the Tav4SB project enters the territory of projects such as Soaplab2

Implementation

We have chosen the popular Systems Biology Markup Language (SBML)

Figure

**The implementation architecture.** Names of a particular software, technology or standard are written in blue. The communication type is specified on edges which connect components of the system. See text for details.

The client communicates with the server side via WS operations, using Simple Object Access Protocol (SOAP)

Java Web service classes were automatically generated from the WSDL file.

The WSDL file is hosted by the Apache Tomcat servlet container. It acts as a proxy between the client and the computational part of the server. A Web service operation call is translated into a Java Message Service (JMS)

Computational cluster management modules are written in Java using the Apache ActiveMQ implementation of the JMS standard. These modules are deployed as Java Archive (JAR) files. The JMS messages are sent over TCP/IP, which makes the modules independent of their physical location.

New tasks, created by the Web server module, are added to the task queue. From there, tasks are assigned to any available worker of a compatible type. Results are collected in a temporary queue, exclusive to a single WS operation call. Long-running tasks use an asynchronous call registry. In such a case, the direct (synchronous) response to the WS operation call is merely a message reporting the start of computations. The computed results are collected in a dedicated queue and, when complete, sent to the caller by email (using the JavaMail package).
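The queueing pattern described above can be illustrated with a minimal in-process sketch. It is not the Tav4SB implementation, which uses JMS messages and Apache ActiveMQ: here Python's standard `queue` and `threading` modules stand in for the broker, one queue per worker type stands in for type-based assignment, and the function names and handler bodies are hypothetical.

```python
import queue
import threading


def run_worker(worker_type, tasks, handler):
    """Consume tasks of one type until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            break
        payload, reply_queue = item
        # The result goes to the temporary queue owned by the calling operation.
        reply_queue.put((worker_type, handler(payload)))


def submit(task_queues, worker_type, payload):
    """Enqueue one task and return the temporary queue that will hold its result."""
    reply_queue = queue.Queue()  # exclusive to this single call
    task_queues[worker_type].put((payload, reply_queue))
    return reply_queue


def demo():
    # Hypothetical handlers; real workers would start PRISM or SOSlib processes.
    handlers = {
        "odeSolver": lambda model: "simulated " + model,
        "PRISM": lambda model: "checked " + model,
    }
    task_queues = {name: queue.Queue() for name in handlers}
    workers = [
        threading.Thread(target=run_worker, args=(name, task_queues[name], fn))
        for name, fn in handlers.items()
    ]
    for worker in workers:
        worker.start()
    replies = [
        submit(task_queues, "odeSolver", "model.xml"),
        submit(task_queues, "PRISM", "model.sm"),
    ]
    results = [reply.get(timeout=5) for reply in replies]
    for tasks in task_queues.values():
        tasks.put(None)  # ask each worker to stop
    for worker in workers:
        worker.join()
    return results
```

Calling `demo()` submits one task per worker type and collects each result from its own reply queue, mirroring the synchronous case; the asynchronous, email-based path is not modelled here.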

A worker translates a JMS task message into running computational processes and translates the results of these processes back into a JMS result message. Each worker supports a specific type of computation and can communicate with the actual computational tool in its own way. Currently, we have implemented three types of workers: a Mathematica worker, which communicates with Mathematica via the J/Link library, and PRISM and odeSolver workers, which communicate with, respectively, PRISM and SOSlib via a command-line interpreter (shell).

Results and Discussion

We constructed a set of exemplary workflows. Their main purpose is to demonstrate how Tav4SB WS operations can be used by the Taverna Workbench client. There are two kinds of workflows: Tav4SB WS operation wrappers and

Wrapper workflows illustrate a direct usage of Tav4SB operations in Taverna. Their purpose is to be re-used as nested workflows — building blocks of experiments described below. Additionally, we built a number of helper Taverna processors, used for interacting with XML-formatted inputs and outputs of WS operations.

In all our

The species names S, E, ES and P stand for substrate, enzyme, enzyme-substrate complex and product, respectively. Length of an arrow indicates the order of the reaction rate. Initial amounts of species and kinetic parameters values, taken from

Numerical ODEs simulations

The first workflow numerically simulates the ODEs of the model and plots the resulting trajectories. The ODEs are derived automatically from an SBML model file, based on the rate laws of reactions. In the deterministic model of the enzymatic reaction, rates are described by the law of mass action. Running this simple experiment yields the time evolution of species concentrations, both as series of data points and as a plot.
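To make the mass-action formulation concrete, the following sketch integrates the ODEs of an enzymatic reaction with a hand-written fourth-order Runge-Kutta scheme. It is only an illustration: Tav4SB delegates this step to SOSlib, and the reaction scheme assumed here (reversible binding of S and E into ES with rates `k1` and `k2`, followed by conversion into P with rate `k3`), the function names, and all numerical values are assumptions rather than the values of Equation (2).

```python
def enzymatic_rhs(state, k1, k2, k3):
    """Mass-action right-hand side for S + E <-> ES -> P + E (assumed scheme)."""
    s, e, es, p = state
    binding = k1 * s * e
    unbinding = k2 * es
    catalysis = k3 * es
    return (
        -binding + unbinding,              # dS/dt
        -binding + unbinding + catalysis,  # dE/dt
        binding - unbinding - catalysis,   # dES/dt
        catalysis,                         # dP/dt
    )


def rk4(rhs, state, t_end, steps, *params):
    """Classical fixed-step Runge-Kutta integration; returns the whole trajectory."""
    h = t_end / steps
    state = tuple(state)
    trajectory = [state]
    for _ in range(steps):
        a = rhs(state, *params)
        b = rhs(tuple(x + 0.5 * h * d for x, d in zip(state, a)), *params)
        c = rhs(tuple(x + 0.5 * h * d for x, d in zip(state, b)), *params)
        d = rhs(tuple(x + h * dx for x, dx in zip(state, c)), *params)
        state = tuple(
            x + h / 6.0 * (da + 2.0 * db + 2.0 * dc + dd)
            for x, da, db, dc, dd in zip(state, a, b, c, d)
        )
        trajectory.append(state)
    return trajectory


# Hypothetical initial amounts (S, E, ES, P) and rate constants (k1, k2, k3).
trajectory = rk4(enzymatic_rhs, (10.0, 1.0, 0.0, 0.0), 10.0, 1000, 1.0, 0.1, 0.5)
```

A quick sanity check on such a trajectory is that the totals S + ES + P and E + ES stay constant, since the scheme neither creates nor destroys substrate or enzyme.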

Figure

**The “Simulate SBML-derived ODEs” workflow and resulting trajectories plot for the enzymatic reaction model (Equations (1) and (2)).** Pink boxes represent nested workflows, corresponding to Tav4SB WS operations wrappers and a helper. See text for more details.

Probabilistic model checking

The second experiment uses the probabilistic model checking technique to calculate the probability that a property is satisfied, over a stochastic model of the enzymatic reaction (Equation (1)). The stochastic version is also encoded in the SBML format. The property being checked is expressed as the following reward-based CSL formula:

Roughly speaking, this formula answers the following question: how many times, on average, the reaction _{50} coefficient). The formula is evaluated for different initial amounts of the enzyme to find the enzyme’s optimal efficiency. Because this computation is not instantaneous and plotting usually requires many repetitions to fine-tune plot parameters, the experiment is divided into two separate parts: a computational part and a plotting part. Figure

**The computational part of the “Probabilistic model checking of the SBML stochastic model” workflow and the resulting plot for the stochastic model of the enzymatic reaction (Equation (1)).** Pink boxes represent nested workflows, corresponding to Tav4SB WS operations wrappers. See text for more details.
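An expected number of reaction occurrences, the kind of quantity a reward-based property expresses, can also be approximated by plain stochastic simulation. The sketch below uses the Gillespie algorithm with Monte Carlo averaging, which is a different technique from the numerical model checking performed by PRISM. The reaction scheme (S + E <-> ES -> P + E), the function names, the time bound and all parameter values are assumptions made for illustration only.

```python
import random


def count_binding_events(s, e, es, p, k1, k2, k3, t_end, rng):
    """Simulate one trajectory and count firings of the binding reaction S + E -> ES."""
    t = 0.0
    count = 0
    while True:
        rates = (k1 * s * e, k2 * es, k3 * es)
        total = sum(rates)
        if total == 0.0:
            return count  # no reaction can fire any more
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            return count
        u = rng.random() * total  # choose a reaction proportionally to its rate
        if u < rates[0]:
            s, e, es = s - 1, e - 1, es + 1
            count += 1
        elif u < rates[0] + rates[1]:
            s, e, es = s + 1, e + 1, es - 1
        else:
            e, es, p = e + 1, es - 1, p + 1


def expected_binding_events(runs, seed, **model):
    """Average the event count over independent simulated trajectories."""
    rng = random.Random(seed)
    return sum(count_binding_events(rng=rng, **model) for _ in range(runs)) / runs


# Hypothetical model: initial amounts, rate constants and time bound.
model = dict(s=5, e=2, es=0, p=0, k1=1.0, k2=0.0, k3=1.0, t_end=1e6)
estimate = expected_binding_events(20, 1, **model)
```

With the unbinding rate set to zero and a very long time bound, every substrate molecule binds exactly once, so the estimate equals the initial amount of S; this gives a simple check that the counting logic is right before trying other parameter values.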

Multi-parameter sensitivity analysis

Sensitivity analysis investigates the relation between uncertain inputs or parameters of a model and a property of an observable output

Biochemical reaction networks yield models of a nonlinear nature, for which global sensitivity analysis (GSA) methods are the most suitable

1. Select parameters to assess.

2. Set parameter ranges.

3. Generate independent samples.

4. For each sample calculate the error (based on the output).

5. Classify samples as acceptable or unacceptable.

6. For each of the selected parameters compare the sets of classified samples.

This procedure is depicted in Figure

**The multi-parameter sensitivity analysis workflow with an ODE-based error function.** Pink and brown boxes represent essential steps of the procedure. Remaining boxes represent workflow’s parameters and outputs. See text for details.

Calculating the error for each sample (Step 4) involves a separate analysis of the model. This is the factor that determines the running time of the MPSA procedure. We ran two variants of MPSA, differing in the way in which the error is calculated. In one variant we used ODE simulations and in the other we exploited the probabilistic model checking technique. We focused on the kinetic parameters of the two forward reactions of the enzymatic reaction model (Equation (1)), i.e. k_{1} and k_{3}. As an error function we took, respectively, the mean squared error of an ODE trajectory of the product P and the absolute difference of the value of formula (3), in both cases between the results for a parameter sample and for the reference values of the parameters (Equation (2)). In turn, we obtained empirical cumulative distribution functions (ECDFs) of acceptable and unacceptable samples for each of the selected parameters. ECDFs were compared using the Kolmogorov-Smirnov test (KS-test) and one minus the Pearson product–moment correlation coefficient (PMCC). As the final output of the MPSA method, we obtained two rankings for each of the sensitivity indices: KS-test and PMCC.
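The six steps of the procedure can be sketched in a few lines of Python. This is an illustrative reduction, not the Tav4SB workflow: the error function is a toy stand-in for a model analysis, the mean error is assumed as the acceptance threshold, only the KS statistic (not PMCC) is computed, and all function names are hypothetical.

```python
import random


def latin_hypercube(n, ranges, rng):
    """Draw n samples with one point per stratum for every parameter."""
    columns = []
    for low, high in ranges:
        points = [low + (high - low) * (i + rng.random()) / n for i in range(n)]
        rng.shuffle(points)  # decouple the strata of different parameters
        columns.append(points)
    return list(zip(*columns))


def ks_statistic(xs, ys):
    """Maximum distance between the empirical CDFs of two samples."""
    if not xs or not ys:
        return 0.0
    best = 0.0
    for v in list(xs) + list(ys):
        fx = sum(1 for x in xs if x <= v) / len(xs)
        fy = sum(1 for y in ys if y <= v) / len(ys)
        best = max(best, abs(fx - fy))
    return best


def mpsa(error, ranges, n, rng):
    """Return one KS-based sensitivity index per parameter."""
    samples = latin_hypercube(n, ranges, rng)       # steps 1-3
    errors = [error(sample) for sample in samples]  # step 4
    threshold = sum(errors) / n                     # assumption: mean error as cut-off
    indices = []
    for j in range(len(ranges)):
        acceptable = [s[j] for s, err in zip(samples, errors) if err <= threshold]    # step 5
        unacceptable = [s[j] for s, err in zip(samples, errors) if err > threshold]
        indices.append(ks_statistic(acceptable, unacceptable))                        # step 6
    return indices


def toy_error(sample):
    """Toy error that depends only on the second parameter."""
    return (sample[1] - 1.0) ** 2


indices = mpsa(toy_error, [(0.0, 2.0), (0.0, 2.0)], 400, random.Random(0))
```

Because the toy error ignores the first parameter, its acceptable and unacceptable samples follow nearly the same distribution and its index stays small, whereas the second parameter receives a clearly larger index.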

Figure presents the results for parameters k_{1} and k_{3}, for both variants of the MPSA procedure. In the variant based on ODE simulations and the error function which measures changes in the product P, parameter k_{3} significantly dominates parameter k_{1}, as far as the sensitivity of the system is concerned. This is an expected result. Firstly, k_{3} is the rate parameter of the reaction which is directly responsible for product creation. Secondly, from the Michaelis-Menten approximation

one can expect that, for values from Equation (2), variation of parameter k_{3} will be more influential, with respect to the product rate, than variation of parameter k_{1}.
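For reference, the Michaelis-Menten approximation is usually written in the textbook form below. The symbols k_{2} (the rate of the reverse, unbinding reaction), E_{0} (the total amount of enzyme) and K_{M} are assumptions introduced here, since the equation itself is not reproduced in this text.

```latex
\frac{d[P]}{dt} \approx \frac{k_{3}\,E_{0}\,[S]}{K_{M} + [S]},
\qquad
K_{M} = \frac{k_{2} + k_{3}}{k_{1}}
```

In this form k_{3} scales the product rate directly, while k_{1} enters only through K_{M}, so its influence fades when [S] is much larger than K_{M}.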

**MPSA error surfaces, ECDFs and values of sensitivity indices for error calculated using deterministic model with the mean squared error of product trajectories (left column) and using stochastic model with the absolute difference of a value of the formula (3) (right column).** Both procedures were run for 400 samples of parameters. Samples were generated using the Latin hypercube sampling method

Interestingly, the results of the other variant of the MPSA procedure are significantly different; one observes that now k_{1} dominates k_{3}. This may be ascribed to the particular choice of formula (3), which calculates the average number of occurrences of the first reaction, the one governed by k_{1}. Furthermore, an inspection of values of sensitivity indices given in Figure

MPSA combined with PMC may be applied as a pre-processing step which finds parameters that are insignificant for an analysis oriented on a very specific property of a model. This would provide a novel notion of a probabilistic abstraction

Performance test

To measure the network load and the overhead of task management in the Tav4SB server, we ran a performance test. The test was set up with the MAPK cascade case study from the PRISM Web page

The stochastic MAPK model (see Figure

**A scheme of the MAPK cascade model and a list of verified properties from the PRISM Web page**

Table

Table cells contain the average longest computation time in minutes, in different configurations of a number of machines and a number of threads for each worker.

| **# of threads/machines** | **1** | **2** | **4** | **8** | **14** |
|---|---|---|---|---|---|
| **1** | 271.25 | 137.84 | 71.22 | 38.06 | 23.85 |
| **2** | 149.09 | 71.12 | 38.00 | 21.51 | 14.55 |
| **4** | 124.53 | 54.37 | 26.06 | 13.68 | 9.75 |
| **8** | 70.44 | 37.53 | 23.78 | 17.39 | 16.44 |
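The timings can be turned into speedup factors relative to the single-thread, single-machine configuration with a few lines of Python. The values are copied from the table; reading the rows as threads per worker and the columns as machines is an assumption about the table layout, and the helper names are hypothetical.

```python
# Average longest computation time in minutes: TIMES[threads][machines].
# Assumption: table rows are threads per worker, columns are machines.
TIMES = {
    1: {1: 271.25, 2: 137.84, 4: 71.22, 8: 38.06, 14: 23.85},
    2: {1: 149.09, 2: 71.12, 4: 38.00, 8: 21.51, 14: 14.55},
    4: {1: 124.53, 2: 54.37, 4: 26.06, 8: 13.68, 14: 9.75},
    8: {1: 70.44, 2: 37.53, 4: 23.78, 8: 17.39, 14: 16.44},
}


def speedup(threads, machines):
    """Speedup relative to one thread on one machine."""
    return TIMES[1][1] / TIMES[threads][machines]


def fastest():
    """Return the (threads, machines) configuration with the shortest time."""
    return min(
        ((t, m) for t in TIMES for m in TIMES[t]),
        key=lambda tm: TIMES[tm[0]][tm[1]],
    )
```

Under this reading, `fastest()` returns `(4, 14)`, and `speedup(4, 14)` is about 27.8.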

Conclusions

Web-based applications are still not as widely available for the systems biology domain as for other research areas

Our services extend the functionality of the Taverna Workbench in the field of systems biology. Together with the services we provide a hardware base for our minimalist grid environment. The grid itself can be, and will be, easily extended, independently of the physical location of peripherals and of the operating system they are running. Moreover, our grid facilitates integration of heterogeneous tools, such as Mathematica, PRISM or SOSlib. The end-user goal of the Tav4SB project is to abstract away the details of the technological infrastructure. Finally, via SBML and the Taverna Workbench, we would like to promote standardization of models and experiments as well as accessibility of services and their usability for non-programmers. In order to further enhance usability, we released the source code of the project so that users can extend the Tav4SB functionality with their own worker modules. Users with programming skills can contribute to the development of the technical aspects of the server part of the project. These aspects cover the plug-in architecture of workers, the library of legacy code connectors (e.g., the currently used command-line interface or Java library), descriptors for the automatic generation of the worker code for common types of wrapped applications (cf. ACD metadata files in the Soaplab2 project

From the point of view of

Availability and requirements

• Project name: Tav4SB

• Project home page:

• Operating system(s): Platform independent (both client and server parts)

• Programming language: Optionally, SCUFL/t2flow, BeanShell, XSLT (client) and Java, Mathematica, Bash (server)

• Other requirements: the Taverna Workbench client 2.3 or higher, JSBML 0.8-b2, plus, optionally, any file-hosting Web server (client) and Apache Tomcat 6.0 series, Apache Maven 2 or higher, plus, optionally, Mathematica 7.0 or higher, PRISM 4.0 series and SBML ODE Solver 1.6 (server)

• License: GNU AGPL

• Any restrictions to use by non-academics: None

Please note that, technically, SCUFL and t2flow are workflow description languages, but together with the graphical notation provided by the Taverna Workbench they can be seen as visual programming languages. These and other client dependencies on a programming language are optional because one can write their own WS client in virtually any language. Also, be advised that the Apache Maven tool (in other requirements) automatically resolves all dependencies on Java libraries, such as JavaMail or Apache ActiveMQ (cf. Figure

The definitions of the operations provided by the Tav4SB WS and the workflow files, together with installation and execution instructions, are available from the project’s home page. Documentation of the Tav4SB WS can be found in BioCatalogue

Client workflows were tested on Ubuntu Linux (10.10), Mac OS X (10.6.8) and Windows Vista (Business) operating systems. The production server is currently deployed on computational servers at the Faculty of Mathematics, Informatics and Mechanics of the University of Warsaw (running Ubuntu Linux Server, Gentoo Linux and PLD Linux). The performance test server was deployed on a cluster of Ubuntu Linux machines (workers and queue) and Solaris gateway (WS). A local developer’s environment, with both client and server, was deployed and tested on Ubuntu Linux (10.10) and Mac OS X (10.6.8).

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

MR performed experimental parts, implemented the system and prepared the final version of the manuscript. ML implemented the grid structure of the system and performed performance tests. PB did the initial implementation. AG and SL supervised this project and participated in drafting the manuscript. All authors have read and approved the final manuscript.

Acknowledgements

This work was partially supported by the Polish government grant N N206 356036, and by the Biocentrum Ochota project (POIG.02.03.00-00-003/09). The first author is a scholar within the Human Capital Operational Programme financed by the European Social Fund and state budget. This paper was written for the benefit of University of Zielona Góra.

We would like to thank Janusz Dutkowski (Departments of Medicine and Bioengineering, University of California San Diego) for helpful comments on the manuscript.