This article is part of the supplement: Semantic Web Applications and Tools for Life Sciences, 2008

Open Access Research

Francisella tularensis novicida proteomic and transcriptomic data integration and annotation based on semantic web technologies

Nadia Anwar1,3* and Ela Hunt2

Author Affiliations

1 Faculty of Biomedical and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK

2 Department of Computer and Information Sciences, University of Strathclyde, Glasgow, G1 1XB, UK

3 Current address: Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York 10065, USA

BMC Bioinformatics 2009, 10(Suppl 10):S3  doi:10.1186/1471-2105-10-S10-S3

Published: 1 October 2009

Abstract

Background

This paper summarises the lessons and experiences gained from a case study of the application of semantic web technologies to the integration of data from the bacterial species Francisella tularensis novicida (Fn). Fn data sources are disparate and heterogeneous, as multiple laboratories across the world, using multiple technologies, perform experiments to understand the mechanism of virulence. It is hard to integrate these data sources in a flexible manner that allows new experimental data to be added and compared when required.

Results

Public domain data sources were combined in RDF. Using this connected graph of database cross references, we extended the annotations of an experimental data set by superimposing the annotation graph onto it. Identifiers used in the experimental data were resolved automatically, and the data acquired the annotations held in the rest of the RDF graph. This happened without the expensive manual annotation that would normally be required to produce these links. The graph of resolved identifiers was then used to combine two experimental data sets, a proteomics experiment and a transcriptomics experiment, both studying the mechanism of virulence by comparing wildtype Fn with an avirulent mutant strain.
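
The mechanism described above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes an RDF graph can be modelled as a plain set of subject-predicate-object triples, so that combining sources is a set union and any identifier shared between sources becomes a single node. All identifiers, predicate names and values below are invented placeholders.

```python
# Minimal sketch: an RDF-like graph as a set of (subject, predicate, object)
# triples. Every identifier, predicate and value here is a hypothetical
# placeholder, not data from the study.

annotation_graph = {
    ("gene:FTN_0001", "xref", "protein:P0001"),
    ("protein:P0001", "hasFunction", "hypothetical function A"),
    ("gene:FTN_0002", "xref", "protein:P0002"),
}

experiment_graph = {
    ("spot:17", "identifiedAs", "protein:P0001"),
    ("spot:17", "foldChange", "2.5"),
}


def merge(*graphs):
    """Combine graphs by set union; equal identifiers become the same node."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """All objects of triples matching (subject, predicate, ?)."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def subjects(graph, predicate, obj):
    """All subjects of triples matching (?, predicate, obj)."""
    return sorted(s for s, p, o in graph if p == predicate and o == obj)


def annotations_for_spot(graph, spot):
    """Follow spot -> protein, then collect the protein's genes and functions."""
    result = []
    for protein in objects(graph, spot, "identifiedAs"):
        result.append({
            "protein": protein,
            "genes": subjects(graph, "xref", protein),
            "functions": objects(graph, protein, "hasFunction"),
        })
    return result


combined = merge(annotation_graph, experiment_graph)
spot_annotations = annotations_for_spot(combined, "spot:17")
```

Queried on its own, the experimental graph knows only the protein identifier for `spot:17`; after the merge, the same query also returns the gene and function, because the shared identifier `protein:P0001` links the two sources without any manually written mapping.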

Conclusion

We produced a graph of Fn cross references which enabled the combination of two experimental datasets. By combining these data, we are able to run queries that compare the results of the two experiments. We found that data are easily combined in RDF and that experimental results are easily compared once the data are integrated. We conclude that semantic data integration offers a convenient, simple and flexible solution to the integration of published and unpublished experimental data.
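
A query that compares the two experiments might look roughly like the following sketch. It assumes, purely for illustration, that transcript measurements attach to gene identifiers, protein measurements attach to protein identifiers, and a cross-reference triple links each gene to its protein. The predicate names (`encodes`, `transcriptChange`, `proteinChange`) and all values are hypothetical, and the pattern matcher is a stand-in for a real RDF query language.

```python
# Minimal sketch of a cross-experiment query over an integrated triple set.
# All identifiers, predicates and numbers are invented placeholders.

triples = {
    ("gene:G1", "encodes", "protein:P1"),
    ("gene:G2", "encodes", "protein:P2"),
    ("gene:G1", "transcriptChange", 1.8),
    ("gene:G2", "transcriptChange", -0.4),
    ("protein:P1", "proteinChange", 2.1),
    ("protein:P2", "proteinChange", 0.9),
}


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def compare_experiments(graph):
    """Join both measurements through the gene-to-protein cross reference."""
    rows = []
    for gene, _, protein in match(graph, p="encodes"):
        for _, _, t_change in match(graph, s=gene, p="transcriptChange"):
            for _, _, p_change in match(graph, s=protein, p="proteinChange"):
                same_direction = (t_change > 0) == (p_change > 0)
                rows.append((gene, protein, t_change, p_change, same_direction))
    return sorted(rows)


comparison = compare_experiments(triples)
```

Each row pairs a gene with its protein and flags whether the two measurements change in the same direction, which is the kind of comparison that becomes a simple join once both datasets share one graph.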