Open Access Highly Accessed Research article

S3QL: A distributed domain specific language for controlled semantic integration of life sciences data

Helena F Deus1,2*, Miriã C Correa3, Romesh Stanislaus4, Maria Miragaia5, Wolfgang Maass6, Hermínia de Lencastre5,7, Ronan Fox1 and Jonas S Almeida8

Author Affiliations

1 Digital Enterprise Research Institute, National University of Ireland at Galway, IDA Business Park, Lower Dangan, Galway, Ireland

2 Biomathematics, Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Av. da República, Estação Agronómica Nacional, 2780-157 Oeiras, Portugal

3 Laboratório Nacional de Computação Científica, Av. Getúlio Vargas, 333, Quitandinha, 25651-075 Petrópolis, Brasil

4 Sanofi Pasteur, 38 Sidney Street, Cambridge, MA 02139, USA

5 Laboratory of Molecular Genetics, Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Av. da República, Estação Agronómica Nacional, 2780-157 Oeiras, Portugal

6 Research Center for Intelligent Media, Furtwangen University, Furtwangen, Germany

7 Laboratory of Microbiology, The Rockefeller University, 10021 New York, USA

8 Division of Informatics, Department of Pathology, University of Alabama at Birmingham, 619 South 19th Street, Birmingham, Alabama, USA

BMC Bioinformatics 2011, 12:285  doi:10.1186/1471-2105-12-285

Published: 14 July 2011

Abstract

Background

The value and usefulness of data increase when it is explicitly interlinked with related data. This is the core principle of Linked Data. For life sciences researchers, harnessing the power of Linked Data to improve biological discovery is still challenged by the need to keep pace with rapidly evolving domains, with requirements for collaboration and control, and with the reference semantic web ontologies and standards. Knowledge organization systems (KOSs) can provide an abstraction for publishing biological discoveries as Linked Data without complicating transactions with contextual minutiae such as provenance and access control.

We have previously described the Simple Sloppy Semantic Database (S3DB) as an efficient model for creating knowledge organization systems using Linked Data best practices with explicit distinction between domain and instantiation and support for a permission control mechanism that automatically migrates between the two. In this report we present a domain specific language, the S3DB query language (S3QL), to operate on its underlying core model and facilitate management of Linked Data.

Results

Reflecting the data driven nature of our approach, S3QL has been implemented as an application programming interface for S3DB systems hosting biomedical data, and its syntax was subsequently generalized beyond the S3DB core model. This achievement is illustrated with the assembly of an S3QL query to manage entities from the Simple Knowledge Organization System. The illustrative use cases include gastrointestinal clinical trials, genomic characterization of cancer by The Cancer Genome Atlas (TCGA) and molecular epidemiology of infectious diseases.

Conclusions

S3QL was found to provide a convenient mechanism to represent context for interoperation between public and private datasets hosted at biomedical research institutions and Linked Data formalisms.

Keywords:
S3DB; Linked Data; KOS; RDF; SPARQL; knowledge organization system; policy