Open Access Methodology article

VarioML framework for comprehensive variation data representation and exchange

Myles Byrne1, Ivo FAC Fokkema2, Owen Lancaster3, Tomasz Adamusiak4, Anni Ahonen-Bishopp5, David Atlan6, Christophe Béroud7, Michael Cornell8, Raymond Dalgleish3, Andrew Devereau8, George P Patrinos9, Morris A Swertz10, Peter EM Taschner2, Gudmundur A Thorisson3, Mauno Vihinen11,12,13, Anthony J Brookes3 and Juha Muilu1*

Author affiliations

1 Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland

2 Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands

3 Department of Genetics, University of Leicester, Leicester, UK

4 Medical College of Wisconsin, Milwaukee, WI, USA

5 Biocomputing Platforms, Ltd, Espoo, Finland

6 Phenosystems Inc, Brussels, Belgium

7 INSERM UMR_S910, Faculté de Médecine La Timone, Marseille, France

8 National Genetics Reference Laboratory, Manchester, UK

9 Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece

10 Department of Genetics, Genomics Coordination Center University Medical Center Groningen and Groningen Bioinformatics Center, University of Groningen, Groningen, Netherlands

11 Department of Experimental Medical Science, Lund University, Lund, Sweden

12 Institute of Biomedical Technology, University of Tampere, Tampere, Finland

13 Tampere University Hospital, Tampere, Finland

Citation and License

BMC Bioinformatics 2012, 13:254  doi:10.1186/1471-2105-13-254

Published: 3 October 2012

Abstract

Background

Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement.

Results

The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants with a toolkit for adapting the specification into one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs), e.g., the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easy to integrate into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components.

Conclusions

VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease and clarity, and without ambiguity.

Keywords:
LSDB; Variation database curation; Data collection; Distribution