Gene Fusion Markup Language: a prototype for exchanging gene fusion data
1 Michigan Center for Translational Pathology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
2 Department of Environmental Biotechnology, Bharathidasan University, Tiruchirappalli, India
3 Department of Pathology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
4 Howard Hughes Medical Institute, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
5 Department of Urology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
BMC Bioinformatics 2012, 13:269 doi:10.1186/1471-2105-13-269. Published: 16 October 2012
An avalanche of next-generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates at finer resolution than previously achieved. However, amid the excitement and necessity of publishing observations from this recently developed technology, no community standard has arisen for organizing and representing the data, with its essential attributes, in an interchangeable manner. Because transcriptome studies are widely used for gene fusion discovery, the current non-standard mode of data representation could impede data accessibility, critical analyses, and further discoveries in the near future.
Here we propose a prototype, Gene Fusion Markup Language (GFML), as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML offers the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis and interpretation, and independent verification. As this database-independent exchange initiative evolves, it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is available at http://code.google.com/p/gfml-prototype/.
The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating, and representing the significant features of gene fusion data in an interoperable and queryable fashion, enabling biologically intuitive access to gene fusion findings and expediting functional characterization. A similar model is envisaged for other NGS data analyses.
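To make the idea of a machine-readable, queryable fusion record concrete, the following minimal sketch parses a small XML record with Python's standard library. The element and attribute names (`fusionSet`, `fusion`, `partner`, `role`, `evidence`) and the helper `read_fusions` are hypothetical and chosen only for illustration; they are not taken from the GFML schema, which is not reproduced here. TMPRSS2-ERG is used simply as a well-known example of a gene fusion.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the element and attribute names below are
# illustrative assumptions, NOT the actual GFML schema.
RECORD = """
<fusionSet>
  <fusion id="example-1">
    <partner role="5prime" symbol="TMPRSS2" chromosome="21"/>
    <partner role="3prime" symbol="ERG" chromosome="21"/>
    <evidence method="RNA-Seq"/>
  </fusion>
</fusionSet>
"""


def read_fusions(xml_text):
    """Return one dict per <fusion> element of the illustrative layout."""
    root = ET.fromstring(xml_text.strip())
    fusions = []
    for fusion in root.findall("fusion"):
        # Map each partner's role (5' or 3') to its gene symbol.
        partners = {p.get("role"): p.get("symbol")
                    for p in fusion.findall("partner")}
        fusions.append({
            "id": fusion.get("id"),
            "five_prime": partners.get("5prime"),
            "three_prime": partners.get("3prime"),
            "methods": [e.get("method") for e in fusion.findall("evidence")],
        })
    return fusions


fusions = read_fusions(RECORD)
for f in fusions:
    print(f"{f['five_prime']}-{f['three_prime']}", f["methods"])
```

Because each attribute is explicitly named in the markup, a consumer can filter or join records (for example, by partner gene or detection method) without parsing free text, which is the kind of exchange and automated analysis a standard format is meant to support.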