inTB - a data integration platform for molecular and clinical epidemiological analysis of tuberculosis
1 Instituto Gulbenkian de Ciência, Rua da Quinta Grande 6, Apartado 14, Oeiras P-2781-901, Portugal
2 Present address: Instituto de Higiene e Medicina Tropical, Lisbon, Portugal
BMC Bioinformatics 2013, 14:264 doi:10.1186/1471-2105-14-264
Published: 30 August 2013
Tuberculosis is currently the second-highest cause of death from infectious diseases worldwide. The emergence of multidrug and extensive drug resistance is threatening to make tuberculosis incurable. There is growing evidence that the genetic diversity of Mycobacterium tuberculosis may have important clinical consequences. Therefore, combining genetic, clinical and socio-demographic data is critical to understanding the epidemiology of this infectious disease, and how virulence and other phenotypic traits evolve over time. This requires dedicated bioinformatics platforms capable of integrating these heterogeneous data and enabling their analysis.
We developed inTB, a web-based system for integrated warehousing and analysis of clinical, socio-demographic and molecular data for Mycobacterium sp. isolates. As a database, it can organize and display data from any of the standard genotyping methods (SNP, MIRU-VNTR, RFLP and spoligotype), as well as an extensive array of clinical and socio-demographic variables that are used in multiple countries to characterize the disease. Through the inTB interface it is possible to insert and download data, browse the database and search for specific parameters. New isolates are automatically classified into strains according to an internal reference, and data that are uploaded or typed in are checked for internal consistency. As an analysis framework, the system provides simple point-and-click analysis tools that allow multiple types of data plotting, as well as simple ways to download data for external analysis. Individual trees for each genotyping method are available, as well as a super tree combining all of them. The integrative nature of inTB allows the user to generate trees for filtered subsets of data that combine molecular and clinical/socio-demographic information. inTB is built on open source software, can be easily installed locally and can be easily adapted to other diseases. Its design allows for use by research laboratories, hospitals or public health authorities. The full source code, as well as ready-to-use packages, is available at http://www.evocell.org/inTB.
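To illustrate the kind of consistency check and reference-based classification described above, the following is a minimal sketch in Python. It is not the inTB implementation. It assumes spoligotypes are stored as 43-character binary strings (one character per spacer) and that classification is an exact match against a reference table. The function names and the toy reference table are hypothetical.

```python
from typing import Dict, Optional

SPOLIGO_LENGTH = 43  # standard spoligotyping reports 43 spacers


def is_valid_spoligotype(pattern: str) -> bool:
    """Consistency check: exactly 43 characters, each '0' (absent) or '1' (present)."""
    return len(pattern) == SPOLIGO_LENGTH and set(pattern) <= {"0", "1"}


def classify(pattern: str, reference: Dict[str, str]) -> Optional[str]:
    """Return the label of the reference pattern identical to `pattern`, else None.

    Exact matching is an assumption made for this sketch; the source does not
    describe how inTB matches isolates against its internal reference.
    """
    if not is_valid_spoligotype(pattern):
        raise ValueError("spoligotype must be 43 characters of 0/1")
    for label, ref_pattern in reference.items():
        if ref_pattern == pattern:
            return label
    return None


# Hypothetical toy reference: labels and patterns are placeholders,
# not real strain or lineage signatures.
TOY_REFERENCE = {
    "strain_A": "1" * 43,
    "strain_B": "1" * 33 + "0" * 10,
}
```

Under these assumptions, a malformed entry is rejected before classification, and an isolate whose pattern matches no reference entry is left unclassified rather than forced into a strain.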
To the best of our knowledge, this is the only system capable of integrating different types of molecular data with clinical and socio-demographic data, providing researchers and clinicians with easy-to-use analysis tools that were not previously available.