This article is part of the supplement: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2010

Open Access Proceedings

Sublineage structure analysis of Mycobacterium tuberculosis complex strains using multiple-biomarker tensors

Cagri Ozcaglar1*, Amina Shabbeer1, Scott Vandenberg3, Bülent Yener1 and Kristin P Bennett1,2

Author Affiliations

1 Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, USA

2 Mathematical Sciences Department, Rensselaer Polytechnic Institute, Troy, NY, USA

3 Computer Science Department, Siena College, Loudonville, NY, USA


BMC Genomics 2011, 12(Suppl 2):S1  doi:10.1186/1471-2164-12-S2-S1

Published: 27 July 2011



Strains of Mycobacterium tuberculosis complex (MTBC) can be classified into major lineages based on their genotype. Further subdivision of major lineages into sublineages requires multiple biomarkers along with methods to combine and analyze multiple sources of information in one unsupervised learning model. Typically, spacer oligonucleotide type (spoligotype) and mycobacterial interspersed repetitive units (MIRU) are used for TB genotyping and surveillance. Here, we examine the sublineage structure of MTBC strains with multiple biomarkers simultaneously, by employing a tensor clustering framework (TCF) on multiple-biomarker tensors.
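The abstract does not say how a multiple-biomarker tensor is assembled. One plausible construction, offered here only as a hedged sketch and not as the authors' TCF, stacks one strain-by-strain distance matrix per biomarker into an n × n × 2 array. The normalized Hamming distance used for both biomarkers, the function names `hamming_fraction` and `build_biomarker_tensor`, and the toy profiles (43-spacer spoligotypes and hypothetical 12-locus MIRU repeat counts) are all illustrative assumptions.

```python
from typing import List, Sequence


def hamming_fraction(a: Sequence, b: Sequence) -> float:
    """Fraction of positions at which two equal-length profiles differ.

    Assumption: the same simple mismatch distance is used for spoligotype
    spacers (present/absent) and MIRU loci (repeat counts).
    """
    if len(a) != len(b):
        raise ValueError("profiles must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def build_biomarker_tensor(
    spoligotypes: List[str], mirus: List[List[int]]
) -> List[List[List[float]]]:
    """Hypothetical n x n x 2 tensor: T[i][j][0] is the spoligotype distance
    and T[i][j][1] is the MIRU distance between strains i and j."""
    n = len(spoligotypes)
    if len(mirus) != n:
        raise ValueError("need one spoligotype and one MIRU profile per strain")
    return [
        [
            [
                hamming_fraction(spoligotypes[i], spoligotypes[j]),
                hamming_fraction(mirus[i], mirus[j]),
            ]
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy data, invented for illustration only: three strains, each with a
# 43-character spoligotype (1 = spacer present) and 12 MIRU repeat counts.
spolis = ["1" * 43, "1" * 39 + "0" * 4, "0" * 10 + "1" * 33]
mirus = [[2] * 12, [2] * 11 + [3], [3] * 12]

T = build_biomarker_tensor(spolis, mirus)
print(T[0][1])  # spoligotype distance 4/43, MIRU distance 1/12
```

Keeping each biomarker in its own frontal slice, rather than concatenating the profiles into one vector, is what would let a tensor method weigh spoligotype and MIRU evidence jointly while still distinguishing them.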


Simultaneous analysis of the spoligotype and MIRU type of strains using TCF on multiple-biomarker tensors leads to coherent sublineages of major lineages with clear and distinctive spoligotype and MIRU signatures. Comparison of tensor sublineages with SpolDB4 families either supports the tensor sublineages or suggests that SpolDB4 families be subdivided or merged. High prediction accuracy of major lineage classification with supervised tensor learning on multiple-biomarker tensors validates our unsupervised analysis of sublineages on multiple-biomarker tensors.


TCF on multiple-biomarker tensors achieves simultaneous analysis of multiple biomarkers and suggests a new putative sublineage structure for each major lineage. Analysis of multiple-biomarker tensors gives insight into the sublineage structure of MTBC at the genomic level.