Research article

Inferring patient to patient transmission of Mycobacterium tuberculosis from whole genome sequencing data

Josephine M Bryant (1), Anita C Schürch (2,3,4), Henk van Deutekom (5), Simon R Harris (1), Jessica L de Beer (2), Victor de Jager (3,6), Kristin Kremer (2), Sacha A F T van Hijum (3,6,7), Roland J Siezen (3,6), Martien Borgdorff (5,8), Stephen D Bentley (1), Julian Parkhill (1)* and Dick van Soolingen (2,9)

Author affiliations

1 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

2 RIVM, Tuberculosis Reference Laboratory, National Institute for Public Health and the Environment (RIVM), Centre for Infectious Disease Control, (CIb/LIS, pb 22), P.O. Box 1, 3720 BA Bilthoven, The Netherlands

3 Radboud University Medical Centre/NCMLS, Centre for Molecular and Biomolecular Informatics, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands

4 Department of Virology, Erasmus Medical Center, Rotterdam, The Netherlands

5 Department of tuberculosis control, Public Health Service, Amsterdam, The Netherlands

6 Netherlands Bioinformatics Centre (NBIC), P.O. Box 9101, 6500 HB Nijmegen, The Netherlands

7 NIZO food research, P.O. Box 20, 6710 BA Ede, The Netherlands

8 Department of Clinical Epidemiology, Biostatistics, and Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands

9 Department of Clinical Microbiology and department of Lung Disease, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands


Citation and License

BMC Infectious Diseases 2013, 13:110  doi:10.1186/1471-2334-13-110

Published: 27 February 2013



Mycobacterium tuberculosis is characterised by limited genomic diversity, which makes the application of whole genome sequencing particularly attractive for clinical and epidemiological investigation. However, in order to confidently infer transmission events, an accurate knowledge of the rate of change in the genome over relevant timescales is required.


We attempted to estimate a molecular clock by sequencing 199 isolates from epidemiologically linked tuberculosis cases, collected in the Netherlands over a period of almost 16 years.


Multiple analyses support an average mutation rate of ~0.3 SNPs per genome per year. However, all analyses revealed a very high degree of variation around this mean, making the confirmation of links proposed by epidemiology, and inference of novel links, difficult. Despite this, in some cases, the phylogenetic context of other strains provided evidence supporting the confident exclusion of previously inferred epidemiological links.
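To make the consequence of such a slow rate concrete, the following minimal sketch computes how many SNPs would be expected to separate two isolates after a given interval. It assumes, purely for illustration, that SNPs accumulate as a Poisson process at the reported average of 0.3 SNPs per genome per year; the Poisson assumption and the function names are ours, not the paper's, and the high variation reported above suggests real data may be more dispersed than this simple model.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def snp_count_probabilities(rate_per_year, years, max_snps):
    """Return [P(0 SNPs), P(1 SNP), ..., P(max_snps SNPs)] after `years`.

    Assumption (ours): SNPs accumulate as a Poisson process at a constant rate.
    """
    mean = rate_per_year * years
    return [poisson_pmf(k, mean) for k in range(max_snps + 1)]


# Average rate reported in the abstract: ~0.3 SNPs per genome per year.
RATE = 0.3

if __name__ == "__main__":
    for years in (1, 5, 10):
        probs = snp_count_probabilities(RATE, years, max_snps=5)
        formatted = ", ".join(f"P({k})={p:.3f}" for k, p in enumerate(probs))
        print(f"{years:>2} years (mean {RATE * years:.1f} SNPs): {formatted}")
```

Under this simplified model, roughly three quarters of isolate pairs separated by one year would show no SNP difference at all, which illustrates why a SNP count on its own is a weak basis for confirming or dating a transmission link.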


This in-depth analysis of the molecular clock revealed that it is slow and variable over short timescales, which limits its usefulness in transmission studies. However, the superior resolution of whole genome sequencing can provide the phylogenetic context to allow the confident exclusion of possible transmission events previously inferred via traditional DNA fingerprinting techniques and epidemiological cluster investigation. Despite the slow generation of variation, even at the whole genome level, we conclude that the investigation of tuberculosis transmission will benefit greatly from routine whole genome sequencing.

Keywords: Mycobacterium tuberculosis; Molecular clock; Whole genome sequencing; Transmission; Epidemiology