Research article

Comparison of the Legionella pneumophila population structure as determined by sequence-based typing and whole genome sequencing

Anthony P Underwood1*, Garan Jones1,3, Massimo Mentasti2, Norman K Fry2 and Timothy G Harrison2

Author Affiliations

1 Bioinformatics Unit, Microbiology Services (Colindale), Public Health England, 61 Colindale Avenue, London, NW9 5EQ, UK

2 Respiratory and Vaccine Preventable Bacteria Reference Unit, Microbiology Services (Colindale), Public Health England, 61 Colindale Avenue, London, NW9 5EQ, UK

3 Centre for Bioinformatics, Biosciences, College of Life and Environmental Sciences, Geoffrey Pope, University of Exeter, Stocker Road, Exeter, EX4 4QD, UK

BMC Microbiology 2013, 13:302  doi:10.1186/1471-2180-13-302

Published: 24 December 2013



Legionella pneumophila is an opportunistic human pathogen, and the source of infection is usually a contaminated man-made water system. When an outbreak of Legionnaires’ disease caused by L. pneumophila occurs, the source of infection must be identified. A seven-allele sequence-based typing (SBT) scheme has been very successful in attributing outbreaks of L. pneumophila to a particular source or sources. Particular sequence types described by this scheme are known to exhibit specific phenotypes. For instance, some types are often seen in clinical cases but are rarely isolated from the environment, and vice versa. Of those causing human disease, some types are thought more likely to cause severe disease. It is possible that the genetic basis for these differences is vertically inherited and associated with particular genetic lineages within the population. To provide a framework within which to test this hypothesis, and others relating to the population biology of L. pneumophila, a set of genomes covering the known diversity of the organism is required.
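As a rough illustration of how a seven-allele scheme of this kind can be used to compare isolates, the sketch below represents each isolate as a tuple of seven allele numbers and counts the loci at which two isolates differ. The locus names and their order are an assumption drawn from the commonly used L. pneumophila SBT scheme and are not stated in the text above; the allele numbers and the function names are hypothetical and chosen only for the example.

```python
from typing import Tuple

# Locus names and order are an assumption (not given in the text above).
LOCI = ("flaA", "pilE", "asd", "mip", "mompS", "proA", "neuA")

Profile = Tuple[int, ...]


def allele_differences(a: Profile, b: Profile) -> int:
    """Count the loci at which two seven-allele profiles differ."""
    if len(a) != len(LOCI) or len(b) != len(LOCI):
        raise ValueError(f"each profile must have {len(LOCI)} alleles")
    return sum(x != y for x, y in zip(a, b))


def same_sequence_type(a: Profile, b: Profile) -> bool:
    """Two isolates share a sequence type when all seven alleles match."""
    return allele_differences(a, b) == 0


# Hypothetical allele numbers, for illustration only.
clinical_isolate = (5, 1, 2, 3, 4, 6, 7)
water_isolate = (5, 1, 2, 3, 4, 6, 8)

print(allele_differences(clinical_isolate, water_isolate))
print(same_sequence_type(clinical_isolate, clinical_isolate))
```

A count of zero means the two isolates have the same sequence type. A small non-zero count suggests related but distinct types, although whether such types share a lineage is the question the whole genome comparison addresses.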


Firstly, this study describes a means of grouping L. pneumophila strains into pragmatic clusters, using a methodology that takes into consideration the genetic forces operating on the population. These clusters can be used as a standardised nomenclature, so that a group of strains can be described in a common way. Secondly, the clusters generated in the first part of the study were used to select strains rationally for whole genome sequencing (WGS). The resulting data were used to compare phylogenies derived from SBT and WGS. In general, the SBT sequence type (ST) accurately reflects the whole genome-based genotype. Where there are exceptions, and recombination has resulted in the ST no longer reflecting the genetic lineage described by the whole genome sequence, the clustering technique employed detects these sequence types as admixed, indicating their mixed inheritance.


We conclude that SBT is usually a good proxy for the genetic lineage described by the whole genome, and that SBT therefore remains suitable until the technology and economics of high-throughput sequencing reach the point where routine WGS of L. pneumophila isolates for outbreak investigation is feasible.

Legionella pneumophila; Sequence-based typing; Whole genome sequencing; Clustering; Population structure; Recombination