Genomic variation in Salmonella enterica core genes for epidemiological typing
1 National Food Institute, Building 204, The Technical University of Denmark, 2800 Kgs Lyngby, Denmark
2 Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Building 208, DK-2800 Kgs Lyngby, Denmark
BMC Genomics 2012, 13:88 doi:10.1186/1471-2164-13-88Published: 12 March 2012
Technological advances in high throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. Standardized procedures for identification of relevant genes and of variation are needed to enable comparison between studies and over time. The core genes--the genes that are conserved in all (or most) members of a genus or species--are potentially good candidates for investigating genomic variation in phylogeny and epidemiology.
We identify a set of 2,882 core genes clusters based on 73 publicly available Salmonella enterica genomes and evaluate their value as typing targets, comparing whole genome typing and traditional methods such as 16S and MLST. A consensus tree based on variation of core genes gives much better resolution than 16S and MLST; the pan-genome family tree is similar to the consensus tree, but with higher confidence. The core genes can be divided into two categories: a few highly variable genes and a larger set of conserved core genes, with low variance. For the most variable core genes, the variance in amino acid sequences is higher than for the corresponding nucleotide sequences, suggesting that there is a positive selection towards mutations leading to amino acid changes.
Genomic variation within the core genome is useful for investigating molecular evolution and providing candidate genes for bacterial genome typing. Identification of genes with different degrees of variation is important especially in trend analysis.