Open Access Research article

Simple sequence repeats in Helicobacter canadensis and their role in phase variable expression and C-terminal sequence switching

Lori AS Snyder1,2*, Nicholas J Loman1, James D Linton3, Rebecca R Langdon4, George M Weinstock5, Brendan W Wren4 and Mark J Pallen1

Author Affiliations

1 Centre for Systems Biology, School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK

2 School of Life Sciences, Kingston University, Kingston upon Thames, KT1 2EE, UK

3 Faculty of Life Sciences, University of Manchester, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK

4 Pathogen Molecular Biology Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK

5 The Genome Center, Washington University School of Medicine, St. Louis, Missouri, USA


BMC Genomics 2010, 11:67  doi:10.1186/1471-2164-11-67

Published: 27 January 2010

Abstract

Background

Helicobacter canadensis is an emerging human pathogen and zoonotic agent. The genome of H. canadensis was sequenced previously and determined to contain 29 annotated coding regions associated with homopolymeric tracts.

Results

Twenty-one of the repeat-associated coding regions were determined to be potentially phase variable, either transcriptionally or translationally: the homopolymeric tract lay within the predicted promoter region or at the 5' end of the coding region, respectively. However, eight coding sequences were identified with simple sequence repeats toward the 3' end of the open reading frame. In these cases, the repeat tract is too far into the coding region to mediate translational phase variation. All 29 coding region-associated homopolymeric tracts display variability in tract length in the sequencing read data.
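As a rough illustration of how a homopolymeric tract might be located and placed relative to a coding region, the sketch below scans a sequence for runs of a single base. The function names, the minimum tract length of 7 and the toy sequence are assumptions made for illustration; they are not taken from the study's methods.

```python
import re

def find_homopolymeric_tracts(seq, min_len=7):
    """Return (start, base, length) for every run of one base >= min_len.

    min_len=7 is an illustrative threshold, not a value from the paper.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]

def relative_position(tract_start, cds_start, cds_end):
    """Fraction of the way through the coding region at which a tract begins.

    Values near 0 suggest a 5' tract, values near 1 a 3' tract, and
    negative values a tract upstream of the start codon.
    """
    return (tract_start - cds_start) / (cds_end - cds_start)

# Hypothetical toy sequence: a 9-base poly-G tract starting at index 7.
toy = "ATGAAAC" + "G" * 9 + "TTAGCATTAGCA"
tracts = find_homopolymeric_tracts(toy)
```

Here `tracts` holds a single entry for the poly-G run, and `relative_position(7, 0, len(toy))` places it a quarter of the way into the toy coding region.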

Conclusions

Twenty-nine coding regions have been identified in the genome sequence of Helicobacter canadensis strain NCTC13241 that show variation in homopolymeric tract length within the bacterial population, indicative of phase variation. Five of these are potentially associated with promoter regions, which would lead to transcriptional phase variation. Translational phase variation usually switches expression of a gene ON and OFF because the repeat region lies close enough to the initiation codon that the resulting frame-shift introduces a premature termination codon and stops translation of the protein. Sixteen of the 29 coding regions have homopolymeric tracts characteristic of translational phase variation. For the eight coding sequences with repeats located later in the reading frame, changes in repeat tract length would alter the protein sequence at the C-terminus but would not stop expression of the protein. This mechanism of C-terminal phase variation has implications for stochastic switching of protein sequence in bacterial species that already undergo transcriptional and translational phase variation.
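The contrast between ON/OFF switching and C-terminal switching can be sketched with a toy translation. The example below assumes the standard genetic code; the sequences, variable names and tract lengths are hypothetical and chosen only to show the effect of adding one G to a 9-base tract placed early or late in a reading frame.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical gene with the tract just after the start codon.
early_tail = "CATGAGCTGCAAGCTGTTCCTGAAGCATAA"
early_9 = "ATG" + "G" * 9 + early_tail
early_10 = "ATG" + "G" * 10 + early_tail

# Hypothetical gene with the tract near the 3' end.
late_head = "ATGCATGAGCTGCAAGCTGTTCCTGAA"
late_tail = "AAATTTAACTAA"
late_9 = late_head + "G" * 9 + late_tail
late_10 = late_head + "G" * 10 + late_tail

print(translate(early_9))   # MGGGHELQAVPEA
print(translate(early_10))  # MGGGA
print(translate(late_9))    # MHELQAVPEGGGKFN
print(translate(late_10))   # MHELQAVPEGGGEI
```

In the early-tract case the extra base brings a stop codon into frame almost immediately, which corresponds to the OFF state. In the late-tract case the first twelve residues are unchanged and only the C-terminal residues differ (KFN versus EI), so the protein is still expressed with an altered end.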