Open Access Research article

Repeat-encoded poly-Q tracts show statistical commonalities across species

Kai Willadsen1, Minh Duc Cao1, Janet Wiles2, Sureshkumar Balasubramanian3 and Mikael Bodén1*

Author Affiliations

1 School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia

2 School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane QLD 4072, Australia

3 School of Biological Sciences, Monash University, Victoria 3800, Australia


BMC Genomics 2013, 14:76  doi:10.1186/1471-2164-14-76

Published: 2 February 2013



Among repetitive genomic sequences, the class of tri-nucleotide repeats has received much attention due to its association with human diseases. Tri-nucleotide repeat diseases are caused by excessive sequence length variability; diseases such as Huntington’s disease and Fragile X syndrome are tied to an increase in the number of repeat units in a tract. Motivated by the recent discovery of a tri-nucleotide repeat-associated genetic defect in Arabidopsis thaliana, this study takes a cross-species approach to investigating these repeat tracts, with the goal of using commonalities between species to identify potential disease-related properties.


We find that statistical enrichment in regulatory function associations for coding region repeats – previously observed in human – is consistent across multiple organisms. By distinguishing between homo-amino acid tracts that are encoded by tri-nucleotide repeats, and those encoded by varying codons, we show that amino acid repeats – not tri-nucleotide repeats – fully explain these regulatory associations. Using this same separation between repeat- and non-repeat-encoded homo-amino acid tracts, we show that poly-glutamine tracts are disproportionately encoded by tri-nucleotide repeats, and those tracts that are encoded by tri-nucleotide repeats are also significantly longer; these results are consistent across multiple species.
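The separation between repeat-encoded and non-repeat-encoded poly-glutamine tracts can be illustrated with a minimal sketch. This is not the procedure used in the study: the function name `find_polyq_tracts`, the minimum tract length of 5 residues, and the strict rule that a tract counts as repeat-encoded only when every codon is identical are all assumptions made for illustration. The only fixed fact is that glutamine is encoded by the two codons CAA and CAG.

```python
from itertools import groupby

# The two standard codons for glutamine (Q).
GLN_CODONS = {"CAA", "CAG"}


def find_polyq_tracts(cds, min_len=5):
    """Locate poly-Q tracts in an in-frame coding sequence.

    Returns a list of (start_codon_index, length_in_codons, repeat_encoded)
    tuples. `min_len` is a hypothetical threshold, and `repeat_encoded` is
    True only if the whole tract uses a single codon (an assumed, strict
    definition of a tri-nucleotide repeat).
    """
    cds = cds.upper()
    # Split into codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

    tracts = []
    pos = 0
    # Group consecutive codons by whether they encode glutamine.
    for is_gln, group in groupby(codons, key=lambda c: c in GLN_CODONS):
        run = list(group)
        if is_gln and len(run) >= min_len:
            repeat_encoded = len(set(run)) == 1
            tracts.append((pos, len(run), repeat_encoded))
        pos += len(run)
    return tracts


# Toy sequence: one pure CAG tract and one tract alternating CAG and CAA.
example_cds = "ATG" + "CAG" * 6 + "GGT" + "CAGCAACAGCAACAG" + "TAA"
example_tracts = find_polyq_tracts(example_cds)
```

With tracts labelled this way, the proportion of repeat-encoded tracts and the length distributions of the two groups can be compared within each species. A real analysis would likely need to tolerate interrupted or impure repeats, which this strict sketch does not.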


These findings establish similarities in tri-nucleotide repeats across species at the level of protein functionality and protein sequence. The tendency of tri-nucleotide repeats to encode longer poly-glutamine tracts indicates a link with the poly-glutamine repeat diseases. The cross-species nature of this tendency suggests that unknown repeat diseases are yet to be uncovered in other species. Future discoveries of new non-human repeat-associated defects may provide the breadth of information needed to unravel the mechanisms that underpin this class of human disease.