Research article

Genome-wide analysis of macrosatellite repeat copy number variation in worldwide populations: evidence for differences and commonalities in size distributions and size restrictions

Mireille Schaap1, Richard JLF Lemmers1, Roel Maassen1, Patrick J van der Vliet1, Lennart F Hoogerheide2, Herman K van Dijk3, Nalan Baştürk4,5, Peter de Knijff1 and Silvère M van der Maarel1*

Author Affiliations

1 Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands

2 Department of Econometrics & Tinbergen Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

3 Econometric Institute & Tinbergen Institute, Erasmus University Rotterdam & Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

4 Econometric Institute, Erasmus University Rotterdam, Rotterdam, The Netherlands

5 The Rimini Centre for Economic Analysis, Rimini, Italy

BMC Genomics 2013, 14:143  doi:10.1186/1471-2164-14-143

Published: 4 March 2013



Macrosatellite repeats (MSRs), usually spanning hundreds of kilobases of genomic DNA, comprise a significant proportion of the human genome. Because of their highly polymorphic nature, MSRs represent an extreme example of copy number variation, but their structure and function are largely understudied. Here, we describe a detailed study of six autosomal and two X-chromosomal MSRs among 270 HapMap individuals from Central Europe, Asia and Africa. We investigated copy number variation, stability and genetic heterogeneity of the autosomal macrosatellite repeats RS447 (chromosome 4p), MSR5p (5p), FLJ40296 (13q), RNU2 (17q) and D4Z4 (4q and 10q), and of the X-chromosomal repeats DXZ4 and CT47.


Repeat array size distribution analysis shows that all of these MSRs are highly polymorphic, with the most genetic variation among Africans and the least among Asians. A mitotic mutation rate of 0.4–2.2% was observed, which exceeds meiotic mutation rates and may explain the large size variability found for these MSRs. Using a novel Bayesian approach, we detected statistical support for a distinct multimodal rather than a uniform allele size distribution in seven out of eight MSRs, with evidence for equidistant intervals between the modes.
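
The comparison of a multimodal, equidistant-mode distribution against a uniform one can be illustrated with a minimal sketch. This is not the Bayesian method used in the study, whose details are not given in this summary; it is a simplified stand-in that assumes a known size range, equal-weight Gaussian modes with a fixed standard deviation, and a flat prior over a small grid of mode spacings and offsets. All function names, parameter values and the synthetic allele sizes are hypothetical.

```python
import math
import random


def log_uniform_likelihood(sizes, lo, hi):
    """Log-likelihood of the sizes under a flat density on [lo, hi]."""
    return -len(sizes) * math.log(hi - lo)


def log_mixture_likelihood(sizes, lo, hi, spacing, offset, sd):
    """Log-likelihood under an equal-weight Gaussian mixture whose modes lie
    at lo + offset + k * spacing for every k that keeps the mode <= hi."""
    modes = []
    mode = lo + offset
    while mode <= hi:
        modes.append(mode)
        mode += spacing
    if not modes:
        return float("-inf")
    norm = 1.0 / (len(modes) * sd * math.sqrt(2.0 * math.pi))
    total = 0.0
    for x in sizes:
        density = norm * sum(math.exp(-0.5 * ((x - m) / sd) ** 2) for m in modes)
        total += math.log(density + 1e-300)  # guard against log(0)
    return total


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def log_bayes_factor(sizes, lo, hi, spacings, sd, n_offsets=10):
    """Log Bayes factor of the equidistant-mode model against the uniform model.

    The marginal likelihood of the mode model is the average likelihood over
    a grid of (spacing, offset) pairs, i.e. a flat discrete prior (assumption).
    """
    grid = [
        log_mixture_likelihood(sizes, lo, hi, s, s * j / n_offsets, sd)
        for s in spacings
        for j in range(n_offsets)
    ]
    return log_mean_exp(grid) - log_uniform_likelihood(sizes, lo, hi)


if __name__ == "__main__":
    random.seed(1)
    # Synthetic array sizes (arbitrary units), not data from the study.
    clustered = [random.gauss(random.choice([100, 150, 200, 250]), 5) for _ in range(200)]
    flat = [random.uniform(80, 270) for _ in range(200)]
    for name, data in (("clustered", clustered), ("flat", flat)):
        print(name, round(log_bayes_factor(data, 80, 270, [30, 40, 50, 60, 70], 5.0), 1))
```

A positive log Bayes factor favours the equidistant-mode model and a negative one favours the uniform model. The Gaussian modes are not truncated to the size range, so the sketch slightly penalises the mode model; a fuller treatment would also place priors on the number of modes, their weights and their widths.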


The multimodal distributions with evidence for equidistant intervals, in combination with the observation of MSR-specific constraints on minimum array size, suggest that MSRs are limited in their configurations and that deviations from these configurations may cause disease, as is the case for facioscapulohumeral muscular dystrophy. However, at present we cannot exclude that there are mechanistic constraints on MSRs that are not directly disease-related. This is the first comprehensive study of MSRs in different human populations; by applying novel statistical methods, it identifies commonalities and differences in their organization and function in the human genome.

Keywords: Tandem repeat sequences; DNA copy number variations; Population genetics; Bayes theorem