Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Haertelstrasse 16-18, 04107 Leipzig, Germany

LIFE Center (Leipzig Interdisciplinary Research Cluster of Genetic Factors, Phenotypes and Environment), University of Leipzig, Philipp-Rosenthal Strasse 27, 04103 Leipzig, Germany

Department of Medicine, University of Leipzig, Liebigstrasse 18, 04103 Leipzig, Germany

IFB Adiposity Diseases, University of Leipzig, Stephanstrasse 9c, 04103 Leipzig, Germany

Interdisciplinary Center for Clinical Research, University of Leipzig, Liebigstrasse 21, 04103 Leipzig, Germany

Dept Eco & Evo Biol, Interdepartmental Program in Bioinformatics, University of California, Los Angeles, 621 Charles E. Young Dr South, Box 951606, Los Angeles, CA 90095-1606, USA

Center for Society and Genetics, University of California, Los Angeles, 1323 Rolfe Hall, Box 957221, Los Angeles, CA 90095-7221, USA

Dept of History, University of California, Los Angeles, 6265 Bunche Hall, Box 951473, Los Angeles, CA 90095-1473, USA

Helmholtz Centre Munich, German Research Center for Environmental Health, Institute of Epidemiology, Ingolstaedter Landstraße 1, 85764 Neuherberg, Germany

Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig, Germany

Institute of Medical Informatics, Biometry and Epidemiology, Chair of Epidemiology, Ludwig-Maximilians-University, Marchioninistraße 15, 81377 Munich, Germany

Klinikum Grosshadern, Ludwig Maximilians University, Marchioninistraße 15, 81377 Munich, Germany

Abstract

Background

The Sorbs are an ethnic minority in Germany with putative genetic isolation, making the population interesting for disease mapping. A sample of N = 977 Sorbs is currently analysed in several genome-wide meta-analyses. Since genetic differences between populations are a major confounding factor in genetic meta-analyses, we compare the Sorbs with the German outbred population of the KORA F3 study (N = 1644) and other publicly available European HapMap populations by population genetic means. We also aim to separate effects of over-sampling of families in the Sorbs sample from effects of genetic isolation and compare the power of genetic association studies between the samples.

Results

The degree of relatedness was significantly higher in the Sorbs. Principal components analysis revealed a west to east clustering of KORA individuals born in Germany, KORA individuals born in Poland or Czech Republic, Half-Sorbs (less than four Sorbian grandparents) and Full-Sorbs. The Sorbs cluster is nearest to the cluster of KORA individuals born in Poland. The number of rare SNPs is significantly higher in the Sorbs sample. FST between KORA and Sorbs is an order of magnitude higher than between different regions in Germany. Compared to the other populations, Sorbs show a higher proportion of individuals with runs of homozygosity between 2.5 Mb and 5 Mb. Linkage disequilibrium (LD) at longer range is also slightly increased but this has no effect on the power of association studies.

Oversampling of families in the Sorbs sample causes detectable bias regarding higher FST values and higher LD but the effect is an order of magnitude smaller than the observed differences between KORA and Sorbs. Relatedness in the Sorbs also influenced the power of uncorrected association analyses.

Conclusions

Sorbs show signs of genetic isolation which cannot be explained by over-sampling of relatives, but the effects are moderate in size. The Slavonic origin of the Sorbs is still genetically detectable.

Regarding LD structure, a clear advantage for genome-wide association studies cannot be deduced. The significant amount of cryptic relatedness in the Sorbs sample results in inflated variances of Beta-estimators which should be considered in genetic association analyses.

Background

The Sorbs living in the Upper Lusatia region of Eastern Saxony are one of the few historic ethnic minorities in Germany. They are of Slavonic origin and speak a West Slavic language (Sorbian), and it is assumed that they have lived in ethnic isolation among the German majority during the past 1100 years.

The value of isolated populations for the discovery of genetic modifiers of diseases or quantitative traits is controversial.

Nowadays, it is common practice to combine all available genotyped and phenotyped populations in large-scale, whole genome meta-analyses or pooled analyses in order to identify even very small genetic effects as commonly observed for complex traits. Spurious associations caused by the genetic sub-structures of combined populations are the most serious concern of this approach.

Another characteristic feature of isolated populations is the putatively higher degree of cryptic relatedness in randomly drawn samples. This is a serious concern in genetic association analysis and needs to be addressed with appropriate statistical methods.

The degree of isolation of the Sorbs has been studied in the past by the analysis of Y-chromosomal markers.

Furthermore, we analyse how differences between populations can be translated to differences in power of genetic association studies within these samples. We analyse the influence of genetic effect size, LD structure, heritability, and relatedness on power.

Methods

Study Populations

Sorbs

The Sorbs are of Slavonic origin, and lived in ethnic isolation among the Germanic majority during the past 1100 years

KORA

The study population was recruited from the KORA/MONICA S3 survey, a population-based sample from the general population living in the region of Augsburg, Southern Germany, which was carried out in 1994/95. In a follow-up examination of S3 in 2004/05 (KORA F3), 3006 subjects participated. Recruitment and study procedures of KORA have been described elsewhere

HapMap

174 CEU (CEPH (Centre d'Etude du Polymorphisme Humain) from Utah) and 88 TSI (Toscans in Italy) samples were taken from a recent HapMap Collection (Public Release 27, NCBI build 36, The International HapMap Project). From the CEU sample, we removed 58 children, five individuals with call rate < 90% and one individual because of cryptic relatedness (NA07045 was removed because of its lower call rate compared to NA12813).

Data Analysis

Genotype Imputation and Quality Control

Missing genotypes of the KORA and Sorb samples were imputed separately using MACH Imputation Software with standard settings

After imputation, we checked 471,012 autosomal SNPs in the overlap of the Affymetrix Human Mapping 500 K Array Set and Affymetrix Genome-Wide Human SNP Array 6.0 for quality.

SNPs with a call rate of less than 95% in all four study populations combined, prior to imputation, were filtered (34,711 SNPs). Hardy-Weinberg equilibrium (HWE) was tested across populations using a stratified test, and SNPs with a p-value < 10^{-6} were eliminated. Finally, 14,508 SNPs showing unexpectedly high differences of allelic frequencies between genotyping platforms in the Sorbs sample were eliminated (p-value < 10^{-7}).

Since several SNPs violated more than one of our criteria, we discarded a total of 46,536 SNPs and analysed 424,476 remaining SNPs.

For estimation of ROHs (see below) the number of analysed SNPs is reduced to 306,081 by matching SNPs on Affymetrix chips with available SNPs in the HapMap CEU and TSI samples. Due to the high sensitivity of the PCA (see below) we decided to tighten our quality criteria for this kind of analysis. Only SNPs with a call rate of at least 99% were included for PCA, which reduced the number of SNPs to 199,702.

An overview of the data pre-processing workflow can be found in Additional file

**Workflow of data pre-processing**. The workflow of data pre-processing is presented. We start with the autosomal SNP data of four different populations (KORA, Sorbs, HapMap CEU, HapMap TSI). Numbers of remaining markers at each step of pre-processing are presented in bold.


Estimation of Relatedness

Pair-wise relatedness between all individuals of KORA and Sorbs was estimated by the method described in

For analyses of dependence of measures of population genetic comparison on relatedness, we define two subsamples used for all subsequent analyses: For the first subsample, the complete Sorbs sample (Sorbs_{977}, N = 977) was matched with a randomly selected subset of N = 977 unrelated KORA subjects born in Germany (KORA_{977}). For the second subsample, a subset of N = 532 unrelated Sorbs (Sorbs_{532}) was matched with a subset of N = 532 KORA subjects (KORA_{532}) randomly selected from KORA_{977}.

Unrelated subjects were selected by an algorithm which implements a step-by-step removal of individuals showing the highest number of relationships to other members of the population until no pair of individuals with relatedness > 0.2 remained.
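The step-by-step removal described above can be sketched as follows. This is a minimal illustration in Python, not the implementation used in the study: the function name `select_unrelated`, the dictionary input format and the tie-breaking rule (smallest identifier first) are assumptions made here for illustration.

```python
from collections import defaultdict


def select_unrelated(pairs, threshold=0.2):
    """Greedy removal of related individuals (illustrative sketch).

    pairs: dict mapping (id_a, id_b) -> estimated pair-wise relatedness.
    Returns the list of removed individuals in order of removal.
    """
    # Build the graph of relationships above the threshold.
    neighbours = defaultdict(set)
    for (a, b), relatedness in pairs.items():
        if relatedness > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    removed = []
    while True:
        # Individuals that still have at least one relative in the sample.
        candidates = [k for k in sorted(neighbours) if neighbours[k]]
        if not candidates:
            break
        # Remove the individual with the highest number of relationships
        # (assumed tie-break: first identifier in sorted order).
        worst = max(candidates, key=lambda k: len(neighbours[k]))
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)
        removed.append(worst)
    return removed


# Hypothetical toy data: a trio of close relatives, one related pair,
# and one weakly related pair below the threshold.
toy_pairs = {
    ("A", "B"): 0.5, ("A", "C"): 0.5, ("B", "C"): 0.5,
    ("D", "E"): 0.25, ("A", "F"): 0.05,
}
print(select_unrelated(toy_pairs))
```

In the toy data, two members of the trio and one member of the related pair are removed, so that no pair with relatedness above 0.2 remains.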

Principal components analysis

PCA is suitable to map genetic variance to a few dimensions expressing the highest degree of variance

Since PCA results are biased in case of unequal population sizes

PCA was done with iterative removal of outliers (default 5 iterations) and LD correction in consecutive SNPs (involving two previous SNPs as recommended in the manual of the EIGENSOFT package).

Rare SNPs

Isolated populations are supposed to have reduced genetic variability resulting in a higher number of rare SNPs. By definition, a SNP has a minor allelic frequency (MAF) of at least 1%. To account for sampling variance, we calculated the exact 95% confidence interval of the MAF and considered a SNP as rare if the entire interval was below one percent. This is equivalent to fewer than 11 observed minor alleles in Sorbs_{977} or KORA_{977} and fewer than five observed minor alleles in Sorbs_{532} or KORA_{532}, respectively. The odds of finding rare SNPs were compared between KORA and Sorbs using Fisher's exact test.
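The stated equivalence between the confidence interval criterion and the allele counts can be checked with a short sketch. It assumes that "exact" refers to the two-sided Clopper-Pearson interval and that each individual contributes two alleles; the function names are illustrative and not taken from the study.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def exact_upper_limit(k, n, alpha=0.05):
    """Upper Clopper-Pearson limit: the p solving P(X <= k | n, p) = alpha/2."""
    if k >= n:
        return 1.0
    lo, hi = k / n, 1.0
    for _ in range(100):  # bisection; the CDF is decreasing in p
        mid = (lo + hi) / 2.0
        if binom_cdf(k, n, mid) > alpha / 2.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def max_rare_allele_count(n_individuals, maf_cutoff=0.01):
    """Largest minor allele count whose 95% interval lies below the cutoff."""
    n_alleles = 2 * n_individuals
    k = 0
    while exact_upper_limit(k, n_alleles) < maf_cutoff:
        k += 1
    return k - 1


print(max_rare_allele_count(977), max_rare_allele_count(532))
```

Under these assumptions the largest counts still classified as rare are 10 and 4, which matches the thresholds of fewer than 11 and fewer than five alleles given in the text.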

F-statistics

To characterize the variance of allelic frequencies within and between populations, we calculated F-statistics.

The inbreeding coefficient F_{IS} was estimated within each sample.

Correlation of alleles of individuals in the same population was estimated by the co-ancestry coefficient F_{ST}.
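For orientation, the textbook definitions of both coefficients at a single biallelic SNP can be written down in a few lines. This is only a sketch of Wright's classical definitions under the assumption of two equally weighted populations; the estimators and standard errors actually used in the study are not recoverable from this text, and the function names are illustrative.

```python
def allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of allele A from genotype counts."""
    n = n_aa + n_ab + n_bb
    return (2 * n_aa + n_ab) / (2 * n)


def f_is(n_aa, n_ab, n_bb):
    """Inbreeding coefficient: 1 - observed / expected heterozygosity."""
    n = n_aa + n_ab + n_bb
    p = allele_frequency(n_aa, n_ab, n_bb)
    h_obs = n_ab / n
    h_exp = 2 * p * (1 - p)
    return 1 - h_obs / h_exp


def f_st(p1, p2):
    """Co-ancestry coefficient for two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


print(f_is(30, 40, 30))   # deficit of heterozygotes gives a positive value
print(f_st(0.4, 0.6))     # differing allele frequencies give a positive value
```

A negative F_{IS} corresponds to an excess of heterozygotes relative to Hardy-Weinberg expectation, and F_{ST} is zero when both populations share the same allele frequency.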

Runs of homozygosity

Counting ROHs is useful to detect inbreeding
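A minimal sketch of how runs of homozygosity can be counted along one chromosome is given below. It is a simplification made for illustration: genotypes are coded as minor allele counts (0, 1, 2), a single heterozygous call ends a run, and only a minimum physical length is required. The criteria actually applied in the study (for example tolerated heterozygous or missing calls, or minimum SNP counts) are not stated in this text, and the default of 2.5 Mb is taken only from the length interval mentioned in the Results.

```python
def runs_of_homozygosity(positions, genotypes, min_length_bp=2_500_000):
    """Return (start, end) positions of homozygous runs of sufficient length.

    positions: base-pair positions of consecutive SNPs, ascending.
    genotypes: minor allele counts per SNP (0 or 2 = homozygous, 1 = heterozygous).
    """
    runs = []
    start = last = None
    for pos, genotype in zip(positions, genotypes):
        if genotype != 1:
            if start is None:
                start = pos  # a new run begins at this SNP
            last = pos
        else:
            # A heterozygous call closes the current run, if any.
            if start is not None and last - start >= min_length_bp:
                runs.append((start, last))
            start = last = None
    if start is not None and last - start >= min_length_bp:
        runs.append((start, last))
    return runs


# Hypothetical toy chromosome with one SNP per Mb.
toy_positions = [0, 1_000_000, 2_000_000, 3_000_000, 4_000_000, 5_000_000, 6_000_000]
toy_genotypes = [0, 2, 0, 0, 1, 2, 2]
print(runs_of_homozygosity(toy_positions, toy_genotypes))
```

In the toy example only the first run spans at least 2.5 Mb; the second run after the heterozygous SNP is too short to be counted.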

Linkage disequilibrium

In the Sorbs and KORA samples, we calculated pair-wise LD for all SNPs on Chromosome 22 (5382 markers) using robust estimators. Among the measures considered, one is independent of allelic frequencies and is therefore especially useful when comparing populations. It is a monotone function of the odds ratio.

Its absolute value is the percentage of SNP pairs under the non-informative uniform distribution with less extreme LD than the one observed (see

Comparison of power assuming uncorrelated phenotypes

We analysed how the observed differences in LD structure between KORA and Sorbs can be translated into differences in power of genetic association studies. For this purpose, we assumed a linear regression model **y** = β_{1}**s**_{1} + **ε**_{1} of a random phenotype **y** which is influenced by the genotype **s**_{1} of a causative SNP, where **ε**_{1} is the residual Gaussian error of the model.

The SNP is assumed to explain a pre-specified proportion of the total variance of the phenotype. We set β_{1} = 1 without restriction of generality. Within a distance of ± 2 Mb we then analysed the model **y** = β_{2}**s**_{2} + **ε**_{2} for a second SNP, which is in maximum LD (measured by r) with the causative SNP. That is, we analysed the best proxy of the causative SNP rather than the causative SNP itself, modelling the marker principle of genetic association studies. The distribution of the estimator of β_{2} was derived analytically from **s**_{1} and **s**_{2}.

Power was calculated for the test of the null hypothesis β_{2} = 0 using this analytical formula. This was done for all SNPs on Chromosome 22 in KORA_{977}, KORA_{532}, Sorbs_{977}, and Sorbs_{532}. Distribution of power was derived using the results of all SNPs of Chromosome 22. Results were compared between the KORA and Sorbs samples of equal size.
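The dependence of power on sample size, explained variance and LD with the proxy SNP can be illustrated with a simple approximation. The sketch below is not the analytical formula derived in the study. It assumes standardised variables, so that the correlation between the phenotype and the proxy is r times the square root of the explained variance, and it uses a normal approximation of the test statistic; the function name `approx_power` is illustrative.

```python
from statistics import NormalDist


def approx_power(n, explained_variance, r, alpha):
    """Approximate power of a two-sided test of beta_2 = 0 at a proxy SNP.

    n: sample size
    explained_variance: proportion of phenotypic variance explained by the
        causative SNP
    r: LD correlation between the proxy SNP and the causative SNP
    alpha: p-value threshold
    """
    rho2 = r * r * explained_variance          # variance explained by the proxy
    ncp = (n * rho2 / (1.0 - rho2)) ** 0.5     # approximate non-centrality
    normal = NormalDist()
    z = normal.inv_cdf(1.0 - alpha / 2.0)      # two-sided critical value
    return normal.cdf(ncp - z) + normal.cdf(-ncp - z)


for n in (977, 532):
    for r in (1.0, 0.9):
        print(n, r, round(approx_power(n, 0.02, r, 1e-5), 3))
```

With a perfect proxy (r = 1), this approximation gives a power of roughly one half for N = 977, an explained variance of 2% and a threshold of 1 × 10^{-5}, and power drops quickly with smaller samples or weaker LD.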

**Derivation of the formula for
**.


Comparison of power assuming correlated phenotypes

In the previous section, we derived formulae for the estimation of power under the assumption of uncorrelated phenotypes. This approach applies for either a negligible relatedness structure of the individuals or a weak correlation of phenotypes of related individuals. Applying a GRAMMAR approach

However, to our knowledge, it is still not common practice in genome-wide association studies to use this approach to correct for relatedness. Therefore, we aim to study the situation in which the phenotypes are correlated but in which the corresponding individuals were analysed as independent even though they are not.

Following Amin et al., we simulated phenotypes **y** on the basis of the mixed model **y** = β_{1}**s**_{1} + **g** + **ε**_{1}, comprising a fixed effect of genotypes **s**_{1}, a random effect **g** representing the residual polygenic effects, and the residual error **ε**_{1}. The covariance of **g** is determined by **G**, which represents the pair-wise relatedness matrix. The model results in non-trivial covariance of phenotypes of different individuals. For each SNP we drew 1000 samples from the model and analysed the linear model **y** = β_{2}**s**_{2} + **ε**_{2} for a second SNP which is in maximum LD with the first SNP, in complete analogy to the procedure developed for uncorrelated phenotypes (see previous section). Different degrees of heritability were considered, where heritability comprises the variance explained by **s**_{1} and **g**.
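Why an uncorrected analysis is affected by relatedness can be seen from the variance of the least-squares estimator when phenotypes are correlated. The sketch below is a worked toy calculation, not the simulation procedure of the study. It assumes a centred genotype vector s, the estimator s'y / s's, and a residual covariance of the form V = h2·G + (1 − h2)·I with unit total variance, where h2 is here the polygenic share of the residual variance; all names and values are illustrative.

```python
def naive_and_true_variance(s, G, h2):
    """Variance of the uncorrected estimator s'y / s's.

    naive: variance assumed under independence, 1 / s's
    true:  s'Vs / (s's)^2 with V = h2 * G + (1 - h2) * I
    """
    n = len(s)
    sts = sum(x * x for x in s)
    quad = 0.0
    for i in range(n):
        for j in range(n):
            v_ij = h2 * G[i][j] + ((1.0 - h2) if i == j else 0.0)
            quad += s[i] * v_ij * s[j]
    return 1.0 / sts, quad / sts**2


# Hypothetical toy sample: two sib pairs (relatedness 0.5) whose members
# carry the same centred genotype.
toy_s = [1.0, 1.0, -1.0, -1.0]
toy_G = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
for h2 in (0.0, 0.5, 1.0):
    naive, true = naive_and_true_variance(toy_s, toy_G, h2)
    print(h2, true / naive)
```

In this toy setting the inflation factor is 1 + h2/2, so the variance of the estimator is underestimated most strongly when the polygenic component is largest, which is in line with the qualitative behaviour described for the Sorbs_{977} sample.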

Statistical Software and Web-Resources

HapMap data were downloaded from

All other calculations were performed using the Statistical Software package R (Version 2.8.0,

Results

For population genetic comparison of the Sorbian minority in Germany with the German KORA population, several measures of genetic isolation were applied to genome-wide SNP array data.

Relatedness

We analysed the relatedness of all 476,776 pairs of individuals in the Sorbs and all 1,350,546 pairs in the KORA samples. Results are shown in Figure


**Distribution of degrees of relatedness in KORA and Sorbs**. Distribution of degrees of relatedness in the KORA and Sorbs samples. For readability, the distribution of the 0.01% highest relatedness estimates of the KORA samples and the highest 0.5% estimates of the Sorbs samples are shown.

Distribution of pair-wise relatedness estimates

| Lower bound | Number of pairs in KORA | Number of pairs in Sorbs | Odds ratio (KORA = reference category) [95% CI] |
| --- | --- | --- | --- |
| 0.1 | 79 | 1889 | 68 [54;86] |
| 0.2 | 38 | 1186 | 88 [64;126] |
| 0.4 | 24 | 666 | 79 [52;123] |
| 0.6 | 1 | 1 | 3 [0;222] |

Number of pair-wise relatedness estimates above a given boundary for a total of 476776 and 1350546 calculated pair-wise estimates in Sorbs and KORA, respectively. We also present the odds-ratio for an encounter of relatives and corresponding 95% confidence interval.
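As a plausibility check, the odds ratio for the lowest boundary can be recomputed from the pair counts. The sketch uses the sample odds ratio with a Woolf (log-normal) confidence interval, which is an assumption made here; the interval method used for the table is not stated in this text, so small deviations from the tabulated values are to be expected.

```python
from math import exp, sqrt


def odds_ratio_ci(hits, total, hits_ref, total_ref, z=1.96):
    """Sample odds ratio with Woolf 95% confidence interval."""
    a, b = hits, total - hits              # related / unrelated pairs, Sorbs
    c, d = hits_ref, total_ref - hits_ref  # related / unrelated pairs, KORA
    estimate = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, estimate * exp(-z * se_log), estimate * exp(z * se_log)


# Pairs with relatedness above 0.1: 1889 of 476776 (Sorbs), 79 of 1350546 (KORA).
estimate, lower, upper = odds_ratio_ci(1889, 476776, 79, 1350546)
print(round(estimate, 1), round(lower, 1), round(upper, 1))
```

For the boundary of 0.1 this gives an odds ratio of about 68 with an interval of roughly 54 to 85, close to the tabulated 68 [54;86].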

To achieve samples without pairs of individuals with relatedness-estimates greater than 0.2, it was necessary to exclude 445 Sorbs and 33 KORA individuals, resulting in subsamples of 532 Sorbs and 1,611 KORA individuals.

Principal components analysis

Results of PCA after removal of outliers and LD correction are shown in Figure


**Principal components analysis of study populations**. First two principal components of individuals from KORA born in Czech Republic (N = 50), Germany (N = 50), Poland (N = 50) and Full-Sorbs (N = 49), Half-Sorbs (N = 48), CEU (CEPH (Centre d'Etude du Polymorphisme Humain) from Utah, N = 49) and TSI (Toscans in Italy, N = 48).

A plot of the genetic variance represented by the first two principal components clearly reflects the geographic origin of these populations. TSI samples are relatively far away from the other clusters, indicating the orientation of a north to south axis. The KORA population is very close to the CEU HapMap population. In contrast, the Sorbian population clusters distinctly further to the east. There is a clear trend of west to east clustering of KORA individuals born in Germany, KORA individuals born in Poland or Czech Republic, Half-Sorbs, and finally, Full-Sorbs. The Sorbs clusters are nearest to the cluster of KORA individuals born in Poland.

Rare SNPs

When analysing 424,476 quality SNPs in 977 Sorbs (Sorbs_{977}) and the random sample of 977 individuals from KORA (KORA_{977}), we counted 51,204 rare SNPs in Sorbs_{977} and 49,721 rare SNPs in KORA_{977} (p-value 6.7 × 10^{-7}). In the subset of 532 unrelated Sorbs (Sorbs_{532}) and the random sample of 532 unrelated individuals from KORA (KORA_{532}), we again counted more rare SNPs in Sorbs_{532} than in KORA_{532}, i.e. 49,257 and 47,913 (p-value 4.7 × 10^{-6}), respectively.

F-Statistics

Estimating F_{IS} in KORA_{977} and KORA_{532} resulted in slightly positive values, with the smaller value in KORA_{977}. In contrast, in the samples Sorbs_{977} and Sorbs_{532}, we found slightly negative values, with the smaller value in the sample Sorbs_{977}.

F_{ST} is higher between KORA_{977} and Sorbs_{977} than between KORA_{532} and Sorbs_{532}. Estimates of F_{ST} and F_{IS} are given in the table below.

Inbreeding and co-ancestry coefficients

| Population | F-statistic | Estimate | SE |
| --- | --- | --- | --- |
| KORA_{977} | F_{IS} | 0.0012 | 2.7 × 10^{-4} |
| Sorbs_{977} | F_{IS} | -0.0006 | 2.7 × 10^{-4} |
| KORA_{532} | F_{IS} | 0.0014 | 3.5 × 10^{-4} |
| Sorbs_{532} | F_{IS} | -0.0002 | 3.6 × 10^{-4} |
| KORA_{977}, Sorbs_{977} | F_{ST} | 0.0034 | 5.4 × 10^{-5} |
| KORA_{532}, Sorbs_{532} | F_{ST} | 0.0029 | 6.7 × 10^{-5} |

Estimates and standard errors (SE) of inbreeding coefficients F_{IS} and co-ancestry coefficients F_{ST}.

Runs of Homozygosity

ROHs were determined for the populations KORA, Sorbs_{977}, Sorbs_{532}, CEU, and TSI. Percentages of individuals in these populations containing at least one ROH in a specified length interval were calculated (Figure


**Proportion of individuals with certain ROH length**. Proportion of individuals from KORA (N = 1644), Sorbs_{977}, Sorbs_{532}, CEU (CEPH (Centre d'Etude du Polymorphisme Humain) from Utah, N = 110) and TSI (Toscans in Italy, N = 88) with at least one ROH in the given length interval.

In a second step, the mean total length of ROHs with a given minimum length was estimated, averaged over the individuals of each population (see the figure "Average total length of ROHs"). The estimate differs between Sorbs_{532} and Sorbs_{977}, but the difference is small.


**Average total length of ROHs**. Average total length of ROHs for KORA (N = 1644), Sorbs_{977}, Sorbs_{532}, CEU (CEPH (Centre d'Etude du Polymorphisme Humain) from Utah, N = 110) and TSI (Toscans in Italy, N = 88) in dependence on minimal length of a single run.

Linkage Disequilibrium

Three measures of LD were calculated for KORA_{977}, KORA_{532}, Sorbs_{977}, and Sorbs_{532}. Results for the measure that is independent of allelic frequencies are shown in the figure "LD structure in KORA and Sorbs".


**LD structure in KORA and Sorbs**. LD structure in the KORA_{977}, KORA_{532}, Sorbs_{977} and Sorbs_{532} samples. The LD measure that is independent of allelic frequencies was estimated for all SNP pairs of chromosome 22. Results are averaged over distance using bins of 5 kb length and smoothed by a LOWESS estimator.

As expected, a small sample size bias can be observed for KORA_{977} and KORA_{532}. In contrast, the estimators for Sorbs_{977} and Sorbs_{532} are virtually identical.

Comparison of power assuming uncorrelated phenotypes

The power to detect causal SNPs was calculated for KORA_{977}, KORA_{532}, Sorbs_{977}, and Sorbs_{532}. Results for SNP effects with explained variances of 2% or 5% are shown in the following figure and table for p-value thresholds of 1 × 10^{-5} and 1 × 10^{-7}.


**Median power distribution in KORA and Sorbs**. Median power to detect SNP effects explaining 2% (left) or 5% (right) of variance, respectively. Power is plotted versus the p-value threshold. The grey lines are virtually covered by the black lines. The dotted line corresponds to p-value thresholds of 1 × 10^{-5 }and 1 × 10^{-7 }respectively.

Quartiles of power distribution assuming uncorrelated phenotypes

| Explained variance | p-value threshold | Population | 1st Quartile | Median | 3rd Quartile |
| --- | --- | --- | --- | --- | --- |
| 2% | 1 × 10^{-5} | KORA_{977} | 6.78 | 37.02 | 49.19 |
| 2% | 1 × 10^{-5} | Sorbs_{977} | 6.31 | 36.51 | 49.34 |
| 2% | 1 × 10^{-5} | KORA_{532} | 1.15 | 7.85 | 11.52 |
| 2% | 1 × 10^{-5} | Sorbs_{532} | 1.13 | 7.88 | 11.65 |
| 5% | 1 × 10^{-7} | KORA_{977} | 25.01 | 88.8 | 95.81 |
| 5% | 1 × 10^{-7} | Sorbs_{977} | 23.14 | 88.37 | 95.87 |
| 5% | 1 × 10^{-7} | KORA_{532} | 2.73 | 30.07 | 43.41 |
| 5% | 1 × 10^{-7} | Sorbs_{532} | 2.66 | 30.17 | 43.85 |

Quartiles of the power distribution in percent for an explained variance of 2% with a p-value threshold of 1 × 10^{-5} and of 5% with a p-value threshold of 1 × 10^{-7}, respectively.

Comparison of power assuming correlated phenotypes

The table below presents the corresponding results for correlated phenotypes, assuming a heritability of 100%. Except for Sorbs_{977}, there are only very small differences between the tables for uncorrelated and correlated phenotypes, and even for Sorbs_{977} the differences appear not to be substantial. For an explained variance of 2%, the power in Sorbs_{977} increases, but it decreases for an explained variance of 5%. This is due to dependence on the significance threshold. Independent of the explained variance of the SNPs, the power under maximum heritability (100%) is greater than under minimal heritability (

Quartiles of power distribution assuming correlated phenotypes

| Explained variance | p-value threshold | Population | 1st Quartile | Median | 3rd Quartile |
| --- | --- | --- | --- | --- | --- |
| 2% | 1 × 10^{-5} | KORA_{977} | 6.7 | 37.1 | 48.4 |
| 2% | 1 × 10^{-5} | Sorbs_{977} | 10.08 | 38.95 | 48.9 |
| 2% | 1 × 10^{-5} | KORA_{532} | 1.2 | 7.8 | 11.6 |
| 2% | 1 × 10^{-5} | Sorbs_{532} | 1.3 | 8.2 | 11.9 |
| 5% | 1 × 10^{-7} | KORA_{977} | 24.78 | 88.3 | 95.12 |
| 5% | 1 × 10^{-7} | Sorbs_{977} | 27.3 | 83.6 | 91.8 |
| 5% | 1 × 10^{-7} | KORA_{532} | 2.73 | 29.9 | 42.9 |
| 5% | 1 × 10^{-7} | Sorbs_{532} | 2.9 | 30.4 | 43.5 |

Quartiles of the power distribution in percent for an explained variance of 2% with a p-value threshold of 1 × 10^{-5} and of 5% with a p-value threshold of 1 × 10^{-7}, respectively. A heritability of 100% is assumed.

**Comparisons of power for Sorbs _{977 }for minimal and maximal heritability of phenotypes**. Simulation results of the power for minimal (


The explanation for this behaviour is the inflation of the variance of the β-estimator in the Sorbs_{977} sample (see the additional file "Variance inflation under relatedness").

**Variance inflation under relatedness**. Comparison of the theoretical variance of the β_{1}-estimator assuming uncorrelated phenotypes (analytical formula) with the variance observed in simulations of correlated phenotypes. Results for Sorbs_{977} are presented in bold due to high inflation of variances of β_{1}-estimates.


Results for other degrees of heritability are presented in Additional file

**Simulation results for power under assumption of correlated phenotypes**. Heritability was varied between its minimal value and 100%. Explained variances of 2% and 5% were analysed with p-value thresholds of 10^{-5} and 10^{-7}, respectively. All simulations were performed for KORA_{977}, Sorbs_{977}, KORA_{532}, and Sorbs_{532}. Power distribution is derived using the results of all SNPs of Chromosome 22.


Discussion

The Sorbs, resident in Lusatia, Germany, are an ethnic minority of Slavonic origin. Using genome-wide SNP array techniques, we aimed to compare this putatively isolated population with a German mixed population (KORA study) by various population genetic means. The Sorbs were compared recently with other European populations or isolates on the basis of a limited set of genetic markers and a limited set of unrelated individuals

Genotype data from a sample of 977 Sorbs were available from genotyping with 500 k and 1000 k Affymetrix SNP chips. While SNP markers come with certain drawbacks (ascertainment bias, need for careful QC), they have proven useful for detecting subtle population structures.

For comparison with a German mixed population, we used the KORA F3 sample (N = 1644) and corresponding genotypes from 500 k Affymetrix SNP chips. Observed differences between regions of Germany are typically an order of magnitude lower than differences observed between Sorbs and KORA

A major goal of our study was to distinguish effects of genetic isolation from simple over-sampling of families in the Sorbs. Since most of the population genetic measures used to compare populations assume independence of individuals, over-sampling of families in certain samples may introduce a source of bias which is difficult to control. Indeed, we discovered a large number of closely related individuals within the Sorbs sample. Therefore, we repeated all analyses for a sub-group of Sorbs for which all relationships with relatedness estimates greater than 0.2 were removed. This does not completely resolve the problem of increased relatedness within the Sorbs sample but indicates the direction of potential biases introduced by over-sampling of families. Indeed, such biases could be detected in our data, but they are not substantial, at least for the population genetic measures studied.

Since relatedness cannot be completely removed from the samples, a cut-off of 0.2 for the relatedness estimate seems to be feasible to study the effect of relatedness and to keep the sample size at an acceptable level. We also studied a cut-off of 0.1 reducing the sample size to N = 414. Results can be found in Additional file

**Additional inbreeding and co-ancestry coefficients**. Estimates and standard errors (SE) of inbreeding coefficients F_{IS} and co-ancestry coefficients F_{ST} for three scenarios: the complete Sorbs sample (KORA_{977}, Sorbs_{977}), filtering for relatedness > 0.2 (KORA_{532}, Sorbs_{532}), filtering for relatedness > 0.1 (KORA_{414}, Sorbs_{414}). Indices refer to resulting numbers of cases.


For some analyses such as determination of rare SNPs and LD it is known that sample size can introduce bias

PCA is a proven means to detect even very small genetic differences between populations with high power. For European populations, it was demonstrated that the first two appropriately scaled principal components can map individuals to their geographic origin on the European continent with high precision, when all four grandparents are from the same location

We conclude that the Slavonic origin of the Sorbs is still clearly genetically detectable. The analysis revealed that there is a west to east sequence of the clusters of KORA individuals born in Germany, KORA individuals born in Poland or Czech Republic, Half-Sorbs, and finally, Full-Sorbs. Although birthplace is not a stringent indicator of ethnicity, it is a commonly used surrogate in genetic epidemiologic studies if more detailed information cannot be ascertained. On the other hand, most of the KORA individuals born in Poland or Czech Republic are descendants of German minorities of these countries. Hence, on the basis of our data we cannot conclude that the Sorbs are genetically more distant from Germany than a random sample from Poland or Czech Republic. Half-Sorbs can be assumed to be closer to the German population than Full-Sorbs due to mating with German neighbours. This is clearly reflected by the localization of Half-Sorbs between KORA individuals and Full-Sorbs. There is a trend that the Sorbs are closer to the KORA individuals born in Poland than to the KORA individuals born in Czech Republic, which is in agreement with a recently stated hypothesis that the Sorbs are genetically closer to Polish than to Czech.

Since it has been suggested that genetic diversity is lower in isolated populations

The _{ST }
_{IS }
_{IS }

ROH analysis was proposed to detect signs of isolation by estimation of inbreeding

We found that Sorbs have enriched ROHs of intermediate length (between 2.5 Mb and 5 Mb) compared to KORA, CEU, and TSI. This effect is much less pronounced for longer ROHs. Accordingly, the coverage of the genome by ROHs is higher in the Sorbian population. Following the argumentation of McQuillan et al., we conclude that there is a lack of recent parental relatedness in the Sorbs (no differences for long range ROHs) but that there are signs of ancient parental relatedness or the existence of autozygous segments of older pedigree structures (differences for ROHs of intermediate range). The lack of direct parental relatedness is in accordance with our estimates of F_{IS}.

Furthermore, we compared the LD structure of chromosome 22 between the KORA and the Sorbs population. For this comparison we used a newly proposed LD measure which, in contrast to the more popular measures, is independent of allelic frequencies.

An expected small upward bias caused by smaller sample size in KORA_{532} compared to KORA_{977} could be clearly detected. In contrast, the results for Sorbs_{977} and Sorbs_{532} are virtually identical. We conclude that the expected upward bias of the reduced Sorbs_{532} sample is nullified by the elimination of relationships. This interpretation is supported by the fact that a random sample of N = 532 individuals from Sorbs_{977} resulted in the same sample size bias as observed for KORA (data not shown). That is, LD is upwardly biased by the relatedness structure in the Sorbs. Nevertheless, even if relationships are eliminated to a reasonable degree (first and second degree relationships), Sorbs show generally higher LD at longer distances than is observed in KORA. It has already been shown in the literature that LD excess at longer ranges is a characteristic of isolated populations.

Since LD structure directly influences the coverage of a SNP technology, and with it, the power of genome-wide association studies, we performed power analyses in the Sorbs and KORA samples. For this purpose, we defined a fixed genetic effect of an arbitrary SNP at chromosome 22. Explained variance was used as a measure of effect in order to adjust for differences in allelic frequencies. For this SNP, we analysed the best proxy SNP available on chromosome 22 in order to mimic a situation in which an unobserved causative variant is detected via a marker in LD. We derived an analytical formula for our model for the case of negligible heritability for which individuals can be considered as independent. This formula also applies to situations where correction for relatedness effects has been performed, for instance with a GRAMMAR approach

Since relatedness structure is often neglected in genetic association studies, we also analysed the influence of the present relatedness structure on the power of an uncorrected analysis. This analysis was done via simulations of a linear mixed model comprising a fixed effect of a SNP and random polygenic and non-genetic effects. We showed that the variance of the β-estimator is inflated in Sorbs_{977}, irrespective of the size of the genetic effect considered. Whether this inflation increases or decreases power depends on the significance threshold; the explanation is that normal distributions with different variances are overlapping.

We conclude that relatedness in the Sorbs_{977 }sample influences the power of uncorrected genetic association studies. Influence of relatedness on power is highest under maximum heritability of the phenotype. However, directions of power differences depend on the size of the genetic effect in combination with the significance threshold chosen.

In our simulations we did not observe a scenario resulting in a clear power benefit in the Sorbs_{977 }sample. However, this does not rule out that there might be a higher power in the Sorbs due to increased effect sizes caused, e.g., by higher environmental homogeneity or lower number of causative variants

Conclusions

We could show that there are signs of genetic isolation within the Sorbs which cannot be explained by over-sampling of relatives. The effects are moderate in size. The Slavonic origin of the Sorbs is still genetically detectable. Although there is higher LD in the Sorbs, the difference to KORA is small. Power analysis showed that a clear advantage of the Sorbs for genome-wide association studies with respect to coverage cannot be expected.

The significant amount of cryptic relatedness in the Sorbs sample results in inflated variances of β-estimators, which should be considered in genetic association analyses.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

Design of the Study: MSch. Design of the Sorbs study and data collection: AT, PK, MStu. Design of the KORA data collection: CG, IR, HW. Data analysis: AG, NRR, MSch. Writing: AG, MSch. Contribution to writing and discussion: KRV, PA, ML, MSto, AT, PK, MStu, JN.

All authors read and approved the final manuscript.

Acknowledgements

We thank Knut Krohn and Beate Enigk for conducting microarray experiments of the Sorbs sample at the IZKF Leipzig at the Faculty of Medicine of the University of Leipzig (Projekt Z03).

We gratefully acknowledge the contributions of P. Lichtner, G. Eckstein, Guido Fischer, T. Strom and all other members of the Helmholtz Centre Munich genotyping staff in generating the SNP dataset as well as the contribution of all members of field staffs who were involved in the planning and conduct of the MONICA/KORA Augsburg studies. The KORA group consists of H.E. Wichmann (speaker), A. Peters, C. Meisinger, T. Illig, R. Holle, J. John and their co-workers who are responsible for the design and conduct of the KORA studies.

We thank Maelle Salmon for helping with data quality control. We thank Karsten Krug and Lars Thielecke for their technical assistance.

Finally, we express our appreciation to all participants of the Sorb and the KORA study for donating their blood and time.

Funding

The KORA research platform (KORA: Cooperative Research in the Region of Augsburg) and the MONICA Augsburg studies (Monitoring trends and determinants on cardiovascular diseases) were initiated and financed by the Helmholtz Zentrum München-National Research Center for Environmental Health, which is funded by the German Federal Ministry of Education, Science, Research and Technology and by the State of Bavaria. Part of this work was financed by the German National Genome Research Network (NGFN). Our research was supported within the Munich Center of Health Sciences (MC Health) as part of LMUinnovativ. AT, PK and MStu received financial support from the German Research Council (KFO-152), IZKF (B27), and the German Diabetes Association. MSto is funded by the Max Planck Society. AG and PA are funded by the German Federal Ministry for Education and Research (01KN0702). AG, PA, NRR, and MSch were funded by the Leipzig Interdisciplinary Research Cluster of Genetic Factors, Clinical Phenotypes, and Environment (LIFE Center, Universität Leipzig). LIFE is funded by means of the European Union, by the European Regional Development Fund (ERDF), the European Social Fund (ESF), and by means of the Free State of Saxony within the framework of its excellence initiative.