BMC Research Notes


Open Access Short Report

Evaluation of self-reported ethnicity in a case-control population: the stroke prevention in young women study

Jesse B Mez1, John W Cole1,2*, Timothy D Howard3, Leah R MacClellan4, Oscar C Stine4, Jeffery R O'Connell5, Marcella A Wozniak1,2, Barney J Stern1,2, John D Sorkin2, Braxton D Mitchell5 and Steven J Kittner1,2

Author Affiliations

1 Department of Neurology, University of Maryland School of Medicine, Baltimore, MD, USA

2 Medical Research Service, Veterans Affairs Medical Center, Baltimore, MD, USA

3 Department of Pediatrics, Center for Human Genomics, Wake Forest University School of Medicine, Winston-Salem, NC, USA

4 Department of Epidemiology and Preventative Medicine, University of Maryland School of Medicine, Baltimore, MD, USA

5 Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA


BMC Research Notes 2009, 2:260 doi:10.1186/1756-0500-2-260

Published: 18 December 2009

Abstract

Background

Population-based association studies are used to identify common susceptibility variants for complex genetic traits. These studies are susceptible to confounding from unknown population substructure. Here we apply a model-based clustering approach to our case-control study of stroke among young women to examine whether self-reported ethnicity can serve as a proxy for genetic ancestry.

Findings

A population-based case-control study of stroke among women aged 15-49 identified 361 cases of first ischemic stroke and 401 age-comparable control subjects. Thirty single nucleotide polymorphisms (SNPs) located throughout the genome, unrelated to stroke risk and with established ancestry-based allele frequency differences, were genotyped in all participants. The Structure program was run iteratively to evaluate models with K = 1 to 5 potential genetic subpopulations. When the population was evaluated as a whole, the Structure output plateaued at K = 2 clusters. Of self-reported Caucasians, 98% had an estimated probability ≥50% of belonging to Cluster 1, while 94% of self-reported African-Americans had an estimated probability ≥50% of belonging to Cluster 2. Stratifying the participants by self-reported ethnicity and repeating the analyses revealed two clusters among Caucasians, suggesting that substructure may exist within this group.
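The ≥50% membership criterion described above can be illustrated with a minimal sketch. It assumes each participant is summarized by a self-reported label and a tuple of estimated cluster membership probabilities (one value per cluster, as a clustering program would report for K = 2). The function name `fraction_assigned`, the record layout, and the toy values are hypothetical and are not taken from the study data or from the Structure software.

```python
from collections import Counter


def fraction_assigned(records, threshold=0.5):
    """For each (label, cluster) pair, return the fraction of participants
    with that self-reported label whose estimated membership probability
    for that cluster is at least `threshold`.

    `records` is an iterable of (label, q) pairs, where q is a tuple of
    membership probabilities, one per cluster (hypothetical layout).
    """
    totals = Counter()  # number of participants per self-reported label
    hits = Counter()    # participants meeting the threshold, per (label, cluster)
    for label, q in records:
        totals[label] += 1
        for k, qk in enumerate(q, start=1):
            hits[(label, k)] += int(qk >= threshold)
    return {key: n / totals[key[0]] for key, n in hits.items()}


# Toy, made-up membership probabilities for illustration only.
toy = [
    ("Caucasian", (0.93, 0.07)),
    ("Caucasian", (0.81, 0.19)),
    ("Caucasian", (0.40, 0.60)),
    ("African-American", (0.05, 0.95)),
    ("African-American", (0.12, 0.88)),
]

result = fraction_assigned(toy)
for (label, cluster), frac in sorted(result.items()):
    print(f"{label}: {frac:.0%} with probability >= 50% for Cluster {cluster}")
```

Applied to real membership estimates, a tabulation of this kind would yield figures comparable in form to the 98% and 94% reported above. The threshold is left as a parameter because 50% is only one possible cut-off.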

Conclusions

Among our combined sample of African-American and Caucasian participants, there is no large unknown subpopulation, and self-reported ethnicity can serve as a proxy for genetic ancestry. Ethnicity-specific analyses indicate that population substructure may exist among the Caucasian participants, so further studies are warranted.