This article is part of the supplement: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci
Comparison of tagging single-nucleotide polymorphism methods in association analyses
1 Department of Health Sciences Research, Mayo Clinic College of Medicine, 200 First Street SW, Rochester, MN 55905, USA
2 Division of Medical Genetics, University of Washington, Box 357720, Seattle, WA 98195-7720, USA
BMC Proceedings 2007, 1(Suppl 1):S6 doi:Published: 18 December 2007
Several methods to identify tagging single-nucleotide polymorphisms (SNPs) are in common use for genetic epidemiologic studies; however, there may be loss of information when using only a subset of SNPs. We sought to compare the ability of commonly used pairwise, multimarker, and haplotype-based tagging SNP selection methods to detect known associations with quantitative expression phenotypes. Using data from HapMap release 21 on unrelated Utah residents with ancestors from northern and western Europe (CEPH-Utah, CEU), we selected tagging SNPs in five chromosomal regions using ldSelect, Tagger, and TagSNPs. We found that SNP subsets did not substantially overlap, and that the use of trio data did not greatly impact SNP selection. We then tested associations between HapMap genotypes and expression phenotypes on 28 CEU individuals as part of Genetic Analysis Workshop 15. Relative to the use of all SNPs (n = 210 SNPs across all regions), most subset methods were able to detect single-SNP and haplotype associations. Generally, pairwise selection approaches worked extremely well, relative to use of all SNPs, with marked reductions in the number of SNPs required. Haplotype-based approaches, which had identified smaller SNP subsets, missed associations in some regions. We conclude that the optimal tagging SNP method depends on the true model of the genetic association (i.e., whether a SNP or haplotype is responsible); unfortunately, this is often unknown at the time of SNP selection. Additional evaluations using empirical and simulated data are needed.