Open Access Highly Accessed Methodology article

PCA-based bootstrap confidence interval tests for gene-disease association involving multiple SNPs

Qianqian Peng1, Jinghua Zhao2* and Fuzhong Xue1*

Author Affiliations

1 Department of Epidemiology and Health Statistics, School of Public Health, Shandong University, Jinan 250012, PR China

2 MRC Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK

BMC Genetics 2010, 11:6  doi:10.1186/1471-2156-11-6

Published: 26 January 2010

Abstract

Background

Genetic association studies are currently the primary vehicle for identifying and characterizing disease-predisposing variant(s), and they usually involve multiple single-nucleotide polymorphisms (SNPs). However, SNP-wise association tests raise concerns over multiple testing. Haplotype-based methods have the advantage of accounting for correlations between neighbouring SNPs, yet their assumption of Hardy-Weinberg equilibrium (HWE) and their potentially large number of degrees of freedom can harm statistical power and robustness. Approaches based on principal component analysis (PCA) are preferable in this regard, but their performance varies with the method used to extract principal components (PCs).

Results

A PCA-based bootstrap confidence interval test (PCA-BCIT), which directly uses the PC scores to assess gene-disease association, was developed and evaluated for three ways of extracting PCs: from cases only (CAES), from controls only (COES), and from cases and controls combined (CES). Extraction of PCs with COES is preferred to that with CAES and CES. Performance of the test was examined via simulations and via analyses of rheumatoid arthritis and heroin addiction data. The test maintains the nominal level under the null hypothesis and shows performance comparable to that of the permutation test.
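
To make the idea concrete, the following is a minimal Python sketch of a test in the spirit of PCA-BCIT with COES extraction. It is not the authors' implementation, and the abstract does not fix the details, so several choices here are assumptions: only the first PC is used, the statistic is the mean PC score in each group, the intervals are percentile bootstrap intervals, and association is declared when the case and control intervals do not overlap. The helper names (`simulate_genotypes`, `pca_bcit_sketch`) and the toy genotype data are hypothetical.

```python
import numpy as np


def simulate_genotypes(n, freq, rng, n_snps=5, noise=0.1):
    """Toy 0/1/2 genotypes for correlated SNPs (hypothetical data, not from the paper)."""
    base = rng.binomial(2, freq, size=n)               # shared signal mimicking LD
    alt = rng.binomial(2, freq, size=(n, n_snps))      # independent draws
    mask = rng.random((n, n_snps)) < noise             # which entries deviate from base
    return np.where(mask, alt, base[:, None])


def bootstrap_mean_ci(scores, n_boot, alpha, rng):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    n = scores.shape[0]
    idx = rng.integers(0, n, size=(n_boot, n))         # resample individuals with replacement
    means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def pca_bcit_sketch(cases, controls, n_boot=1000, alpha=0.05, seed=0):
    """Assumed decision rule: non-overlapping bootstrap CIs of the mean PC1 score."""
    rng = np.random.default_rng(seed)
    # COES: centre and extract the first PC from controls only.
    mu = controls.mean(axis=0)
    _, _, vt = np.linalg.svd(controls - mu, full_matrices=False)
    v1 = vt[0]
    # Project both groups onto the control-derived PC.
    ci_case = bootstrap_mean_ci((cases - mu) @ v1, n_boot, alpha, rng)
    ci_ctrl = bootstrap_mean_ci((controls - mu) @ v1, n_boot, alpha, rng)
    significant = ci_case[0] > ci_ctrl[1] or ci_case[1] < ci_ctrl[0]
    return bool(significant), ci_case, ci_ctrl


# Usage on toy data with a large allele-frequency difference between groups.
controls = simulate_genotypes(300, 0.3, np.random.default_rng(1))
cases = simulate_genotypes(300, 0.7, np.random.default_rng(2))
significant, ci_case, ci_ctrl = pca_bcit_sketch(cases, controls)
print("association detected:", significant)
print("case CI:", ci_case, "control CI:", ci_ctrl)
```

Extracting the loadings from controls only means the projection is not shaped by any disease-related structure in the cases, which is one plausible reading of why COES is preferred. Swapping the array passed to the SVD step would give the CAES or CES variants.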

Conclusions

PCA-BCIT is a valid and powerful method for assessing gene-disease association involving multiple SNPs.