Open Access Methodology article

A general semi-parametric approach to the analysis of genetic association studies in population-based designs

Sharon Lutz [1,3,4]*, Wai-Ki Yip [3,4], John Hokanson [2], Nan Laird [3,4] and Christoph Lange [3,4,5,6]

Author affiliations

1 Department of Biostatistics, University of Colorado Anschutz Medical Campus, Aurora, USA

2 Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, USA

3 Department of Biostatistics, Harvard School of Public Health, Boston, USA

4 Channing Laboratory, Harvard Medical School, Boston, USA

5 Institute for Genomic Mathematics, University of Bonn, Bonn, Germany

6 German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany

Citation and License

BMC Genetics 2013, 14:13  doi:10.1186/1471-2156-14-13

Published: 28 February 2013



For genetic association studies of unrelated individuals, current statistical methodology typically models the phenotype of interest as a function of the genotype and assumes a known statistical model for the phenotype. For complex phenotypes, especially in the presence of ascertainment conditions, specifying such model assumptions is not straightforward and is error-prone, which can lead to misleading results.


In this paper, we propose an alternative approach that treats the genotype as the random variable and conditions on the phenotype. Consequently, the validity of the approach does not depend on the correctness of assumptions about the phenotypic model, although misspecification of the phenotypic model may reduce statistical power. Theoretical derivations and simulation studies demonstrate both the validity of the approach and its advantages over existing methodology. In the COPDGene study, a GWAS for chronic obstructive pulmonary disease (COPD), we apply the approach to a secondary quantitative phenotype, the Fagerström nicotine dependence score, which is correlated with COPD affection status. A software package that implements this method is available.
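The idea of treating the genotype as random while holding the phenotype fixed can be illustrated with a small sketch. The abstract does not give the authors' test statistic, so the code below substitutes a plain permutation test as a stand-in. The function name `conditional_score_test`, the centred-phenotype coding, and the parameters `n_perm` and `seed` are all assumptions made for illustration. Only the genotypes are shuffled, so the null distribution is built without any model for the phenotype.

```python
import random


def conditional_score_test(genotypes, phenotypes, n_perm=2000, seed=0):
    """Permutation p-value for a score statistic with genotypes treated as random.

    Hypothetical illustration, not the published method: the phenotypes are
    held fixed (conditioned upon) and only the genotypes are permuted, so no
    distributional assumption is placed on the phenotype.
    """
    n = len(genotypes)
    mean_y = sum(phenotypes) / n
    # Phenotype coding: centred phenotype values (an illustrative choice).
    weights = [y - mean_y for y in phenotypes]

    def score(g):
        # Because the weights sum to zero, this equals sum((y - ybar) * (x - xbar)).
        return sum(w * x for w, x in zip(weights, g))

    observed = abs(score(genotypes))
    rng = random.Random(seed)
    shuffled = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute genotypes, keep phenotypes in place
        if abs(score(shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: minor-allele counts (0, 1, 2) and a quantitative phenotype.
    g = [0] * 20 + [1] * 20 + [2] * 20
    y = [float(x) for x in g]
    print(conditional_score_test(g, y))
```

In this sketch a poorly chosen phenotype coding changes only the weights, which affects power but not the validity of the permutation null, mirroring the robustness property described above.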


The flexibility of this approach enables straightforward application to quantitative phenotypes and binary traits in both ascertained and unascertained samples. In addition to its robustness, our method provides a platform for constructing more complex statistical models, for example for longitudinal data, multivariate data, multi-marker tests, and rare-variant analysis.

Keywords: Genetic association studies; Secondary phenotypes; Case-control; Ascertainment; Semi-parametric