Abstract
Background
It is well known that the presence of population stratification (PS) may cause the usual test in case-control studies to produce spurious gene-disease associations. However, the joint impact of PS and sample selection (SS) is less well known. In this paper, we provide a systematic study of the joint effect of PS and SS under a more general risk model containing genetic and environmental factors. We provide simulation results to show the magnitude of the bias and its impact on the type I error rate of the usual chi-square test under a wide range of PS levels and selection bias.
Results
The biases in the estimation of the main and interaction effects are quantified and their bounds derived. The estimated bounds can be used to compute conservative p-values for the association test. If the conservative p-value is smaller than the significance level, we can safely claim that the association test is significant regardless of whether PS or selection bias is present. We also identify conditions for the null bias. The bias depends on the allele frequencies, exposure rates, gene-environment odds ratios and disease risks across subpopulations, and on the sampling of the cases and controls.
Conclusion
Our results show that the bias cannot be ignored even when the case and control data are matched in ethnicity. A real example is given to illustrate application of the conservative p-value. These results are useful for genetic association studies of main and interaction effects.
Background
In the search for causative agents of human disease, both environmental and genetic risk factors have been identified. There is strong evidence that relatively common polymorphisms in a wide spectrum of genes may modify the effect of environmental agents [1,2]. Several studies have also demonstrated the presence of gene-gene interaction in complex human diseases [3-7]. Gene-gene interaction, or epistasis, is also a basic genetic concept that has been widely used by biologists for a long time [8].
Many association designs have been proposed for studying gene-environment or gene-gene interactions. Recently, Wang and Zhao [9] found that in the study of gene-gene interactions, the unmatched case-control association design is more powerful than both the matched case-control design and the case-parents design. They also found that when a logistic regression model is fitted for assessing gene-environment interactions based on a case-parents sample, the approach may be susceptible to the PS bias [10]. However, the case-control design is also well known to be susceptible to the PS bias in the study of genetic effects, if the gene under study shows marked variation in allele frequency across subgroups of the population and if these subgroups also differ in their baseline disease risks [11-17]. Wang et al. [18] recently provided numerical examples showing that when the correlation between genetic and environmental factors is small or the linkage disequilibrium is weak, and the case-control data are collected according to a simple random sampling (SRS) scheme, that is, with no selection bias, the PS bias in testing a null interaction odds ratio is also small. However, selection bias often occurs in case-control studies, and more studies are needed to better understand the joint impact of PS and SS.
In this paper, we investigate the joint effect of population stratification and sample selection in testing null main or interaction effects. Under general sampling, we quantify the magnitude of the PS-SS bias in terms of the baseline disease risks, genotype frequencies, exposure rates, their odds ratios (linkage disequilibrium coefficients), and the effect sizes of the risk factors. Based on this result, we find that matching in ethnicity cannot eliminate bias in association studies. Using the expression for the bias, we are also able to derive important conditions under which it is null.
The PS-SS bias cannot be estimated, since we do not know how many subpopulations are involved in the studied population and/or which subpopulation a person belongs to. Although adjusting for covariates such as principal components can be used to account for PS in genome-wide association studies [19], it is not clear whether the same approach can be applied in studies of interaction, since, for example, the bias level also depends on the effect size of the environmental factor. In this paper, we also derive useful bounds to measure the maximal impact of the bias. Sometimes these bounds can be estimated, so that tests robust to the joint effect of PS and SS can be derived; see Lee and Wang [20] for a similar suggestion in studies of gene-disease association. We use theoretical formulas and simulation results to show the general properties of the usual association test in the presence of PS or selection bias. We also provide a real example to demonstrate computation of a conservative p-value in studying the interaction effect of maternal smoking and a GSTT1 variant on the risk of orofacial cleft.
Results
The Magnitude of the Bias
We begin this section with the notation that will be used throughout this work. Disease status is denoted by D, with levels D = 1 and D = 0 indicating the presence and absence of the disease, respectively. Let G = 1 (0) represent the presence (absence) of the genotype of interest, and let H = 1 (0) represent the presence (absence) of the environmental exposure or another genotype of interest. Although we focus only on the 2 × 2 × 2 table, all results can be extended to any number of risk factors or any number of levels. We also assume that the population under study consists of K subpopulations and denote by S the stratification variable, taking values s = 1,..., K. However, K is unknown and S is not observable in our discussion of the PS effect.
To quantify the PS effect, we assume that the risk model is given by
where the genetic and environmental data are obtained from subpopulation s. As usual, we use s = 1, g = 0, and h = 0 to represent the referent subpopulation, genotype and environmental exposure, respectively. For the purpose of identifiability, we define = 0. s = 1,..., K, are the subpopulation-specific parameters representing the potential heterogeneity of disease risk across subpopulations. In this model, the log-odds ratio β measures the association between the genotype and risk of disease, and the log-odds ratio γ measures the association between the environmental exposure (or another genotype) and risk of disease. The multiplicative interaction δ measures the change in the disease-genotype log-odds ratio across levels of the risk factor H. Similar risk models for studying genetic effects under PS can be found in Satten et al. [21] and Cheng and Lin [17], for example. For subpopulation s, we use OR_{s} to represent the baseline G-H odds ratio (given D = 0). Define
as the baseline G frequency odds; the baseline H frequency odds H_{s} is similarly defined. Also define D_{s} as the baseline disease frequency odds given by
In the discussion of the PS effect, one often assumes that case and control data are sampled according to the SRS design. Let P(S = s | D = 1) and P(S = s | D = 0) represent the corresponding proportions of subpopulation s in the cases and controls, respectively. However, in real applications, selection bias often happens and sampling may not follow the SRS scheme for various reasons. Let the true proportion of subjects in the cases (controls) that are from subpopulation s be denoted by P^{#}(S = s | D = 1) (P^{#}(S = s | D = 0)). We use DS_{s} = to measure the effect of the sample selection for subpopulation s. If there is no selection bias, DS_{s} = 1.
Since at the population level we observe only the factors G and H, we show in the Methods section that, given the presence of PS and general sampling, the main effects and interaction are given by
where
and
exp(β*), exp(γ*) and exp(δ*) are the bias levels. We note that if D_{s}DS_{s} is constant with respect to s, then K(g, h) is also constant and there is no bias of any kind. A sufficient condition for this to hold is that the baseline disease risk is identical across all subpopulations and the sampling of the study follows an SRS design. Further, since
therefore, if the disease prevalence P(D = 1 | S = s) and the baseline disease risk P(D = 1 | G = H = 0, S = s) are approximately equal in each subpopulation, then the bias depends on D_{s}DS_{s} only through the degree of matching . Accordingly, if the cases and controls are matched in ethnicity, then the bias should be very small. However, P(D = 1 | S = s) ≈ P(D = 1 | G = H = 0, S = s) for all subpopulations often fails when an environmental factor, such as smoking, contributes to the disease risk. Under this scenario, even when the cases and controls are perfectly matched, the bias can still be large. This conclusion differs from that for the gene-disease association study; see, for example, Cheng, Lee and Chen [22]. We discuss this issue further in later sections.
Maximal bias and conditions for the null bias
Here, we give conditions for the null bias and bounds for the bias. The bias exp(β*) in the estimation of the genetic main effect depends on the variation of the genotype frequencies, measured by G^{†}, the variation of the disease prevalence, measured by D^{†}, and the sampling variation, measured by DS^{†}. The bias exp(δ*) in the estimation of the interaction depends additionally on the variation of the baseline odds ratio and on the variation of exposure rates, measured by H^{†}.
Note that the bias β* depends only on K(g, 0). We first present some conditions for the null bias β* = 0 when the true genetic main effect is null: (1) if the baseline genotype frequency is constant across subpopulations, then the bias β* is zero (this can be proved using equation (1) in the Methods section); (2) if the sample selection follows an SRS scheme (DS^{†} = 1) and the disease risk is constant, then the bias is also null (however, if the sampling is not SRS, the bias may be non-null; see Tables 1 and 2); (3) if the case and control data are matched in ethnicity and γ = δ = 0 (both the H main effect and the interaction are null), then the bias is null.
Table 1. Biases and true type I errors of the chi-square tests when G^{†} = 5 and LD = (0, 0)
Table 2. Biases and true type I errors of the chi-square tests when G^{†} = 5 and LD = (0, 0.05)
When the interaction effect is null, some conditions for the null bias δ* = 0 are: (1) if the baseline G-H odds ratios and the G (or H) frequency odds are constant across subpopulations, then the bias δ* is null (this can be proved using equation (2) in the Methods section); (2) if the sample selection of the study follows SRS and the disease risk is constant, then the bias δ* is also null. However, see Tables 1 and 2 for the presence of bias when the SRS condition fails.
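The null-bias conditions listed above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a logistic risk model in which β, γ and δ are common to all subpopulations, so that within each subpopulation the case frequencies of (G, H) are the control frequencies tilted by exp(βg + γh + δgh). The function names, the two illustrative subpopulations and their frequencies are hypothetical and are not taken from Tables 1 and 2.

```python
from math import exp, log

CELLS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (g, h) combinations


def independent_table(p_g, p_h):
    """Baseline P(G=g, H=h | D=0, S=s) when G and H are independent."""
    return {(g, h): (p_g if g else 1 - p_g) * (p_h if h else 1 - p_h)
            for g, h in CELLS}


def pooled_log_odds_ratios(p0, w_case, w_ctrl, beta, gamma, delta):
    """Apparent log odds ratios when subpopulation labels are ignored.

    p0[s]      : baseline joint frequencies of (G, H) in subpopulation s
    w_case[s]  : share of subpopulation s among the sampled cases
    w_ctrl[s]  : share of subpopulation s among the sampled controls
    beta, gamma, delta : true log odds ratios (assumed common to all s)
    """
    case = dict.fromkeys(CELLS, 0.0)
    ctrl = dict.fromkeys(CELLS, 0.0)
    for table, wc, wk in zip(p0, w_case, w_ctrl):
        # Case frequencies within s: control frequencies tilted by the risk model.
        tilt = {c: table[c] * exp(beta * c[0] + gamma * c[1] + delta * c[0] * c[1])
                for c in CELLS}
        z = sum(tilt.values())
        for c in CELLS:
            case[c] += wc * tilt[c] / z
            ctrl[c] += wk * table[c]
    odds = {c: case[c] / ctrl[c] for c in CELLS}
    b_app = log(odds[(1, 0)] / odds[(0, 0)])
    g_app = log(odds[(0, 1)] / odds[(0, 0)])
    d_app = log(odds[(1, 1)] * odds[(0, 0)] / (odds[(1, 0)] * odds[(0, 1)]))
    return b_app, g_app, d_app


# Two hypothetical subpopulations that differ in G and H frequencies.
p0 = [independent_table(0.5, 0.2), independent_table(0.8, 0.5)]
matched = [0.7, 0.3]  # same subpopulation mix in cases and controls

# True beta = 0 throughout, so the apparent main effect equals the bias.
b_null, _, _ = pooled_log_odds_ratios(p0, matched, matched, 0.0, 0.0, 0.0)
b_env, _, _ = pooled_log_odds_ratios(p0, matched, matched, 0.0, 1.0, 0.0)
b_sel, _, _ = pooled_log_odds_ratios(p0, [0.5, 0.5], matched, 0.0, 0.0, 0.0)
print(b_null, b_env, b_sel)
```

In this toy setting the apparent genetic main effect vanishes under matching when γ = δ = 0, but not when γ = 1 or when the case and control mixes differ, which is the qualitative pattern described in conditions (1) to (3).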
Next, we present bounds that measure the largest bias in the estimation of the main effect. In the Methods section, we show that the bias exp(β*) can be expressed as
where w_{s} are some constants satisfying 0 ≤ w_{s} ≤ 1 and . The bias is greatest when the number of subpopulations is 2. The bias is also bounded below by. These bounds give the maximal impact of the bias in making inference about the genetic main effect. Under a rare disease, the background disease rate is approximately equal to the background disease odds. We find that the bound under SRS (DS^{†} = 1) is similar to that given by Lee and Wang [20]. However, our result is more general in the sense that their risk model is a special case of ours, and selection bias was not considered in their paper.
In the Methods section, we also show that under SRS, the bias exp(δ*) is bounded above by (D^{†})^{2} and bounded below by (D^{†})^{-2}. These are the same bounds derived by Wang et al. [18]. Unfortunately, these bounds are not valid when there is selection bias. Under general sample selection, we show that the bias exp(δ*) is bounded above by
and bounded below by . Using these bounds, we can easily conclude that if the genetic factors are in linkage equilibrium within each subpopulation and the variation of the G (or H) frequency odds is small, then the bias is also expected to be small.
True type I errors
In case-control studies, one often expects that the type I errors of the association tests can be approximately controlled at some predetermined level. However, in the presence of PS or selection bias, the usual test statistic does not have a chi-square distribution under the null hypothesis. Instead, it has a noncentral chi-square distribution, with noncentrality parameter depending on the level of the bias. Thus, the usual chi-square test tends to have inflated type I errors.
Suppose that the intended type I error rate of the chi-square test is α and let represent the 100(1 - α) percentile of the chi-square distribution with one degree of freedom. Let represent a noncentral chi-square random variable with one degree of freedom and noncentrality parameter Δ. In the case of testing null interaction, the noncentrality parameter is given by
where is the number of observations with outcome G = g, H = h and disease status d. Then the true type I error of the usual chi-square test of null interaction is given by which is always ≥ α. In the case of testing null genetic main effect, the noncentrality parameter is given by
The corresponding true type I error of the chi-square test is given by which is also ≥ α.
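The inflation described in this section can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' implementation: it assumes the noncentrality parameter has the Woolf-type form (squared bias on the log odds ratio scale divided by the sum of reciprocal cell counts), which should be checked against the paper's own expression. The function names and the cell counts in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ncx2_sf_1df(x, ncp):
    """P(X > x) for a noncentral chi-square X with 1 df and noncentrality ncp."""
    # With one degree of freedom, X = (Z + sqrt(ncp))^2 for a standard normal Z.
    root, shift = sqrt(x), sqrt(ncp)
    return _STD_NORMAL.cdf(-(root - shift)) + _STD_NORMAL.cdf(-(root + shift))


def true_type1_error(bias, cell_counts, alpha=0.05):
    """True size of the nominal-alpha 1-df chi-square test under a given bias.

    bias        : bias on the log odds ratio scale (beta* or delta*)
    cell_counts : counts entering the log odds ratio estimate
                  (assumption: variance = sum of reciprocal counts)
    """
    variance = sum(1.0 / n for n in cell_counts)
    ncp = bias ** 2 / variance
    # Upper-alpha point of the central chi-square(1) distribution.
    critical = _STD_NORMAL.inv_cdf(1 - alpha / 2) ** 2
    return ncx2_sf_1df(critical, ncp)


# Hypothetical 2 x 2 x 2 table with 100 observations per cell.
counts = [100] * 8
print(true_type1_error(0.0, counts))  # no bias: nominal level
print(true_type1_error(0.3, counts))  # biased: inflated level
```

With zero bias the function returns the nominal level, and any nonzero bias pushes the true type I error above it, in line with the inequality stated above.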
Conservative pvalues
In most practical applications, one does not know the true value of the noncentrality parameter, and it is therefore difficult to calculate the true p-value of the chi-square test when PS is present and/or there is selection bias. However, we are able to develop a bound for the noncentrality parameter, and this bound may be estimable in many cases. Define () as Δ_{δ} (Δ_{β}) but with δ* (β*) replaced by its upper bound (log U_{β}). Let () be the usual statistic for testing null interaction (main effect). Then, following Cheng, Lee and Chen [22], a conservative p-value of the chi-square test is given by (). We note that, by the properties of the noncentral chi-square distribution, the test based on the conservative p-value always has a true type I error rate smaller than or equal to the significance level, and the latter is always smaller than or equal to the true type I error rate of the usual chi-square test. If a test has a conservative p-value less than or equal to the designated significance level, it is significant even if there is PS or selection bias.
Examples of true biases and type I error rates
Tables 1 and 2 show some values of the biases β* and δ* and the true type I error rates α_{β} and α_{δ} of the usual chi-square tests when the significance level is 0.05. We assumed that there are two subpopulations (K = 2), with β = δ = 0 and γ = 0 or 1. The G (H) frequency of the first subpopulation was P(G = 1 | S = 1) = 0.51 (P(H = 1 | S = 1) = 0.19), the disease risk of the first subpopulation was P(D = 1 | S = 1) = 0.05, the proportion of subpopulation 1 in the overall population was 0.7, and the case and control sample sizes both equaled n = 500. We defined LD = (LD_{1}, LD_{2}), where LD_{s} is the linkage disequilibrium coefficient between loci G and H in subpopulation s, and considered LD_{s} = 0 or 0.05. We also assumed that the sampling proportions of the cases followed SRS but those of the controls might not. The rest of the parameter values were determined from the values of the variations G^{†}, H^{†}, D^{†} and DS^{†} given in the tables, with the assumption that subpopulation 2 has the maximal baseline G (or H) frequency odds, disease risk, and sampling deviation (this implies that P^{#}(S = 2 | D = 0) ranges from 0.0585 to 0.7163). Finally, we note that in computing the noncentrality parameters, the sample frequencies were replaced by n × P(G = g, H = h | D = d). The simulation results for G^{†} = 5 are given in Tables 1 and 2, and those for G^{†} = 3 can be found in Tables S1 and S2 in Additional file 1.
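To make the role of the linkage disequilibrium coefficient concrete, the sketch below builds the 2 × 2 joint table of (G, H) for the first subpopulation from its marginal frequencies. It assumes that the coefficient is the covariance P(G = 1, H = 1) - P(G = 1)P(H = 1) and treats the quoted frequencies as the margins of the table; the function names are hypothetical.

```python
def joint_from_ld(p_g, p_h, ld):
    """Joint frequencies of (G, H) from marginals and an LD coefficient.

    Assumption: ld = P(G=1, H=1) - P(G=1) * P(H=1).
    """
    p11 = p_g * p_h + ld
    table = {
        (1, 1): p11,
        (1, 0): p_g - p11,
        (0, 1): p_h - p11,
        (0, 0): 1.0 - p_g - p_h + p11,
    }
    if min(table.values()) < 0:
        raise ValueError("LD coefficient is incompatible with the marginals")
    return table


def odds_ratio(table):
    """G-H odds ratio of a 2 x 2 joint table."""
    return table[(1, 1)] * table[(0, 0)] / (table[(1, 0)] * table[(0, 1)])


# Subpopulation 1 in the simulation setting: P(G=1) = 0.51, P(H=1) = 0.19.
equilibrium = joint_from_ld(0.51, 0.19, 0.0)
disequilibrium = joint_from_ld(0.51, 0.19, 0.05)
print(odds_ratio(equilibrium), odds_ratio(disequilibrium))
```

Under these assumptions, a coefficient of 0.05 already corresponds to a G-H odds ratio of roughly 4 in this subpopulation, which helps explain why the interaction bias in Table 2 is much larger than in Table 1.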
Additional file 1. Biases and the true type I errors of the chi-square tests. The file contains two tables showing the biases and true type I errors of the chi-square tests when G^{†} = 3 and LD = (0, 0) or LD = (0, 0.05).
According to the results in Table 1, the true type I error α_{β} ranges from 0.05 to 0.9998 under linkage equilibrium. If the SRS condition holds and γ = 0, the true type I error α_{β} ranges from 0.05 to 0.9602 with mean 0.4377 and standard error 0.3298. Under the same conditions but with γ = 1, the corresponding range becomes (0.05, 0.9326) with mean 0.3822 and standard error 0.2969. On the other hand, if the sampling is not SRS (DS^{†} = 3 or 5) and γ = 0, the range of α_{β} is (0.05, 0.9998) with mean 0.6871 and standard error 0.317. Under non-SRS with γ = 1, the corresponding range becomes (0.05, 0.9992) with mean 0.6291 and standard error 0.3117. These results indicate that the bias can be quite large and that its level may be modified by the sample selection and the level of the H main effect. We also observe that the bias β* may be nonzero under perfect matching. For example, if matching is perfect and the H main effect is γ = 1, the largest true type I error is 0.1064, which occurs when G^{†} = H^{†} = D^{†} = 5. This is contrary to the usual belief that matching cases and controls in ethnicity can eliminate the PS bias. However, except in some special cases, the biases under the perfect matching design are smaller than those under other sampling designs.
Wang et al. [18] suggested that the bias δ* in the interaction effect is small when the linkage disequilibrium coefficient is small and the sampling is SRS. Our Table 1 also shows that under the same condition, the true type I error α_{δ} in testing null interaction ranges from 0.05 to 0.0659. This agrees with their finding. However, if there is selection bias (DS^{†} = 3 or 5), the true type I error rate α_{δ} has range (0.05, 0.2656), mean 0.101, and standard error 0.056 when γ = 0, and range (0.05, 0.2750), mean 0.1053, and standard error 0.0597 when γ = 1. The means and standard errors given here and later were computed from the results shown in Tables 1 and 2, and Tables S1 and S2 in Additional file 1. These results indicate that PS and SS can also cause serious bias in case-control studies of gene-gene interactions, even when the two genes are in linkage equilibrium. Under this scenario, the best way of reducing the bias is to match cases and controls in ethnicity. We note that under perfect matching and linkage equilibrium, α_{δ} ranges only from 0.05 to 0.0541.
Linkage disequilibrium between two genes, or correlation between genetic and environmental factors, plays an important role in determining the bias level in studies of interaction. According to the results presented in Table 2, the bias in the estimation of the genetic main effect becomes smaller when the linkage disequilibrium coefficient increases from 0 to 0.05. When γ = 0, the mean of α_{β} is 0.3377 under SRS and 0.5514 under non-SRS (selection bias), and when γ = 1 the mean becomes 0.2716 and 0.4597 under SRS and non-SRS, respectively. In contrast, the bias in the estimation of the interaction effect increases when the linkage disequilibrium coefficient increases from 0 to 0.05. Our results show that when γ = 0, the mean of α_{δ} is 0.1642 under SRS and 0.5512 under non-SRS. When γ = 1, the mean becomes 0.1706 and 0.5555 under SRS and non-SRS, respectively. Overall, the bias δ* seems to become larger as the linkage disequilibrium coefficient gets larger. Under stronger linkage disequilibrium, the true type I error α_{δ} can be as large as 0.1101 even when the cases and controls are perfectly matched.
An application
Shi et al. [23] studied the interaction effects of maternal smoking and maternal or fetal pharmacogenetic variants on the risk of orofacial cleft, based on 1244 subjects from Denmark and Iowa, USA with facial clefting and 4183 parents, siblings or unrelated population controls. We considered the combined Denmark and Iowa case-control data with H = 1 if maternal smoking was yes (0 if no) and G = 1 if the GSTT1 genotype was null (0 if the genotype was not null); see Table A6 of [23]. Based on these data, we found that the estimated G × H interaction odds ratio was 3.2499 and that the chi-square test had a p-value of 5.5676 × 10^{-4}, indicating a strong interaction effect. Also, from [24] we found that GSTT1 genotype frequencies in Caucasian populations were between 0.129 and 0.276, giving the variation of the genotype frequencies G^{†} = 4.8762. The maternal smoking rate ranged between 0.101 and 0.244 (see [25-27]), giving the variation of exposure rates H^{†} = 1.968. Since maternal smoking and GSTT1 were independent in the unrelated control population (p-values of the independence test for the Denmark data and Iowa data were 0.0942 and 0.0976, respectively), our upper bound for the bias exp(δ*) (see equation 2) equals 1.6149, leading to a conservative p-value of 2.0353 × 10^{-2}. This suggests that the effect of maternal smoking on the cleft risk can be modified by the GSTT1 genotype even if population stratification and selection bias are both present in the study.
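The conservative p-value in this example can be approximated from the three reported quantities alone. The sketch below is an illustration under assumptions rather than the authors' computation: it treats 3.2499 as the estimated interaction odds ratio, assumes the usual statistic is of Wald type (squared log odds ratio over its variance) so that the variance can be backed out from the reported p-value, and evaluates the tail of a noncentral chi-square distribution with one degree of freedom. The function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ncx2_sf_1df(x, ncp):
    """P(X > x) for a noncentral chi-square X with 1 df and noncentrality ncp."""
    root, shift = sqrt(x), sqrt(ncp)
    return _STD_NORMAL.cdf(-(root - shift)) + _STD_NORMAL.cdf(-(root + shift))


def conservative_p_value(or_hat, p_usual, bias_upper_bound):
    """Conservative p-value from the usual test and an upper bound on exp(bias).

    or_hat           : estimated odds ratio (main effect or interaction)
    p_usual          : p-value of the usual 1-df chi-square test
    bias_upper_bound : upper bound U on the multiplicative bias, U >= 1
    """
    # Back out the usual 1-df statistic from its p-value.
    stat = _STD_NORMAL.inv_cdf(1 - p_usual / 2) ** 2
    # Assumption: Wald-type statistic, stat = log(or_hat)^2 / variance.
    variance = log(or_hat) ** 2 / stat
    # Noncentrality parameter with the bias replaced by its upper bound.
    ncp = log(bias_upper_bound) ** 2 / variance
    return ncx2_sf_1df(stat, ncp)


p_cons = conservative_p_value(3.2499, 5.5676e-4, 1.6149)
print(f"{p_cons:.4f}")
```

Under these assumptions the result is close to 0.02, in agreement with the conservative p-value reported above to about two significant figures; small differences are expected if the original statistic was not of Wald type.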
Discussion
The impact of population stratification is considered by many to be important in case-control studies of gene-disease association. Many authors have suggested quantitative methods to control the type I errors of the usual association test. The most popular treatments include the "genomic control" method [28-33] and the "structured association" method [34-37]. Each of the proposed methods requires typing extra polymorphic markers to generate an estimate of PS, which can be used to adjust the test statistic. The impact of PS in case-control studies of gene-gene (environment) interaction is considered to be less important when the genes under study are in linkage equilibrium or when the gene-environment correlation is weak [18,38]. However, this conclusion holds only when the sampling of the case and control data follows an SRS design, that is, when there is no selection bias. Unfortunately, there is no formal method for testing the validity of the SRS condition when PS is present.
In practical applications, selection bias is not unusual. For example, the SRS condition may fail when hospital-based cases (controls) are used in the study and they are not representative of the population-based cases (controls), when there is substantial nonresponse among the cases and/or controls, or when there is self-selection. In this paper, we show that under slight selection bias (DS^{†} = 3), the bias in the estimation of the main or interaction effect may become unacceptable. Our suggestion is that the bias should be treated seriously, even when the genetic factors are in linkage equilibrium or the genetic and environmental factors are uncorrelated. Large correlation or strong linkage disequilibrium could make the bias even larger. Also, small variation in disease risk cannot guarantee small bias unless the selection bias is also small. In applications, it is important to be able to measure the impact of the bias. In this paper, we derive some bounds for the bias. If these bounds are estimable, they can be used to make conservative inference. We show with one real example that a conservative p-value for testing null interaction can be computed and that a significant conclusion can be reached even if there is bias. Genotype frequencies of SNPs and their LDs are readily available from the International HapMap Project. Further, disease prevalence is available from many nations or from the World Health Organization, for example. This information allows us to compute bounds, and then conservative p-values, easily.
We note that matching cases and controls in ethnicity has been suggested by epidemiologists as an effective method to control the PS bias in case-control gene-disease association studies. However, in a more complicated risk model such as the one discussed here, the bias β* (see equation 1) in the genetic main effect also depends on the effect sizes of the other risk factors. We found that if γ = δ = 0, then the residual bias after matching is small. However, if γ = 1 and δ = 0, the residual bias after matching is still quite substantial. A sufficient condition to ensure β* = 0 under perfect matching is γ = δ = 0. Tables 1 and 2 also show that matching cannot remove the bias in the estimation of the interaction effect.
Since the presence of PS and selection bias may cause unacceptable bias in the usual interaction analysis, it is important to have an efficient method to control the bias. Unfortunately, no effective method exists so far. The major difficulty is that the level of the bias depends on the effect sizes of other related factors, which are in general unknown or not estimable under PS. However, in some special cases, for example when the genetic main effects are null (or weak) and testing gene-gene interaction is the main focus, one may follow the idea of genomic control: type extra pairs of null markers and apply the computed interaction levels to control the bias. In principle, if the candidate markers are in linkage equilibrium, the selected pairs of null markers also need to be in linkage equilibrium so that the important characteristics of the bias can be captured. On the other hand, if the candidate markers are in linkage disequilibrium, the paired null markers also need to be correlated. We are currently working to solve this important problem. Another approach for reducing bias is to match the cases and controls in ethnicity. According to our simulations, under perfect matching and weak linkage disequilibrium, the bias in the estimation of the interaction effect is small. However, more study is needed to understand the impact of the residual bias when the matching is not perfect.
Conclusions
In this paper, the biases in the estimation of genetic main and interaction effects are quantified and their bounds are derived. We find that if there is an environmental effect or interaction, the bias in the genetic main effect cannot be ignored even when cases and controls are matched in ethnicity. The bias in the estimation of the interaction effect has the same problem. The estimated bound can be used to compute a conservative p-value for the association test. The computation of the conservative p-value does not require knowledge of the number of subpopulations involved in the study or of the subpopulation membership of each study subject. In real applications, it is usually not clear whether there is PS, selection bias, or both. However, if appropriate information such as the variation of genotype frequencies is known, we can always compute the conservative p-value. If the conservative p-value is smaller than the designated significance level, we can safely claim that the test is significant regardless of the presence of PS/non-SRS.
Methods
Following the usual Bayesian argument, the disease-risk model implies that
where , s = 2,..., k. As a consequence,
On the other hand, the joint frequency distribution of G and H in the control population is given by
Thus their ratio is given by
Here, we define μ* = log{K^{Δ}(0,0)},,and , where Note that the above results are derived using the expression of
Also note that we can express
Define
and
Simple algebra shows that there exists some constant w* such that the bias is bounded above by
Here G_{M} (G_{m}) is the largest (smallest) value of G_{s}. D_{M}, D_{m}, DS_{M}, and DS_{m} are similarly defined. Also note that under SRS, DS_{s} = 1, and therefore, from the definition of exp(δ*), we can easily show that it is bounded above by (D^{†})^{2} and bounded below by (D^{†})^{-2}. However, under a general sampling design, the bias is expressed as
where and . By applying the same approach used for deriving the bounds for exp(β*), we can also derive bounds for exp(δ*).
Authors' contributions
KFC designed the study, performed the analysis and wrote the paper. JYL performed the computation and helped in the discussion. All authors read and approved the final manuscript.
Acknowledgements
This research was supported in part by a grant from the National Science Council and a joint research grant from China Medical University and Asia University. The authors are grateful to Jin-Hua Chen for helpful discussion and would like to thank two reviewers for their comments, which greatly improved the presentation of this paper.
References

Marcus PM, Hayes RB, Vineis P, GarciaClosas M, et al.: Cigarette smoking, Nacetyltransferase 2 acetylation status, and bladder cancer risk: a caseseries meta analysis of geneenvironment interaction.
Cancer Epidemiol Biomarkers Prev 2000, 9:461467. PubMed Abstract  Publisher Full Text

Han J, Hankinson SE, Colditz GA, Hunter DJ: Genetic variation in XRCC1, sun exposure, and risk of skin cancer.
Br J Cancer 2004, 91:16041609. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Cox N, Frigge M, Nicolae DL, Concannon P, Hanis CL, Bell GI, Kong A: Loci on chromosomes 2 (NIDDM1) and 15 interact to increase susceptibility to diabetes in Mexican Americans.
Nat Genet 1999, 21:213215. PubMed Abstract  Publisher Full Text

Cordell HJ, Todd JA, Bernnett ST, Kawaguishi Y, Ferrell M: Twolocus maximum LOD score analysis of multifactorial trait: 7 joint consideration of IDDM2 and IDDM4 with DDM1 in type 1 diabetes.
Am J Hum Genet 1995, 57:920934. PubMed Abstract  PubMed Central Full Text

Cho JH, Nicolae DL, Gold LH, Fields CT, LaBuda MC, Rohal PM, et al.: Identification of novel susceptibility loci for inflammatory bowel disease on chromosomes 1p, 3q, and 4q: evidence for epistasis between 1p and IBD1.
Proc Nat Acad Sci USA 1998, 95:75027507. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Xu J, Langefeld CD, Zheng SL, Gillanders EM, Chang BL, Issacs SD, et al.: Interaction effect of PTEN and CDKNIB chromosomal regions on prostate cancer linkage.
Hum genet 2004, 115:255262. PubMed Abstract  Publisher Full Text

Aston CE, Ralph DA, Lalo DP, Manjeshwar S, Gramling BA, DeFreese DC, et al.: Oligogenetic combinations associated with breast cancer risk in women under 53 years of age.
Hum Genet 2005, 116:208221. PubMed Abstract  Publisher Full Text

Cordell HJ: Epistasis: what it means, what it doesn't mean, and statistical methods to detect it in humans.
Hum Mol genet 2002, 11:24632468. PubMed Abstract  Publisher Full Text

Wang S, Zhao H: Sample size needed to detect genegene interactions using association designs.
Am J Epidemiol 2003, 158:899914. PubMed Abstract  Publisher Full Text

Schaid DJ: Caseparents design for geneenvironment interactions.
Genet Epidemiol 1999, 16:261273. PubMed Abstract  Publisher Full Text

Lander ES, Schork NJ: Genetic dissection of complex traits.
Science (Wash. DC) 1994, 265:20372048. Publisher Full Text

Ewens WJ, Spielman RS: The transmission/disequilibrium test: history, subdivision, and admixture.
Am J Hum Genet 1995, 57:455464. PubMed Abstract  PubMed Central Full Text

Altschuler D, Kruglyak L, Lander ES: Genetic polymorphism and disease.
N Engl J Med 1998, 338:1626. PubMed Abstract

Witte JS, Gauderman WJ, Thomas DC: Population stratification in association studies.

Khoury MJ, Beaty TH: Applications of the casecontrol method in genetic epidemiology.
Epidemiol Rev 1994, 16:134150. PubMed Abstract  Publisher Full Text

Khoury MJ, Yang Q: The future of genetic studies of complex human diseases: an epidemiologic perspective.
Epidemiology 1998, 9:350354. PubMed Abstract  Publisher Full Text

Cheng KF, Lin WJ: Simultaneously correcting for population stratification and for genotyping error in casecontrol association studies.
Am J Hum Genet 2007, 81:726743. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Wang Y, Localio R, Rebbeck TR: Evaluating bias due to population stratification in epidemiologic studies of gene-gene or gene-environment interactions.
Cancer Epidemiol Biomark Prev 2006, 15:124-132.

Price AL, Patterson NJ, Plenge RM, et al.: Principal components analysis corrects for stratification in genome-wide association studies.
Nat Genet 2006, 38:904-909.

Lee WC, Wang LY: Simple formulas for gauging the potential impacts of population stratification bias.
Am J Epidemiol 2008, 167:86-89.

Satten GA, Flanders WD, Yang Q: Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model.
Am J Hum Genet 2001, 68:466-477.

Cheng KF, Lee JY, Chen JH: Studying the joint effects of population stratification and sampling in case-control association studies.
Hum Hered 2010, 69:254-261.

Shi M, Christensen K, Weinberg CR, et al.: Orofacial cleft risk is increased with maternal smoking and specific detoxification-gene variants.
Am J Hum Genet 2007, 80:76-90.

Garte S, Gaspari L, Alexandrie AK, et al.: Metabolic gene polymorphism frequencies in control populations.
Cancer Epidemiol Biomark Prev 2001, 10:1239-1248.

Hellstrom-Lindahl E, Nordberg A: Smoking during pregnancy: a way to transfer the addiction to the next generation?
Respiration 2002, 69:289-293.

Cnattingius S: The epidemiology of smoking during pregnancy: smoking prevalence, maternal characteristics and pregnancy outcomes.
Nicotine & Tobacco Research 2004, 6:S125-S140.

Department of Health and Human Services Centers for Disease Control and Prevention: Smoking during pregnancy - United States, 1990-2002.
MMWR 2004, 53(39):911-915.

Devlin B, Roeder K: Genomic control for association studies.
Biometrics 1999, 55:997-1004.

Bacanu SA, Devlin B, Roeder K: The power of genomic control.
Am J Hum Genet 2000, 66:1933-1944.

Devlin B, Roeder K, Wasserman L: Genomic control, a new approach to genetic-based association studies.
Theor Popul Biol 2001, 60:155-166.

Devlin B, Roeder K, Bacanu SA: Unbiased methods for population-based association studies.
Genet Epidemiol 2001, 21:273-284.

Bacanu SA, Devlin B, Roeder K: Association studies for quantitative traits in structured populations.
Genet Epidemiol 2002, 22:78-93.

Campbell CD, Ogburn EL, Lunetta KL, Lyon HN, Freedman ML, Groop LC, Altshuler D, Ardlie KG, Hirschhorn JN: Demonstrating stratification in a European American population.
Nat Genet 2005, 37:868-872.

Pritchard JK, Donnelly P: Case-control studies of association in structured or admixed populations.
Theor Popul Biol 2001, 60:227-237.

Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data.
Genetics 2000, 155:945-959.

Pritchard JK, Stephens M, Rosenberg NA, Donnelly P: Association mapping in structured populations.
Am J Hum Genet 2000, 67:170-181.

Satten GA, Flanders WD, Yang Q: Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model.
Am J Hum Genet 2001, 68:466-477.

Wang LY, Lee WC: Population stratification bias in the case-only study for gene-environment interactions.
Am J Epidemiol 2008, 168:197-201.