Certain loci on the human genome, such as glutathione S-transferase M1 (GSTM1), do not permit heterozygotes to be reliably determined by commonly used methods. Association of such a locus with a disease is therefore generally tested with a case-control design. When subjects have already been ascertained in a case-parent design however, the question arises as to whether the data can still be used to test disease association at such a locus.
A likelihood ratio test was constructed that can be used with a case-parents design but has somewhat less power than a Pearson's chi-squared test that uses a case-control design. The test is illustrated on a novel dataset showing a genotype relative risk near 2 for the homozygous GSTM1 deletion genotype and autism.
Although the case-control design will remain the mainstay for a locus with a deletion, the likelihood ratio test will be useful for such a locus analyzed as part of a larger case-parent study design. The likelihood ratio test has the advantage that it can incorporate complete and incomplete case-parent trios as well as independent cases and controls. Both analyses support (p = 0.046 for the proposed test, p = 0.028 for the case-control analysis) an association of the homozygous GSTM1 deletion genotype with autism.
Despite technological advances, not all loci in the human genome can readily be fully genotyped using current conventional methods. Incomplete sequence information, unknown splice junction, unknown size of the deletion, and a large amount of homology with nearby sequence can all contribute to such a problem. The GSTM1 locus can be considered a model of such a locus. Heterozygotes involving the GSTM1 deletion, or null, allele cannot be detected using standard genotyping methods . In such a case, the investigator can determine genotype only up to homozygous-deletion/not-homozygous-deletion categorization, serving as a reminder that what we label "genotype" in our data analysis is actually an observed phenotype. Studies involving such loci generally use a case-control contingency table analysis with two categories for genotype.
Contemporary research often uses a family-based association study design in examining a number of loci at once. The question naturally arises as to whether case-parent trio DNA be used to advantage over the case-control contingency table analysis at a locus where heterozygotes cannot be reliably distinguished from one of the homozygote genotypes. This note examines a likelihood ratio test built on possible mating types for a case-parent design. Unlike the well-known transmission disequilbrium test for case-parent trios, the test discussed here does require allele frequency estimates and so is susceptible to population stratification and admixture effects much as a case control analysis is. With that proviso, we examine the performance of the proposed test in simulations, and in a new dataset involving the GSTM1 deletion allele and the autism phenotype.
Autism and the GSTM1 locus
Autism (autistic disorder) is a pervasive developmental disorder with diagnostic criteria based on abnormal social interactions, language abnormalities, and stereotypies evident prior to 36 months of age . Despite its lack of Mendelian transmission, autism is highly genetically determined [3,4].
The vast majority of cases of autism are unrelated to known teratogens but the phenotypic expression of autism may be affected by the interaction of environmental factors with multiple gene loci. There is evidence supporting a role for oxidative stress in autism [5,6]. Oxidative stress could interact with common functional polymorphic variants of genes that protect against oxidative stress and could thus affect brain development during gestation or possibly after gestation, contributing to expression of autism. Glutathione (GSH) is the most important endogenous antioxidant due to its ability to bind electrophilic substrates through its free sulfhydryl group  and is the most abundant non-protein thiol, occurring in millimolar concentrations in human tissues . Low plasma total GSH (tGSH) levels, elevated levels of oxidized GSH (GSSG) and low ratios of tGSH to GSSG have been reported in autism .
Glutathione-S-transferases (GSTs), are an important class of antioxidant enzymes that catalyze conjugation of GSH to toxic electrophiles. GSTs are abundant, accounting for up to 10% of cellular protein . Some genetic polymorphisms of GSTs are known to affect enzyme function. It is possible that a functional GST polymorphism could contribute to the pathogenesis of autism, an effect that could be potentiated by reduced levels of GSH, one of the substrates of GSTs. GSTs are Phase II enzymes that conjugate GSH to activated toxins, xenobiotics and metabolites including products of Phase I enzymes such as cytochrome P450 oxidases.
Polymorphic alleles of GSTs have been reported to contribute to a number of human diseases. We focused on the GSTM1*0 polymorphism because the variant allele is a complete gene deletion that lacks function of the GSTM1 enzyme. Homozygosity for GSTM1*0 was reportedly associated with an increased risk of prostate cancer in the presence of either the val/val or the ile/val genotypes of the Phase I enzyme CYP1A1 . Homozygosity for GSTM1*0 was associated with increased risk of bladder cancer . GSTM1*0 contributed to risk of hepatocellular cancer in conjunction with environmental factors . GSTM1*0 contributed to breast cancer risk in conjunction with the val/val genotype of the Phase I enzyme, CYP1A1 . GSTM1*0 also contributed to the risk of small cell lung cancer  and asthma . GSTM1 is located on 1pl3.3. At least three reports [17-19] show some evidence of genetic linkage of autism to the region; we are not aware of any genetic association studies of autism in this region.
Likelihood ratio test
For a given bi-allelic locus, there are 15 possible triplets of genotypes for the father-mother-child trios [20,21]. The left half of Table 1 shows these triplets, expressed in terms of the number of full alleles each trio-member has. The table also shows the population frequency of each triplet under Hardy-Weinberg equilibrium (HWE) in the parents, as well as the sampling frequencies under the assumptions that each child is a case and that the relative risk of zero copies (one copy) of the full allele for the disorder in question is r0 (r1). The right hand side of the table gives the same information when 2 copies of the full allele cannot be distinguished from 1 copy; 1 or 2 copies are denoted P (for present) and 0 copies are denoted D (for deletion). It does not appear to be possible to test for Hardy-Weinberg equilibrium in this situation.
Table 1. Population and Sampling Frequencies of Case-Parent Trios. "F, M, C" refers to Father, Mother, Child.
Case-parent trios can be categorized into one of the 7 types on the right of the table. The resulting counts will follow a multinomial distribution with cell probabilities as given in the table. One can then construct a likelihood under a model with
1. r1 = r0 = 1,
2. r1 = 1 but r0 unconstrained, or
3. r0 and r1 both unconstrained.
Model 2 might correspond to a scientific hypothesis that either one or two copies of the full allele provides the same biological functionality, while model 3 might correspond biologically to a dose-response model (although r1 is not constrained to lie between 1 and r0). Other models, such as r1 = or r1 = r0 are also possible. The likelihood ratio test has a test statistic equal to twice the difference in the maximized log-likelihoods of the relevant models. Asymptotically that test statistic is distributed as a chi-squared random variable with degrees of freedom equal to the number of additional parameters estimated, namely 1 for the second model versus the first or the third versus the second, and two for the third versus the first. In all models, p, the frequency of the full allele, will be estimated.
Under Model 2, the maximum likelihood estimator of r0 is simply
where m is the total number of cases with the full allele present, n is the total number of cases homozygous for the null allele, and = 1 - is the estimated frequency of the null allele. The estimator is thus simply the observed ratio of the two detectable genotypes among the cases divided by the ratio expected under the null hypothesis.
When both r0 and r1 are estimated, the maximum likelihood estimators are
For all three models, p can be estimated as the solution to a quadratic or cubic equation, although in case (3) there is a particularly simple form of = (2b + d + f)/(2n), where b, d, and f are as in the mating type table and represent the counts in cells with non-obligate null homozygous cases.
The discussion above applies when the data consists only of completely case-parent trios, but the test can easily accommodate data on cases with a single genotyped parent, cases with no parental genotypes, and controls. Control subjects, in particular, will yield more accurate allele frequency estimates. With the additional subject types, the likelihood factors into a complete trio term, an incomplete trio term, a case-only term, and a control-only term. For cases with incomplete parental genotyping, the cells in Table 1 are simply collapsed over the parent's unknown genotypes. For example, when the mother's genotype is unknown, the probability of the father and child both having the "Present" allele is simply the sum of the probabilities of the (P, P, P) and (P, D, P) types (cells labeled a and c in the table). With the greater variety of data types, the maximum likelihood estimators no longer have closed forms. The likelihood, however, remains straightforward. Under each model and for each data type, the probability of an observation belonging to a particular cell is a function of p, r0, and r1. The overall likelihood is a product of the likelihoods for each data type. The likelihood can then be maximized using standard numerical techniques. Code for the R statistical environment  containing functions for calculating test statistics, estimates, and confidence intervals is available [see 1].
Format: R Size: 7KB Download file
The cohort (70 nuclear families) for the autism association study was ascertained through the New Jersey Center for Outreach and Services for the Autism Community (COSAC) and the UMDNJ-RWJMS Department of Pediatrics-Division of Neurodevelopmental Disabilities. Each affected individual had diagnosis by the Autism Diagnostic Interview-Revised (ADI-R) and the Autism Diagnostic Observation Schedule-Generic (ADOS-G) [23,24]. Blood samples were drawn from members of autism families and from unrelated, unaffected controls ascertained from UMDNJ clinics and individuals married into dominant pedigrees of other disorders. This study was approved by the Institutional Review Board of UMDNJ-Robert Wood Johnson Medical School and UMDNJ-New Jersey Medical School.
Genotyping of the GSTM1*0 whole gene deletion polymorphism was carried out by the method of Yang et al.  with specific primers using a PCR method with the beta-globin gene amplified as a positive control for PCR efficiency. PCR products were separated on polyacrylamide gels and visualized with ethidium bromide. The GSTM1 product was about 200 bp and the beta-globin product was about 250 bp. In the presence of a positive betaglobin band, the absence of the GSTM1 band was interpreted as homozygosity for the whole gene deletion allele .
To study the power of the likelihood ratio tests compared to the usual case control contingency table analysis, we performed a number of simulations, the results of which are shown in Tables 2 and 3. Each cell in the tables represents 10,000 runs. The simulations vary the deletion allele frequency q (so the observed homozygous deletion genotype frequency is q2), the relative risks r0 and r1 for zero or one copies of the full allele as compared to the risk for the genotype homozygous for the full allele, and the number of trios (either 50 or 200). All simulations use a prevalence of 0.001 for the disorder. For the case control simulations there were twice the number of controls as cases, so that each test involved the same amount of genotyping. The test statistic for the case control simulations was the Pearson chi-squared statistic without continuity correction. Other contingency table test statistics give very similar results (data not shown). Controls were not used for the likelihood ratio tests in these Tables 2 and 3, but Table 4 shows a selection of results when the controls are used in the likelihood ratio tests. The table shows the case just for r0 = 2, but the general pattern holds for other values of r0 (results not shown). Table 4 illustrates that the 1-df likelihood ratio test utilizing the controls data has slightly more power than the contingency table analysis under a recessive model and slightly less power under the multiplicative model. Of course, using 2 controls for each case represents 5/3 as much genotyping for the likelihood ratio tests as for the contingency table analyses. The table therefore also includes the power when the case:control ratio is 1:4, so that the total genotyping is the same as for the likelihood ratio tests. Not surprisingly, this design generally has greatest power except under the dominant genetic model.
Table 2. Power with 50 Case-Parent Trios and 100 Controls. CC = Pearson's Chi-Squared Test, LR 1 = likelihood ratio test with 1 df, LR 2 = likelihood ratio test with 2 df. Controls were used only for the Chi-Squared test.
Table 3. Power with 200 Case-Parent Trios and 400 Controls. CC = Pearson's Chi-Squared Test, LR 1 = likelihood ratio test with 1 df, LR 2 = likelihood ratio test with 2 df. Controls were used only for the Chi-Squared test.
Table 4. Power Using Controls in All Tests. CCa = Pearson's Chi-Square with twice as many controls as cases. CCb = Pearson's Chi-Square with four times as many controls as cases. LR 1 = likelihood ratio test with 1 df, LR 2 = likelihood ratio test with 2 df. Both likelihood tests used twice as many controls as cases.
To examine the effect of incomplete parental genotypes, we also performed power analyses with some complete case-parent trios replaced with trios with only one parent genotyped (data not shown). When the number of subjects genotyped was held constant (i.e., n completely genotyped trios replaced with 3n/2 one-parent-genotyped trios), the power differed by only a few percentage points. This result indicates that a case-parent trio with one parent genotyped carries roughly 2/3 of the information of a complete trio. It may well be the case that methods that can distinguish heterozygotes from both homozgyotes are available, but are more expensive than methods that give partial information. To examine how much information is lost by partial genotyping, we calculated the relative efficiency, in terms of sample sizes, of using partial genotyping versus fully-informative genotyping. Table 5 shows the percentage relative efficiency for recessive and multiplicative genetic models. We do not show the comparison for the dominant model, as the power performance of the proposed test with partial genotyping is so poor. The first pair of columns show the efficiency of the 1-df proposed test with partial genotyping compared to the TDT test with fully informative genotyping. The TDT is known to perform poorly under a recessive generating model, so the second pair of columns compares the 1-df proposed test with Schaid's 2-df likelihood ratio test with full genotyping . Schaid's test is robust across many genetic models. The last pair of columns shows Schaid's 2-df test compared with the proposed 2-df test. The power of the TDT and Schaid's test was calculated using Knapp's and Schaid's methods [26,27] and compared with the results in Table 3.
Table 5. Relative Efficiency of Partial Genotyping versus Fully Informative Genotyping. The table gives the percentage relative efficiency of partial (homozygous deletion versus other) genotyping with the proposed test compared to full genotyping. Numbers less than 100% indicate the proposed test with partial genotyping is less efficient than the standard test with full genotyping. LR 1 and LR 2 refer to the proposed likelihood ratio tests with 1- and 2-df, respectively. TDT refers to the transmission-disequilibrium test. Schaid LRGen refers to Schaid's 2-df likelihood ratio test . Calculations are based on Table 3.
Autism and the GSTM1 deletion allele
The allele frequencies of GSTM1 are known to vary with the population. For this analysis, the study sample was restricted to the largest racial and ethnic group, namely those self-identifying as Non-Hispanic White. The published homozygous deletion genotype frequency in this population is about 0.5 , suggesting a deletion allele frequency q of about 0.7. The final sample reported here consists of 54 complete case-parent trios and 172 controls. Of the cases, 45 were diagnosed with autistic disorder on both the ADI-R and ADOS-G, while 9 were diagnosed with pervasive developmental disorder not otherwise specified on one instrument but autistic disorder on the other.
The observed genotypes are shown in Table 6. The chi-squared test statistics are 4.83 for Pearson's, 3.98 for the 1-df LRT, and 3.98 for the 2-df LRT (based on the next section, the 2-df LRT would not be recommended in this situation, but is included here for completeness), giving p-values of 0.028, 0.046, and 0.137, respectively, with controls included in all tests. The genotype relative risk estimates are = 1.85 for the 1-df test, = 1.76 and = 0.94 for the 2-df test. Estimates of q are 0.73 under model (1) and 0.71 under both models (2) and (3). When controls were not used in the likelihood ratio tests, the chi-squared values were 0.80 and 1.31 for the 1- and 2-df tests, respectively, giving p-values of 0.371 and 0.521. The results for the case-control analysis and the 1-df likelihood ratio test (utilizing controls) are repeated in Table 7.
Table 6. Observed Genotypes in GSTM1 and Autism Association Study.
Table 7. Results for GSTM1 and Autism Association Study. "Pearson" refers to Pearson's chi-square analysis of the case-control data. "Likelihood Ratio Test" refers to the 1-df test discussed in the text, in this case using the full information in Table 6. OR = Odds Ratio, RR = Relative Risk of homozygous deletion genotype relative (r1 in the text).
The simulations show that the 1-df likelihood ratio test has somewhat less power than the case control approach under a recessive genetic model (r1 = 1) and much less power under an multiplicative model (r1 = ). None of the tests performed well under a dominant model (r1 = r0), but with a deletion allele, likely to result in a loss of function, this model seems less likely on biological grounds. It could, however, arise when partial loss of function reduces the gene product below a functional threshold. The 2-df likelihood ratio test was slightly less powerful than the 1-df test for the multiplicative model and considerably more powerful under a dominant model. It is much less powerful than the 1-df test under the recessive model, which, of course, is the genetic model for which the 1-df test model is correct. All of the tests have low power under a dominant model. If this situation is suspected one the expensive of fully informative genotyping followed by a standard test may be worthwhile. If the use of the proposed tests can be avoided when biology suggests a dominant risk model holds, the 2-df test does not appear to hold any power advantage over the 1-df test.
An advantage of a likelihood-based test is that variants can easily be incorporated. Data from complete case-parent trios, incomplete trios, individual cases, and controls can all be used in the tests described here. If full genotypes distinguishing heterzygotes were available on some study participants the likelihood could be modified to incorporate them. The likelihood can also be modified for testing specific genetic models. One could even potentially incorporate parent-of-origin effects as has been done by Weinberg et al. for fully genotyped trios .
An important weakness of the proposed test is its reliance on Hardy-Weinberg equilibrium among parents. The test is designed for the situation where heterozygotes cannot be distinguished from one of the homozygotes, a situation where Hardy-Weinberg equilibrium cannot be tested. It does not appear that this weakness can be overcome by statistical methods. However, the most common causes of the failure of HWE is likely to be genotyping error or population stratification. The case-control method is vulnerable to these effects as well, so this weakness of the proposed test is no worse than that of the existing method.
Autism and the GSTM1 deletion allele
The full data available, namely case-parent trios along with controls, gives evidence of a heightened risk for autism for GSTM1*0 homozygotes. The population frequency of that genotype is large, but the genotype is presumably interacting with other genetic and environmental risk factors. Absence of the GSTM1 gene in GSTM1*0 homozygotes could lead to failure of individuals with autism to detoxify important compounds, including some that could be agents or products of oxidative stress.
Further studies are needed to confirm these observations. The present findings could be consistent with the hypothesis of a gene-environment interaction that alters the expression of autism because GSTs are detoxification enzymes that conjugate absorbed xenobiotics. These findings could lead to documentation and identification of an exogenous or endogenous moiety interacting with GSTs to contribute to autism and a mechanism of action of select environmental chemicals in contributing to the phenotypic presentation of autism.
As researchers increasingly study larger sets of candidate loci at a time, they will occasionally find that their study design may not be best for a specific locus. While a case-parent design offers many advantages at most loci, it has not generally been considered possible to use such a design to test a locus where the heterozygote cannot be reliably detected. We have demonstrated that, with the risk of the additional assumption of Hardy-Weinberg equilibrium, it is possible to construct such a test. For the same number of genotyped subjects, the resulting test has less power than a Pearson's chi-squared test using cases and controls. If controls can be added, the proposed test has slightly more power, but at a cost of additional genotyping; if that genotyping were instead dedicated to additional controls, the case-control analysis would maintain its superiority in power. The 2-df test appears to be most useful only when a dominant model for the deletion allele is suspected, but would require a large sample in that circumstance. The 1-df test, however, is more generally worthwhile when the study participants have already been assembled. It has the advantage that it can be used with complete and incomplete trios as well as independent cases and controls With respect to the association study of the GSTM1 locus with autism, both the traditional case-control analysis and the 1-df likelihood ratio test (utilizing controls) support (at p = 0.028 and p = 0.046, respectively) the association of the homozygous GSTM1 deletion genotype with an increased risk of autism. There is no evidence that the heterozygous genotype contributes to any increased risk.
SB developed the statistical method, performed the data analysis, and drafted the manuscript. TAW assisted with accrual, performed molecular genetic work, managed the data, and helped to draft the manuscript. AEM diagnosed cases and assisted with accrual. ESS performed molecular genetic work and data management. SXM, MFF, CR, and GHL assisted with accrual. RW and MS performed molecular genetic work. WGJ conceived and designed the GSTM1 and autism study and helped to draft the manuscript.
The authors are grateful for the support of the NIH (1K25AA015346, R21-44244, ES11256 and U24MH068457), US EPA (R829391), The New Jersey Governor's Council, the National Alliance for Autism Research, The New Jersey Center for Outreach and Services for the Autism Community (COSAC, Ewing, NJ), and The NIH/USEPA Center for Childhood Neurotoxicology and Exposure Assessment (NIH ES1 1256 and USEPA R829391). The authors are also grateful to the anonymous reviewers for their help in improving the article.
Pharmacogenetics 1996, 6(2):187-191. PubMed Abstract
Veenstra-Vanderweele J, Christian SL, Cook EH: Autism as a paradigmatic complex genetic disorder. [http://dx.doi.org/10.1146/annurev.genom.5.061903.180050] webcite
Söğüt S, Zoroğlu SS, Ozyurt H, Yilmaz HR, Ozuğurlu F, Sivasli E, Yetkin O, Yanik M, Tutkun H, Savaş HA, Tarakçioğlu M, Akyol O: Changes in nitric oxide levels and antioxidant enzyme activities may have a role in the pathophysiological mechanisms involved in autism.
Hung RJ, Boffetta P, Brennan P, Malaveille C, Hautefeuille A, Donato F, Gelatti U, Spaliviero M, Placidi D, Carta A, di Carlo AS, Porru S: GST, NAT, SULT1A1, CYP1B1 genetic polymorphisms, interactions with environmental exposures and bladder cancer risk in a high-risk population. [http://dx.doi.org/10.1002/ijc.20157] webcite
Kirk GD, Turner PC, Gong Y, Lesi OA, Mendy M, Goedert JJ, Hall AJ, Whittle H, Hainaut P, Montesano R, Wild CP: Hepatocellular carcinoma and polymorphisms in carcinogen-metabolizing and DNA repair enzymes in a population with aflatoxin exposure and hepatitis B virus endemicity.
Chacko P, Joseph T, Mathew BS, Rajan B, Pillai MR: Role of xenobiotic metabolizing gene polymorphisms in breast cancer susceptibility and treatment outcome. [http://dx.doi.org/10.1016/j.mrgentox.2004.11.018] webcite
Tamer L, Calikoğlu M, Ates NA, Yildirim H, Ercan B, Saritas E, Unlü A, Atik U: Glutathione-S-transferase gene polymorphisms (GSTT1, GSTM1, GSTP1) as increased risk factors for asthma. [http://dx.doi.org/10.1111/j.1440-1843.2004.00657.x] webcite
Auranen M, Nieminen T, Majuri S, Vanhala R, Peltonen L, Järvelä I: Analysis of autism susceptibility gene loci on chromosomes 1p, 4p, 6q, 7q, 13q, 15q, 16p, 17q, 19q and 22q in Finnish multiplex families.
Ylisaukko-Oja T, Alarcón M, Cantor RM, Auranen M, Vanhala R, Kempas E, von Wendt L, Järvelä I, Geschwind DH, Peltonen L: Search for autism loci by combined analysis of Autism Genetic Resource Exchange and Finnish families. [http://dx.doi.org/10.1002/ana.20722] webcite
Am J Hum Genet 1993, 53(5):1114-1126. PubMed Abstract
Weinberg CR, Wilcox AJ, Lie RT: A log-linear approach to case-parent-triad data: assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting.
R Foundation for Statistical Computing, Vienna, Austria; 2005.
Lord C, Risi S, Lambrecht L, Cook EH, Leventhal BL, DiLavore PC, Pickles A, Rutter M: The Autism Diagnostic Observation Schedule-Generic: a standard measure of social and communication deficits associated with the spectrum of autism.
Yang P, Yokomizo A, Tazelaar HD, Marks RS, Lesnick TG, Miller DL, Sloan JA, Edell ES, Meyer RL, Jett J, Liu W: Genetic determinants of lung cancer short-term survival: the role of glutathione-related genes.
Garte S, Gaspari L, Alexandrie AK, Ambrosone C, Autrup H, Autrup JL, Baranova H, Bathum L, Benhamou S, Boffetta P, Bouchardy C, Breskvar K, Brockmoller J, Cascorbi I, Clapper ML, Coutelle C, Daly A, Dell'Omo M, Dolzan V, Dresler CM, Fryer A, Haugen A, Hein DW, Hildesheim A, Hirvonen A, Hsieh LL, Ingelman-Sundberg M, Kalina I, Kang D, Kihara M, Kiyohara C, Kremers P, Lazarus P, Marchand LL, Lechner MC, van Lieshout EM, London S, Manni JJ, Maugard CM, Morita S, Nazar-Stewart V, Noda K, Oda Y, Parl FF, Pastorelli R, Persson I, Peters WH, Rannug A, Rebbeck T, Risch A, Roelandt L, Romkes M, Ryberg D, Salagovic J, Schoket B, Seidegard J, Shields PG, Sim E, Sinnet D, Strange RC, Stcker I, Sugimura H, To-Figueras J, Vineis P, Yu MC, Taioli E: Metabolic gene polymorphism frequencies in control populations.