Genotyping has become more cost-effective and less invasive with the use of buccal cell sampling. However, low or fragmented DNA yields from buccal cells collected using FTA cards often requires additional whole genome amplification to produce sufficient DNA for genotyping. In our case–control study of childhood leukaemia, discordance was found between genotypes derived from blood and whole genome amplified FTA buccal DNA samples. We aimed to develop a user-friendly method to correct for this genotype misclassification, as existing methods were not suitable for use in our study.
Discordance between the results of blood and buccal-derived DNA was assessed in childhood leukaemia cases who had both blood and FTA buccal samples. A method based on applying misclassification probabilities to measured data and combining results using multiple imputations, was devised to correct for error in the genotypes of control subjects, for whom only buccal samples were available, to minimize bias in the odds ratios in the case–control analysis.
Application of the correction method to synthetic datasets showed it was effective in producing correct odds ratios from data with known misclassification. Moreover, when applied to each of six bi-allelic loci, correction altered the odds ratios in the logically anticipated manner given the degree and direction of the misclassification revealed by the investigations in cases. The precision of the effect estimates decreased with decreasing size of the misclassification data set.
Bias arising from differential genotype misclassification can be reduced by correcting results using this method whenever data on concordance of genotyping results with those from a different and probably better DNA source are available.
Keywords:Biostatistics; DNA; Genotype; Measurement error; Quality control; Whole genome amplification
The Australian Study of causes of Acute Lymphoblastic Leukaemia in Children (Aus-ALL) was a population-based case–control study, conducted between 2003 and 2007, designed to investigate environmental and genetic risk factors for childhood ALL as well as their interactions. The study has been described in detail elsewhere . Briefly, 415 case children provided blood samples for genetic analysis during a routine visit to the treating hospital after initial remission was achieved. Buccal cell samples were collected from 536 control children at home using Whatman FTA Indicating Micro Cards (Cat. No.WB120211) (hereafter referred to as ‘FTA cards’) to maximize participation and minimize costs [2,3]. Whole genome amplification (WGA) from 1.2 mm diameter trephine punched discs were used to increase and preserve the finite amount of DNA available from the FTA card samples. Case children were also asked to provide a buccal sample using an FTA card so that concordance between genotypes measured using blood and FTA card buccal samples could be examined.
In this paper, we briefly present the concordance results for a set of six polymorphisms and describe the performance of a method we developed to correct genetic association analyses for the unexpected misclassification we observed. Because of the differential nature of the misclassification (all controls provided buccal DNA while most cases provided a blood sample) and the large number of covariates required and analyses planned, particularly all those requiring more than one gene in a model or examination of gene-environment or gene-gene interactions, we needed a comparatively simple and easy-to-use method for addressing this issue. While there is a considerable literature on methods for correction of measurement error or misclassification (for example, Guolo 2008 , and Thurigen et al., 2000 ), there appeared to be none that met our requirements with easily useable software. For example, the seismic program from Fox et al.  only handled one binary misclassified variable, and the Mime approach of Cole et al.  handled only one binary misclassified variable and required some of the validation sample to also include controls. The more flexible MC-SIMEX method of Kuchenhoff et al. , available in the R-library, was also not considered to be suitable, as the corrected results took no account of the size of the validation sample and was highly dependent on the choice of the extrapolation function. Clayton et al. have described an ingenious method for adjusting for a similar problem of differential bias , which uses statistical properties of the observed data and requires no validation data; however, it requires large samples so could not be used in our study. Thus it was necessary to develop a novel correction method, as described below.
The blood samples from case children were couriered to the processing laboratory overnight following collection. Whole blood was refrigerated at 4°C for a maximum of 7 days prior to DNA extraction using the Wizard Genomic DNA Purification Kit (Promega, Madison WI cat #1620) in accordance with the manufacturer’s instructions. DNA concentrations were quantified using a ND-1000 spectrophotometer (Nanodrop Technologies, Wilmington DE). FTA cards were stored at room temperature with desiccant until processing. Four discs of 1.2 mm diameter were trephine punched from the sampling area and placed collectively into a single tube and amplified using the GenomiPhi DNA Amplification Kit (GE Healthcare, Buckinghamshire UK). Briefly, 9μL of sample buffer, 9μL of reaction buffer and 1μL of enzyme mix were added to the punches with thermo cycling conditions according to the manufacturer’s instructions. DNA concentrations were quantified using a ND-3300 fluorospectrometer. The quantified DNA aliquots were then frozen at minus 30 degrees and thawed when required for genotyping.
Genotyping for single nucleotide polymorphisms (SNPs) involved in folate-metabolism, xenobiotic biotransformation and DNA repair pathways was performed using either restriction fragment length polymorphism analysis or TaqMan® SNP Genotyping Assays (Applied Bios stems, Foster City USA) with allele calling was performed by two independent researchers. For quality assurance purposes, 10% of samples were selected at random for repeat analysis, performed by laboratory staff blinded to sample identity. The specific SNPs genotyped in Aus-ALL are not identified in this methodological paper, as they are used for illustration purposes only. Full analyses of their associations with childhood ALL will be published elsewhere.
All Aus-ALL SNPs
Across all 26 genotyped SNPs, the overall concordance within runs (repeat analyses of same sample) was 99.7% for blood samples (genomic DNA, hereafter referred to as gDNA) and 99.6% for WGA’d buccal DNA from the FTA cards (hereafter referred to as wgaDNA). The overall genotype failure rates for these SNPs were 0.16% and 1.68% for the gDNA and wgaDNA genotyping respectively. There were no departures from Hardy-Weinberg predictions observed in either the gDNA or wgaDNA genotype results. Among pairs of gDNA and wgaDNA samples collected from 249 case children, the average genotype discordance across all 26 diallelic SNPs (excluding discordance due to genotyping failure of one of the samples) was 3.48%, with a range of 0.40% to 12.96%. This discordance was the reason we developed the adjustment method described in this paper.
It appeared that the most likely source of error was preferential amplification from one chromosome of a pair, leading to loss of heterozygosis or “allelic drop-out” during WGA of frozen-thawed buccal DNA samples extracted from the FTA cards. Only WGA’d buccal DNA was available for control subjects; thus allelic drop-out in control subjects’ genotypes could bias associations between genotypes (and genotype-exposure interactions) and risk of disease. A modified unconditional logistic regression modeling technique was therefore developed to correct for this bias.
We assumed the genotype from the gDNA sample was the ‘true’ genotype (gold standard). 92.2% of the genotyping error in the wgaDNA samples involved a change from heterozygous in the gDNA to homozygous wild type (Aa→AA) or homozygous mutant in the wgaDNA (Aa→aa) (Table 1). Only 7.8% of discordant pairs involved a gDNA homozygous result (AA or aa) and a wgaDNA heterozygous result (Aa).
Table 1. Discordance in Genotyping Results Derived From gDNA and wgaDNA Samples From Case Children (excluding discordance due to failure of one sample)
Algorithm testing subset
A subset of 6 SNPs from the Aus-ALL study was selected for use in the testing of the devised algorithm. These SNPs were selected as they covered the full spectrum of gDNA-wgaDNA genotype discordance seen. The mean level of discordance (excluding discordance due to genotyping failure of one of the samples) for these SNPs was 5.3% (range 0.40% to 12.96%).
Statistical correction method
In brief, a discordance table is created by comparing the genotyping method that has error with a gold standard method. If discordance is present, then a proportion (calculated from the discordance table) of the records with genotypes containing error are randomly selected, their genotypes are corrected and a logistic regression model is computed. This process is repeated multiple times, ensuring a different set of records is randomly selected each time, and controlling for the fact that the proportion is an estimate. Results from the multiple iterations of the model are then compiled and imputed effect estimates are calculated. SNP A from Table 1 is referred to throughout the description of this algorithm as an example
Stage 1: Establish discordance tables
The proportions of discordant results for each SNP in the paired gDNA-wgaDNA samples are shown in Table 1. We calculated the proportions of homozygous wild type (AA), heterozygous (Aa) and homozygous mutant (aa) results from the wgaDNA samples that differed from the paired gDNA sample results, and used these to estimate the true misclassification proportion in Stage 2 (see below).
Table 2 shows the number of wgaDNA samples within each genotype category that were misclassified (assuming the paired gDNA sample gave the true result). For example, for SNP A there was 1 (of 96) incorrect AA result (should have been Aa), 1 (of 91) incorrect Aa result (should have been AA) and 3 (of 26) incorrect aa results (should have been Aa).
Table 2. Discordance Between Genotyping Results in gDNA-wgaDNA Pairs observed in SNP A and B
Stage 2: Correct the genotypes for analysis
The discordance tables prepared in Stage 1 are only an estimate, from one sample, of the true misclassification proportion p. Therefore, the proportion of records to be corrected can be sampled from a binomial distribution with probability of success parameter p, which is automatically calculated from the discordance table. To make this correction within control subjects, the genotyping results were treated as follows:
1. The SNP genotype variable was re-assigned to a temporary new genotype variable.
4. From the control subjects with genotype (AA), the calculated number (cT) of subjects which required genotype reclassification were randomly sampled (equal likelihood of selection, without replacement).
5. In the temporary new genotype variable, those subjects’ genotypes were then reclassified (from AA to Aa).
6. Steps 2 through 5 were repeated for each instance of misclassification (e.g. each non diagonal cell of the discordance table with a value other than 0)
Where two changes were required within one genotype for a SNP (for example SNP B, where some Aa values require reclassification to AA while others require reclassification to aa), the following 6 steps replaced steps 2 through 6 in the above process.
4. From the control subjects with genotype (Aa), the number (cT) of subjects requiring genotype reclassification were randomly sampled (equal likelihood of selection, without replacement).
5. Then p1 and p2 were temporarily assigned as the proportions of cT that were required for each change. Here p1 = 2/(2 + 1) for the reclassification of Aa to AA and p2 = 1/(2 + 1) for the reclassification of Aa to aa.
6. Temporary scalars c1 and c2 were then calculated (c1 = p1*cT and c2 = p2*cT)
7. In the new genotype variable, c1 and c2 of the randomly selected subjects’ genotypes from step 4 were reclassified (c1 from Aa to AA and c2 Aa to aa).
8. Steps 2 through 7 were repeated for each instance of this type of misclassification (e.g. where a single row or column of the misclassification table had more than one non diagonal cell with a value other than 0)
Stage 3: Run model to generate ‘corrected’ odds ratios for the association between each SNP and ALL risk
Correction of the misclassified genotypes, as outlined in Stage 2, was then done in each run of an unconditional logistic regression model that was repeated 50 times. Relevant covariates were included in the model. For each variable in the model, the mean ß coefficient from the 50 iterations was used as the best estimate of the true coefficient. These estimates were used to calculate Wald Test p-values  and to generate 95% confidence intervals (CI) using Rubin’s estimate of variance for multiple imputation . Odds ratios were obtained by exponentiation of the estimates.
The algorithm outlined in this paper was implemented in the statistical package R. For the R code to implement the outlined method, see our institutes website  or Additional file 1. The functions that make up the implementation of this algorithm are written so that they can be applied to any type of problem with a dichotomous outcome (e.g case/control analysis), a two or three level categorical predictor variable (e.g. a genotype) and discordance data (e.g. from a subset of the records with data from a gold standard measure, as here, or from other reference sources). The functions also allow for the inclusion of covariates in the model.
Additional file 1. R functions for misclassification correction method. A text file containing the R functions required to run the logistic regression corrected for known misclassification, as outlined within this manuscript.
Format: TXT Size: 19KB Download file
Testing the efficacy of the algorithm
To test the efficacy of the correction method, we created a series of hypothetical datasets that reflected our complete study dataset. These datasets were based on the genotype frequencies of two artificial SNPs: ‘SNP 1’ with an approximate dominant risk model (ORs fixed at: AA = 1.0, Aa = 1.01, aa = 2.04) and ‘SNP 2’ with an approximate log additive risk model (ORs fixed at: AA = 1.0, Aa = 1.30, aa = 1.68). The proportional amounts of misclassification shown in Aus-ALL SNPs A and B (the latter representing the SNP with the most discordance) among gDNA-wgaDNA pairs (Table 2) were then applied to artificial SNPs 1 and 2, resulting in six separate datasets (SNPs 1 and 2 with no error, with SNP A-type error, and with SNP B-type error). Odds ratios estimated from these data sets are shown in Table 3. We also simulated the effect of increasing the number of sample pairs available for assessing gDNA-wgaDNA misclassification. In this simulation, we used the SNP B-type misclassification figures seen in Table 2 and multiplied the observed numbers, in turn, by 0.5 and 5.
Table 3. Results of Correction Procedure on Hypothetical Datasets for Two Types of Misclassification (observed in SNPs A and B)
Using our case–control data, we then estimated odds ratios, with and without correction, for the associations between each SNP in Table 1 and risk of ALL adjusting for study matching variables (child’s age, sex, and state of residence) and ethnicity.
As expected, the results of the uncorrected (‘true’) and corrected analyses were identical (Table 3) when no error was present in the hypothetical data for artificial SNPs 1 and 2. When error was present, the estimated ORs deviated from the ‘true’ result, and did so to a greater extent when SNP B-type error was applied. For both artificial SNPs 1 and 2, the correction procedure adjusted the estimates in the direction of – and close to – the ‘true’ result. For example, the ‘true’ OR for the SNP 1 aa genotype was 2.04 (95% CI: 1.29, 3.21). With SNP A-type error introduced into the data, the uncorrected OR was 1.82 (95% CI: 1.17, 2.84) while the corrected OR was 2.11 (95% CI: 1.29, 3.45), close to the ‘true’ result. When SNP B-type error was introduced, the uncorrected odds ratio was even further from the ‘true’ estimate – 1.49 (95% CI: 1.06, 2.09); however, the corrected odds ratio was 1.99 (95% CI: 1.21, 3.26), close to the ‘true’ result.
As seen in Table 4, the odds ratios remained very similar, but the confidence intervals were somewhat wider or narrower when we simulated the effect of decreasing or increasing (respectively) the number of sample pairs available for assessing gDNA-wgaDNA misclassification.
Table 4. Results of Correction Procedure for Hypothetical Data on SNP B-type Error after Increasing Size of Misclassification Sample
To verify the optimal number of iterations provided, we used the highest level of discordance (SNP-B type error) on the artificial SNP 1, and completed 1000 repeats of 25, 50 and 100 iterations to examine the distribution of the resulting imputed OR for the aa genotype. 95% confidence intervals for the spread of this imputed OR were (1.96 – 2.10), (2.01 – 2.11) and (2.02 – 2.09) respectively. We felt the precision gained in 50 iterations compared to 25 was of value, and that the extra precision gained in 100 iterations compared to 50 was insufficient to warrant the observed modest increase in computation time. It should be noted that increasing the number of iterations would further increase the precision of the estimate, with the only expense being in added computation time.
Correction of case–control study data
The uncorrected and corrected odds ratios and 95% confidence intervals for risk of ALL associated with each genotype are shown in Table 5.
Table 5. Results of Analysis of Genotype Associated with Risk of ALL: Uncorrected and Corrected Case–control Analyses
Almost 40% of the discordance in the gDNA-wgaDNA sample pairs was in the Aa→AA direction (Table 1). The effect of our correction was to reduce the odds ratio for the Aa genotype in all six SNPs (Table 5). Over 50% of the discordance seen in the gDNA-wgaDNA sample pairs was in the Aa→aa direction (Table 1). Correction increased the ORs for aa genotypes in four out of six SNPs, while the other two remained the same (Table 5).
We aimed to correct for bias introduced by differential error in the genotyping results for control subjects compared with case subjects in our case–control study of childhood ALL. Over 90% of discordance observed in the case gDNA-wgaDNA sample pairs involved a loss of the Aa genotype in the wgaDNA samples (Table 1), consistent with allelic drop out during the WGA process. Croft and colleagues reported loss of the heterozygote genotype in 60 to 91% of discordant gDNA and wgaDNA sample pairs . If the wgaDNA samples of control subjects were affected in a similar way to, and to a similar extent as, those of the cases – as is probable since DNA from case children was taken at remission – the most likely effect of the presumed allelic dropout would be to produce a spuriously lower proportion of Aa calls among controls than among cases for particular SNPs. The effect of this would be to artificially inflate the OR associated with the Aa genotype relative to the AA.
Our method for correcting logistic regression results for genotyping error can be applied to any study where misclassification is known to be present in genotypes (or other types two or three level categorical data), and where validation data are available, or where informed estimates of probable misclassification can be made. The method can also be adapted to allow associations between gene-environment interactions and risk of disease to be investigated.
Our correction technique involved using the empirically observed patterns of discordance in matched gDNA and wgaDNA samples from cases to adjust the estimated ORs for the associations between each of these SNPs and the risk of ALL. As expected, the primary effect of the correction on real data was to reduce the ORs associated with the Aa genotype. It also tended, less predictably, to increase the ORs for aa genotypes. There are two likely explanations for this less predictable effect. First, the aa genotype is generally the rarest so that even small adjustments in frequency can produce relatively large proportional changes in the OR. Second, for most SNPs, the correction method involved simultaneous adjustment to the frequency of the AA genotype relative to the Aa genotype, so the reference category for the aa OR was also adjusted.
Applying the discordance seen in our quality control sample pairs to two artificial SNPs showed that the method was successful: when the correction was applied to hypothetical data with a known degree of misclassification, estimates were more similar to those of the error-free hypothetical data. This was the case whether the degree of misclassification was relatively small (SNP A-type) or relatively large (SNP B-type). Although the effect of the correction method on the ORs for the aa genotype tended to be less predictable than for the Aa genotype when the ‘true’ result was unknown, our simulations showed that the corrections made to the aa genotypes were as appropriate and effective as for the Aa genotypes. These results give us confidence that the correction method we devised can adjust appropriately for the genotyping error and, therefore, should reduce bias in our estimated ORs. However, since the adjustment is probabilistic, there is likely to be a reduction in precision. Simulating an increase or decrease in the size of the misclassification data set narrowed or widened (respectively) the confidence intervals (Table 4).
The statistical method we have described in this paper provides a novel and user-friendly method of correcting for differential genotyping error. Unlike previously described approaches, it can be used with modest sample sizes, allows correction of multiple misclassified variables, takes account of the size of the validation sample, and does not require the validation sample to include controls. Other genetic association studies based on cases and controls with samples derived from different sources or treatments of DNA should seek to investigate concordance between genotypes measured in DNA from both sources and, if found to be materially discordant, consider applying this adjustment procedure.
ALL: Acute lymphoblastic leukaemia; DNA: Deoxyribonucleic acid; OR: Odds ratio; SNP: Single nucleotide polymorphism; WGA: Whole genome amplification.
The authors declare that they have no competing interests.
MC devised and coded the statistical method, and drafted the paper, NDK advised on statistical methods, and drafted the paper, EM was involved in study design and management and drafting the paper, KG ran the analysis of real and mock genotyping data and assisted in drafting the paper, SJ advised on genotyping analysis and methods, DA advised on statistical methods, FB coordinated sample processing, genotyping methods, study design and assisted in drafting the paper, BA was involved in study design and drafting the paper. All authors approved the final version of the paper.
The authors wish to thank Lan Nguyen and Lyndsay Beasley for sample processing and genotyping in the central laboratory at Royal Perth Hospital.
The Aus-ALL consortium conducted the study and the Telethon Institute for Child Health Research (TICHR), University of Western Australia, was the coordinating centre. Bruce Armstrong (Sydney School of Public Health), Elizabeth Milne (TICHR), Frank van Bockxmeer (Royal Perth Hospital), Michelle Haber (Children’s Cancer Institute Australia), Rodney Scott (University of Newcastle), John Attia (University of Newcastle), Murray Norris (Children’s Cancer Institute Australia), Carol Bower (TICHR), Nicholas de Klerk (TICHR), Lin Fritschi (WA Institute for Medical Research), Ursula Kees (TICHR), Margaret Miller (Edith Cowan University), Judith Thompson (WA Cancer Registry) were the research investigators and Helen Bailey (TICHR) was the project coordinator. The clinical Investigators were: Frank Alvaro (John Hunter Hospital, Newcastle); Catherine Cole (Princess Margaret Hospital for Children, Perth); Luciano Dalla Pozza (Children’s Hospital at Westmead, Sydney); John Daubenton (Royal Hobart Hospital, Hobart); Peter Downie (Monash Medical Centre, Melbourne); Liane Lockwood, (Royal Children’s Hospital, Brisbane); Maria Kirby (Women’s and Children’s Hospital, Adelaide); Glenn Marshall (Sydney Children’s Hospital, Sydney); Elizabeth Smibert (Royal Children’s Hospital, Melbourne); Ram Suppiah, (previously Mater Children’s Hospital, Brisbane).
Grants and financial support
This study was funded by the Australian National Health and Medical Research Council (Grant ID: 254539). E. Milne was supported by an NH&MRC Career Development Award and Cancer Council Western Australia Research Fellowship. The funding bodies had no role in the design or analysis of this study, or in manuscript preparation.
Milne E, Royle JA, de Klerk NH, Blair E, Bailey H, Cole C, Attia J, Scott RJ, Armstrong BK: Fetal growth and risk of childhood acute lymphoblastic leukemia: results from an Australian case–control study.
Cancer Epi Bio Prev 2006, 15(4):816-819. Publisher Full Text
Clayton DG, Walker NM, Smyth DJ, Pask R, Cooper JD, Maier LM, Smink LJ, Lam AC, Ovington NR, Stevens HE, et al.: Population structure, differential bias and genomic control in a large-scale, case–control association study.
Bioinformatics at TICHR Website.
http://bioinformatics.childhealthresearch.org.au/software/ webcite Last accessed: August 6 2012
Croft DT Jr, Jordan RM, Patney HL, Shriver CD, Vernalis MN, Orchard TJ, Ellsworth DL: Performance of whole-genome amplified DNA isolated from serum and plasma on high-density single nucleotide polymorphism arrays.
The pre-publication history for this paper can be accessed here: