PedGenie beta version 2.1 is a unique, flexible, and easily implemented analysis software tool that is enhanced significantly by incorporation of meta-statistics to allow valid combined analysis of multiple studies, including mixtures of family-based and independent resources, in the detection of genetic association with common disease. Genetic Analysis Workshop 15 Problem 2 data, provided by the North American Rheumatoid Arthritis Consortium, were used to demonstrate PedGenie 2.1 meta-association testing of variants in the PTPN22 gene and rheumatoid arthritis across multiple resources containing both family-based and independent individuals. Our findings are generally consistent with previous reports for a panel of 14 single-nucleotide polymorphism (SNP) markers, including functional coding SNP R620W, in which the minor allele conferred a significant two-fold increased risk. More power to detect associations was achieved in certain analyses by using extra family-based samples, rather than restricting analyses to single cases randomly selected from each pedigree.
In the study of common diseases and genes with modest effects, large consortium and multicenter efforts hold the promise of increased power to detect associations, but also present analytical challenges. Candidate gene study populations differ geographically and ethnically, and considerable differences in case-control ascertainment and pedigree structures between resources are likely. Currently, no software package exists that allows valid meta-genetic association testing in mixtures of independent and family-based resources (including pedigrees of arbitrary length and configuration) between or within studies. PedGenie 2.1 (beta version) extends the functionality currently available in PedGenie (version 1.2) [1,2] by incorporating meta-statistics for combined analysis of multistudy resources, along with Monte Carlo significance testing, which allows for a mixture of pedigree members (both nuclear and extended families) and independent individuals. Data from Problem 2 of the Genetic Analysis Workshop 15 (GAW15) were used to demonstrate meta-association testing of the PTPN22 candidate gene (14 single-nucleotide polymorphisms, or SNPs) with the consensus-criteria rheumatoid arthritis (RA) phenotype and sub-phenotypes in combined family-based and independent individuals using PedGenie 2.1. RA, a common systemic autoimmune disease, affects about 1% of adults worldwide and has an estimated heritability of 50 to 60% [3,4]. The association of RA susceptibility with a missense variant in the hematopoietic-specific protein tyrosine phosphatase gene, PTPN22 (R620W, rs2476601), has been previously suggested.
PTPN22 SNP and RA phenotype data
Data provided by the North American Rheumatoid Arthritis Consortium (NARAC) were obtained for 14 SNPs in PTPN22. The genotypes and phenotypes were collected from Caucasian individuals in NARAC affected sibling-pair families (1393 cases) and 1519 matched, independent non-diseased controls from New York City (NYC), reported in Carlton et al. Within this sample were data for 839 affected sib-pair cases and 855 controls, reported in Plenge et al. In addition to RA status (affected or unaffected), detailed phenotype information available on most cases included rheumatoid factor IgM (RF), a measure of active disease correlated with erosive arthritis. An RF value of 11 or greater was designated RF+. In Kroot et al., RF titers under 10 were considered normal; however, because levels below 11 could not be quantitated accurately, we used this slightly higher threshold. Elevated anti-cyclic citrullinated peptide (anti-CCP) levels have been shown to predict increased risk for RA development, with an antibody titer of 49 or greater considered anti-CCP+.
Demonstration studies: NARAC 1 and NARAC 2
To demonstrate meta-analysis across multiple studies with PedGenie 2.1 in association testing of a candidate gene, the PTPN22 SNP and phenotype data were separated into two study files, designated NARAC 1 and NARAC 2 (Table 1). NARAC 2 comprised the 1694 individuals studied by Plenge et al., including both families and independent controls. NARAC 1 comprised all remaining individuals: family cases, unaffected family controls, and independent NYC controls. Because PedGenie can incorporate family relationships in association testing, all individuals for whom affected status could be determined and genotypes were available were included. Specifically, the data analyzed contained siblings and genotyped parents. In two NARAC 1 pedigrees, four adult offspring in affected sibships with genotypes and RA diagnosis between ages 17 and 44 were also available. Including unaffected siblings with genotypes resulted in an additional 103 family-based controls for analysis in NARAC 1.
Table 1. PTPN22 SNP data: study descriptives
Use of family data for allelic or genotype association testing must account for correlations between related individuals to avoid underestimation of variance in a statistic of interest and increased type I error. Several family-based association methods exist, but most are limited by pedigree structure or the statistics that can be performed. Ideally, a method would use all available information on pedigree structure, which is the most informative approach, and, for multi-group efforts, would also provide meta-statistics. PedGenie 2.1, a Monte Carlo based method, has been developed with meta-capabilities, which entail the use of study-specific allele or haplotype frequencies and established meta-statistics. Briefly, the Monte Carlo procedure is based on simulating null genotype configurations for each resource and deriving null meta-statistics across resources, achieved as follows. Allele frequencies are estimated within each resource. Alleles are then assigned to founders randomly, in proportion to the estimated allele frequencies for the specific resource, and a Mendelian gene-drop simulation is performed independent of phenotype; each simulated null genotype configuration is used to calculate a null meta-statistic. This is repeated to create an empirical null distribution for significance testing. PedGenie, freely available and easily implemented in a computing environment running Java 1.5, was developed to allow flexibility in hypothesis testing; tests may be constructed for alleles or genotypes in any user-defined grouping, using any reference group as baseline. Several options are available within PedGenie to estimate allele frequencies for the gene-drops. In the NARAC 1 and NARAC 2 resources, there are a large number of relatively small pedigrees and the number of genotyped founders is limited. Therefore, the allele frequencies were estimated from all genotyped individuals. The empirical null distributions were created from 1000 simulations.
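The gene-drop procedure described above can be illustrated with a minimal Python sketch. This is not PedGenie's implementation (PedGenie is a Java program); the function names, the data layout, and the toy statistic (a case-control difference in minor-allele frequency, summed across resources as a stand-in for a meta-statistic) are assumptions made only for illustration. Independent individuals are represented as founders with no relatives, and individuals absent from the status map (for example, parents of unknown phenotype) are genotyped in the simulation but excluded from the statistic.

```python
import random

def gene_drop(members, minor_freq, rng):
    """Simulate null genotypes for one resource, ignoring phenotype.

    members: list of (id, father_id, mother_id), parents listed before
    their children; founders and independent individuals have None parents.
    Returns {id: (allele, allele)} with alleles coded 0 (major) or 1 (minor).
    """
    geno = {}
    for pid, father, mother in members:
        if father is None and mother is None:
            # Founder: draw both alleles from the resource-specific frequency.
            geno[pid] = (int(rng.random() < minor_freq),
                         int(rng.random() < minor_freq))
        else:
            # Non-founder: Mendelian transmission of one allele per parent.
            geno[pid] = (rng.choice(geno[father]), rng.choice(geno[mother]))
    return geno

def case_control_diff(geno, status):
    """Toy statistic: minor-allele frequency in cases minus controls.

    status maps id -> 1 (case) or 0 (control); both groups must be non-empty.
    """
    def freq(group):
        alleles = [a for pid in group for a in geno[pid]]
        return sum(alleles) / len(alleles)
    cases = [p for p, s in status.items() if s == 1]
    controls = [p for p, s in status.items() if s == 0]
    return freq(cases) - freq(controls)

def null_distribution(resources, n_sims, seed=0):
    """resources: list of (members, status, minor_freq), one per study.

    Each simulation gene-drops every resource with its own allele frequency
    and sums the per-resource statistics (a placeholder meta-statistic).
    """
    rng = random.Random(seed)
    nulls = []
    for _ in range(n_sims):
        total = 0.0
        for members, status, minor_freq in resources:
            total += case_control_diff(gene_drop(members, minor_freq, rng), status)
        nulls.append(total)
    return nulls

def empirical_p_value(observed, nulls):
    """Two-sided proportion of null statistics at least as extreme as observed."""
    return sum(1 for s in nulls if abs(s) >= abs(observed)) / len(nulls)
```

In this sketch the observed statistic would be computed once from the real genotypes and then compared against `null_distribution(resources, 1000)`, mirroring the 1000 simulations used here. Because relatives' genotypes are generated jointly by transmission from founders, the null distribution reflects within-family correlation without requiring an analytic variance.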
PedGenie appropriately handles sparse data and missing data structure. PedGenie reports the number of simulations for which a statistic can be calculated; sparse data are indicated when this number is less than the total number of simulations. In the gene-drop procedure, individuals missing genotypes for a specific locus are reset to missing, and calculation of test statistics in the simulated data is limited to individuals with observed genotypes.
PedGenie beta version 2.1 incorporates meta-statistics to allow valid combined analysis of multiple studies, including family-based resources, in the detection of genetic association with common disease. In epidemiologic studies, data are often collected that can be summarized in three-way contingency tables: the presence or absence of a disease phenotype cross-classified with allele or genotype, while controlling for a third categorical variable (study) that represents combinations of levels of several variables (race, sex, age, etc.). Meta-statistics for genotype, composite genotype, or haplotype analysis across studies currently incorporated in PedGenie 2.1 are based on the generalized Cochran-Mantel-Haenszel (CMH) approach described elsewhere [8,10]. CMH procedures are used to calculate odds ratios, a chi-squared general association test of independence, and a chi-squared test of trend (a mean score statistic in which ordered wild-type, heterozygous, and homozygous variant genotypes lie on an ordinal scale).
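For the simplest case of a 2 x 2 table per study, the CMH quantities can be sketched as below. This is a textbook Mantel-Haenszel calculation written in Python for illustration, not PedGenie code; the function name and the table layout are assumptions. The chi-squared value is shown without a continuity correction, and in PedGenie its significance would be judged against the Monte Carlo null distribution rather than a chi-squared reference distribution.

```python
def mantel_haenszel(tables):
    """Pooled odds ratio and CMH chi-squared (1 df) across studies.

    tables: list of (a, b, c, d), one tuple per study, where
      a = cases with the variant,    b = cases without,
      c = controls with the variant, d = controls without.
    """
    or_num = or_den = 0.0   # Mantel-Haenszel odds ratio sums
    diff = var = 0.0        # observed minus expected, and its variance
    for a, b, c, d in tables:
        n = a + b + c + d
        or_num += a * d / n
        or_den += b * c / n
        # Expected count and hypergeometric variance of cell a under the null.
        diff += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    return or_num / or_den, diff * diff / var
```

Each study contributes its own stratum, so differences in allele frequency or case-control ratio between studies do not by themselves generate an association signal.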
The study characteristics and RA phenotypes used to demonstrate PedGenie 2.1 meta-association analysis are described in Table 1. Allele frequencies for controls and PTPN22 SNP associations with RA for each demonstration study and the combined study PedGenie 2.1 meta-analysis results are shown in Table 2, along with previously published reports. Generally, our meta-analysis results using GAW15 data corroborate previous findings for the panel of 14 SNP markers that includes functional coding SNP R620W [5,6]. It is of note that on inspection of linkage disequilibrium (LD) between SNPs, R620W was not in strong pairwise LD with any other SNP in the set. Markers rs1217413, rs1217388, rs1310182, and rs1217414 were in pairwise LD (measured by r² > 0.4) with each other. SNPs rs12730735 and rs12760457 were in complete LD. SNP ss38346943, which has a rare, protective minor allele, was not in LD with any other marker.
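For readers unfamiliar with the pairwise LD measure, the standard definition of r² from haplotype and allele frequencies is sketched below in Python. How LD was estimated for these data is not stated here, so the function and its inputs are illustrative assumptions only.

```python
def r_squared(p_ab, p_a, p_b):
    """Pairwise LD r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of alleles A and B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
```

A value of 1 corresponds to complete LD (each allele at one SNP always occurs with the same allele at the other), and a value of 0 to independence.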
Table 2. Allele-based associations with RA (case-control comparison, odds ratios (ORs) estimated for minor allele)
Table 3 shows genotype associations between RA and PTPN22 markers under an additive model (Armitage test for trend) that were significant in at least one study. Previously, Carlton et al. reported two markers with associations independent of R620W: rs3789604 and rs1310182, both in putative transcription factor binding sites. The odds ratios reported by Carlton et al. were adjusted accordingly, and these are shown in Table 3. R620W was adjusted for rs3789604, and rs3789604 and rs1310182 were adjusted for R620W.
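The trend test for an additive model can be sketched for a single study as follows. This is the standard Cochran-Armitage formula with genotype scores 0, 1, 2, written in Python for illustration; the function name and argument layout are assumptions, and the stratified (across-study) version used for the meta-analysis is not shown. The CMH mean score form differs from this one only by a factor of (N - 1)/N.

```python
def armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend chi-squared (1 df) for a 2 x 3 genotype table.

    cases, controls: counts of (major homozygote, heterozygote,
    minor homozygote); scores encode the additive model.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)            # all individuals
    n_cases = sum(cases)
    n_controls = sum(controls)
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * m for t, m in zip(scores, totals))
    sum_t2n = sum(t * t * m for t, m in zip(scores, totals))
    # Trend numerator and its null variance term.
    trend = n * sum_tr - n_cases * sum_tn
    denom = n_cases * n_controls * (n * sum_t2n - sum_tn ** 2)
    return n * trend ** 2 / denom
```

For example, `armitage_trend((10, 20, 30), (30, 20, 10))` gives a statistic of 20, while identical genotype distributions in cases and controls give 0.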
Table 3. Genotype-based associations with RA (case-control comparison, reference is major allele homozygote)
Previous studies reported that susceptible PTPN22 R620W genotypes containing the variant allele were strongly associated with RF+ but not RF- disease [3,6]. Our meta-analysis supported an association when only RF+ cases were compared to controls, although a comparison of RF+ to RF- cases was not significant (Table 4). Plenge et al. reported an association between R620W variant genotypes and anti-CCP+ cases, but not anti-CCP- cases. Our meta-analysis showed an association for both anti-CCP+ cases and anti-CCP- cases in comparison to controls (Table 4). Thus, in our larger meta-analysis across studies, we could not confirm that anti-CCP level discriminates in R620W-associated RA. However, an association was seen for SNPs in pairwise LD (rs1217413, rs1217388, rs1310182, and rs2488458) when only anti-CCP+ cases were compared to controls (heterozygous/variant vs. wild-type odds ratios (ORs) between 1.4 and 1.5, 95% confidence interval (CI), 1.2 to 1.8). Further, the comparison of anti-CCP+ to anti-CCP- cases was significant for these markers.
Table 4. Genotype-based associations with R620W and RA sub-type (reference is major allele CC homozygote)
PedGenie 2.1 can correct for all relationships in family-based resources; therefore, all family members for whom phenotype and genotype data are available can be included. If only one affected sibling had been chosen from each pedigree, the sample size here would have been reduced from 1285 cases and 1621 controls to 665 and 1518, respectively. For example, in NARAC 1, a family with five affected and five unaffected genotyped siblings and parents with known RA status (one genotyped) was included in PedGenie analyses, whereas only one affected sib was randomly selected from this family for the study performed by Carlton et al.
CMH statistics for ordered genotypes have been used to assess association in multiple case-control studies in which cases are independent, either probands or randomly selected affected individuals. A combined odds ratio estimate of the association in both case-control and transmission-disequilibrium studies has been proposed. PedGenie beta 2.1 allows for valid meta-analyses of combined family-based and case-control studies using CMH techniques, while accommodating comprehensive information in large, multigenerational families without the pedigree splitting required in other packages. The ability to combine family and case-control resources and use all data available both increases the utility of prior linkage resources and can provide increased power to detect associations, particularly in stratified and subset analyses that are likely to lead to small sample sizes in individual studies.
In conclusion, our method is a more comprehensive way of using all data available in meta-association testing, with more power to detect associations by using extra family-based samples rather than restricting to randomly selected cases from each pedigree. Our findings generally corroborate those previously reported. We support previous findings that the PTPN22 gene is associated with RA. However, our results do not indicate that anti-CCP antibody status significantly discriminates for disease in the functional R620W SNP.
The author(s) declare that they have no competing interests.
This work was funded by CA123550-01 and CA098364-01 (to NJC).
This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.
PedGenie Version 1.2 [http://www-genepi.med.utah.edu/PedGenie/index.html]
Begovich AB, Carlton VE, Honigberg LA, Schrodi SJ, Chokkalingam AP, Alexander HC, Ardlie KG, Huang Q, Smith AM, Spoerke JM, Conn MT, Chang M, Chang SY, Saiki RK, Catanese JJ, Leong DU, Garcia VE, McAllister LB, Jeffery DA, Lee AT, Batliwalla F, Remmers E, Criswell LA, Seldin MF, Kastner DL, Amos CI, Sninsky JJ, Gregersen PK: A missense single-nucleotide polymorphism in a gene encoding a protein tyrosine phosphatase (PTPN22) is associated with rheumatoid arthritis.
Huizinga TW, Amos CI, van der Helm-van Mil AH, Chen W, van Gaalen FA, Jawaheer D, Schreuder GM, Wener M, Breedveld FC, Ahmad N, Lum RF, de Vries RR, Gregersen PK, Toes RE, Criswell LA: Refining the complex rheumatoid arthritis phenotype based on specificity of the HLA-DRB1 shared epitope for antibodies to citrullinated proteins.
Carlton VE, Hu X, Chokkalingam AP, Schrodi SJ, Brandon R, Alexander HC, Chang M, Catanese JJ, Leong DU, Ardlie KG, Kastner DL, Seldin MF, Criswell LA, Gregersen PK, Beasley E, Thomson G, Amos CI, Begovich AB: PTPN22 genetic variation: evidence for multiple variants associated with rheumatoid arthritis.
Plenge RM, Padyukov L, Remmers EF, Purcell S, Lee AT, Karlson EW, Wolfe F, Kastner DL, Alfredsson L, Altshuler D, Gregersen PK, Klareskog L, Rioux JD: Replication of putative candidate-gene associations with rheumatoid arthritis in >4,000 samples from North America and Sweden: association of susceptibility with PTPN22, CTLA4, and PADI4.
Kroot EJ, de Jong BA, van Leeuwen MA, Swinkels H, van den Hoogen FH, van't Hof M, van de Putte LB, van Rijswijk MH, van Venrooij WJ, van Riel PL: The prognostic value of anti-cyclic citrullinated peptide antibody in patients with recent-onset rheumatoid arthritis.
Hall D, Woolson RF, Clarke WR, Jones MF: Cochran-Mantel-Haenszel Techniques: applications involving epidemiologic survey data. Athens, Georgia: Department of Statistics, University of Georgia; 1997:1-31.
Zeggini E, Groves CJ, Parkinson JR, Halford S, Owen KR, Frayling TM, Walker M, Hitman GA, Levy JC, O'Rahilly S, Hattersley AT, McCarthy MI: Large-scale studies of the association between variation at the TNF/LTA locus and susceptibility to type 2 diabetes.