This article is part of the supplement: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci

Open Access Proceedings

Mapping a gene for rheumatoid arthritis on chromosome 18q21

William Tapper*, Andrew Collins and Newton E Morton

Author Affiliations

Human Genetics Division, University of Southampton, Southampton General Hospital, Tremona Road, Southampton, Hampshire SO16 6YD, UK

BMC Proceedings 2007, 1(Suppl 1):S18

Published: 18 December 2007

© 2007 Tapper et al; licensee BioMed Central Ltd.

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Although single chi-square analysis of the North American Rheumatoid Arthritis Consortium (NARAC) data identifies many single-nucleotide polymorphisms (SNPs) with p-values less than 0.05, none remain significant after Bonferroni correction. In contrast, CHROMSCAN evades heavy Bonferroni correction and auto-correlation between SNPs by using composite likelihood to model association across all markers in a region and permutation to assess significance. Analysis by CHROMSCAN identifies a 36-kb interval that includes the most significant SNP (msSNP) observed in a 10-Mb target suggested by linkage. Unexpectedly, stratification by gender and age of onset shows that association evidence comes almost entirely from females with age of onset less than 40. Combining evidence from a meta-analysis of linkage studies and three subsets of the NARAC data provides significant evidence for a determinant of rheumatoid arthritis in a 36-kb interval and illustrates the principle that estimates of location and its information are more powerful than estimates of p-values alone.


Initially, linkage mapping dealt with rare and highly penetrant genes. Without cytogenetic assignment, the preferred strategy was segregation analysis to determine all relevant parameters except recombination, followed by linkage analysis to determine recombination frequency [1]. Complex inheritance with uncertain segregation parameters proved much more difficult, giving rise to many unconfirmed claims based on microsatellites and leading to meta-analysis without point locations [2]. The HapMap project provides dense SNPs that can be used to localize causal loci with or without pedigrees. This procedure, called association mapping, revolutionized identification of disease genes. Recent developments of linkage disequilibrium units (LDU), composite likelihood, control of auto-correlation, and meta-analysis are incorporated into the CHROMSCAN program [3,4] to increase its precision for association mapping. Here we use these methods to establish the location and weight of evidence for a gene predisposing to rheumatoid arthritis.


Data preparation

The data, provided by NARAC (North American Rheumatoid Arthritis Consortium), consist of 2300 single-nucleotide polymorphisms (SNPs) in a 10-Mb region of 18q21 with linkage evidence in U.S. and French scans [5]. Illumina genotyped these markers in 460 cases and 460 controls, matched for age and gender, from New York. The genotypic data for controls were screened, and 7 SNPs with $\chi^2_1 > 10$ for the Hardy-Weinberg test [6] were removed, leaving 2293 to be analyzed. CHROMSCAN requires SNPs to be located on both physical and LDU scales. Physical locations were taken from build 35 of the human genome sequence. Unlike physical maps, LDU maps are study-specific, and several are available, corresponding to the four HapMap samples separately and combined (CEU, CHB, JPT, YRI, and cosmopolitan). The LDU map with the highest SNP density and population attributes closest to the experimental data should be optimal. We therefore used LDU locations relative to the CEU HapMap data with a density of 1 SNP per 863 bp compared to 1 SNP per 4139 bp in the NARAC data. We also used the kilobase map to determine the robustness and power of LDU maps compared with physical maps.
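The Hardy-Weinberg screen of control genotypes can be illustrated with a minimal sketch. It assumes the ordinary Pearson goodness-of-fit test with 1 degree of freedom on genotype counts; the quality-control procedure of [6] may differ in detail, and the function names and the example counts are hypothetical.

```python
def hardy_weinberg_chi2(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square (1 df) for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1.0 - p                          # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Cells with zero expectation (monomorphic SNPs) contribute nothing.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def passes_hw_filter(counts, threshold: float = 10.0) -> bool:
    """Keep a SNP unless its control genotypes give chi-square above the threshold."""
    return hardy_weinberg_chi2(*counts) <= threshold


# Hypothetical control genotype counts (AA, AB, BB) for two SNPs.
balanced = (25, 50, 25)        # exactly at Hardy-Weinberg proportions
het_deficit = (40, 20, 40)     # strong deficit of heterozygotes
```

Under this sketch the first SNP would be retained and the second removed, since its chi-square of 36 exceeds the threshold of 10.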

LDU map construction

The theory for constructing LDU maps has been described [7]. Briefly, the LDU distance for the ith SNP interval is given by $\varepsilon_i d_i$, where $\varepsilon_i$ describes the exponential decline of association with physical distance $d_i$ in kb. Values of $\varepsilon_i$ are estimated by composite likelihood that fits the Malecot model [8] to multiple pairwise diplotype data. The Malecot equation, given by $\rho = (1 - L)Me^{-\varepsilon d} + L$, uses additional parameters to describe association at the last major bottleneck ($M$) and residual association at large distance ($L$) to predict rho ($\rho$), the probability of association.
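A small sketch may help to fix ideas. It assumes the Malecot model in its commonly cited form, ρ = (1 - L)M exp(-εd) + L, and builds LDU locations as the cumulative sum of ε_i d_i over successive SNP intervals. The function names and the numerical values are hypothetical, and the estimation of ε_i by composite likelihood is not shown.

```python
import math
from itertools import accumulate


def malecot_rho(d_kb: float, eps: float, M: float, L: float) -> float:
    """Predicted association at physical distance d_kb under the Malecot model."""
    return (1.0 - L) * M * math.exp(-eps * d_kb) + L


def ldu_locations(kb_positions, eps_per_interval):
    """Cumulative LDU location of each SNP, with the first SNP at 0 LDU.

    eps_per_interval[i] is the (already estimated) decline parameter for the
    interval between SNP i and SNP i + 1.
    """
    if len(eps_per_interval) != len(kb_positions) - 1:
        raise ValueError("need one epsilon per interval between adjacent SNPs")
    steps = [
        eps * (right - left)
        for eps, left, right in zip(eps_per_interval, kb_positions, kb_positions[1:])
    ]
    return [0.0] + list(accumulate(steps))


# Hypothetical example: three SNPs, with faster decline in the first interval.
example_map = ldu_locations([0.0, 10.0, 30.0], [0.1, 0.05])
```

In this example both intervals contribute 1 LDU although the second is twice as long in kilobases, which is the sense in which LDU maps stretch regions of rapid decline and compress regions of strong disequilibrium.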

Association mapping

The CHROMSCAN program [3] uses a model similar to LDU maps except the exponential term is replaced by $\varepsilon\Delta(S_i - S)$ to estimate the location ($S$) of a disease gene, where $S_i$ is the location of the ith marker in kilobases or LDU. The Kronecker $\Delta$ is used for map direction and assures a correct sign, with $\Delta = 1$ if $S_i \geq S$ or $-1$ if $S_i < S$. To calculate the expected association with distance, $z_i$, the model becomes $z_i = (1 - L)Me^{-\varepsilon\Delta(S_i - S)} + L$, where $M$ is diminished by complex inheritance and $L$ is the association at large distance. The observed association $\hat{z}_i$ is determined by a 2 × 2 table of counts $a$, $b$, $c$, $d$ between affection status and the two alleles of each SNP to give $\hat{z}_i = (ad - bc)/[(a + b)(b + d)]$, where $ad - bc \geq 0$ and $b \leq c$ is ensured by rearrangement of columns and rows [9]. Given the observed associations $\hat{z}_i$, the Malecot parameters are estimated iteratively using composite likelihood, which evades a heavy Bonferroni correction by combining information over all loci within a region as $\Lambda = \sum_i K_i(\hat{z}_i - z_i)^2$, where $\hat{z}_i$ and $z_i$ are the observed and expected association values, respectively, at the ith SNP.
Their squared difference is weighted by information ($K_i$), which is estimated as $K_i = \chi^2_1/\hat{z}_i^2$, where $\chi^2_1$ is the Pearson $\chi^2$ from the 2 × 2 table.
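The quantities in this paragraph can be sketched as follows. The sketch assumes that the observed association takes the form (ad - bc)/[(a + b)(b + d)] after rearrangement of the table, that information is the Pearson chi-square divided by the squared association, and that the composite likelihood is the information-weighted sum of squared deviations. These forms, the function names, and the example counts are assumptions for illustration, not the CHROMSCAN source code, and the iterative estimation of M, S, and L is not shown.

```python
import math


def arrange_table(a, b, c, d):
    """Reorder a 2 x 2 table [[a, b], [c, d]] so that ad - bc >= 0 and b <= c."""
    if a * d - b * c < 0:
        a, b, c, d = b, a, d, c          # swap the two columns
    if b > c:
        a, b, c, d = d, c, b, a          # swap both rows and both columns
    return a, b, c, d


def observed_association(a, b, c, d):
    """Return (z_hat, K) for one SNP; all table margins are assumed non-zero."""
    a, b, c, d = arrange_table(a, b, c, d)
    n = a + b + c + d
    z_hat = (a * d - b * c) / ((a + b) * (b + d))
    # K = chi2 / z_hat**2 with chi2 the Pearson statistic of the table,
    # written in closed form so that z_hat = 0 does not divide by zero.
    k = n * (a + b) * (b + d) / ((c + d) * (a + c))
    return z_hat, k


def expected_association(s_i, s, eps, M, L):
    """Malecot prediction at marker location s_i for a causal locus at s."""
    return (1.0 - L) * M * math.exp(-eps * abs(s_i - s)) + L


def composite_lambda(tables, locations, s, eps, M, L):
    """Information-weighted sum of squared deviations over all SNPs in a region."""
    total = 0.0
    for table, s_i in zip(tables, locations):
        z_hat, k = observed_association(*table)
        total += k * (z_hat - expected_association(s_i, s, eps, M, L)) ** 2
    return total
```

Note that Δ(S_i - S) equals |S_i - S| under the sign convention above, which is why the sketch uses an absolute value. A model comparison would evaluate `composite_lambda` once with M = 0 and once at the fitted M, S, and L.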

Sub-hypotheses of the Malecot model are used to test for a causal polymorphism. Model A, which estimates none of the parameters and uses M = 0 with predicted L [10], is taken as the null hypothesis H0 in which there is no association between affection status and SNPs. Model D estimates M, S, and L. Therefore the ΛA - ΛD comparison tests for a disease determinant at location S. For both models, ε is fixed to 1 for the LDU map and to a value of ε determined from pairwise marker-by-marker association data for the kilobase map. In order to account for autocorrelation between SNPs as a result of LD, the significance of evidence is determined by a rank-based permutation test [3].
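The role of permutation can be illustrated with a generic label-shuffling test. This is a deliberately simplified stand-in: it is not the rank-based procedure of [3], and the `statistic` callable, the argument names, and the seed are hypothetical. Shuffling affection status while leaving genotypes intact preserves the correlation between SNPs, which is how permutation allows for autocorrelation.

```python
import random


def permutation_p_value(statistic, status, genotypes, n_replicates=1000, seed=0):
    """Empirical p-value of statistic(status, genotypes) under shuffled status.

    statistic: callable returning a number that is larger for stronger evidence.
    status:    list of 0/1 affection indicators, one per individual.
    genotypes: per-individual data, passed through unchanged.
    """
    rng = random.Random(seed)
    observed = statistic(status, genotypes)
    shuffled = list(status)
    exceed = 0
    for _ in range(n_replicates):
        rng.shuffle(shuffled)            # break any link between status and genotype
        if statistic(shuffled, genotypes) >= observed:
            exceed += 1
    # Adding 1 to numerator and denominator keeps the estimate above zero.
    return (exceed + 1) / (n_replicates + 1)
```

With this estimator the smallest attainable p-value is 1/(n_replicates + 1), about 0.0099 for 100 replicates and 0.0002 for 5000, so the number of replicates bounds the significance that can be resolved directly.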

Three separate analyses of the data were performed by CHROMSCAN. The first is a preliminary screen of the entire 10-Mb bin, which is divided into 18 nonoverlapping regions, each with at least 30 SNPs and covering at least 10 LDUs. To determine accurate levels of significance, the number of permutation replicates must approach the actual level of significance so that interpolation of the variance under H1 is reliable. To minimize computation time, the initial analysis was restricted to 100 replicates. Significant regions identified by the initial screen were re-analyzed separately using 1000 and 5000 replicates in order to verify convergence. To demonstrate the power of LDU maps, this analysis was repeated using the kilobase map and two estimates of the exponential decline ε derived from the significant region and the 10-Mb region [11]. The risk for rheumatoid arthritis is elevated in females, especially with late onset (≥35–≤60) [12]. Our third analysis therefore stratified cases into three groups corresponding to males, females with onset ≤39, and females with onset ≥40. The partition of females around an onset age of 40 was chosen to give approximately equal numbers of 'early' and 'late' onset cases. Unaffected controls for the three groups were all males (with similar age and total number of individuals as affected males), and females divided by current age to give similar total numbers of individuals as cases, respectively. This analysis was restricted to significant regions from the initial screen and used 5000 replicates.
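The division of the bin into nonoverlapping regions in the preliminary screen can be sketched with a simple greedy rule. How CHROMSCAN forms its regions is not stated here, so the rule below (close a region as soon as it holds at least 30 SNPs and spans at least 10 LDU, then merge any leftover markers into the last region) is an assumption, as are the function and parameter names.

```python
def partition_regions(ldu, min_snps=30, min_ldu=10.0):
    """Split ordered LDU locations into nonoverlapping (start, end) index ranges.

    Each range is half-open, covers at least min_snps markers, and spans at
    least min_ldu LDU. If the whole map cannot satisfy both criteria, a single
    region covering every marker is returned.
    """
    regions = []
    start = 0
    for i in range(len(ldu)):
        count = i - start + 1
        span = ldu[i] - ldu[start]
        if count >= min_snps and span >= min_ldu:
            regions.append((start, i + 1))
            start = i + 1
    if start < len(ldu):                 # markers left over after the last region
        if regions:
            last_start, _ = regions.pop()
            regions.append((last_start, len(ldu)))
        else:
            regions.append((0, len(ldu)))
    return regions


# Hypothetical map: 100 SNPs spaced 0.5 LDU apart.
example_regions = partition_regions([0.5 * i for i in range(100)])
```

For the hypothetical map this yields three regions, the last of which absorbs the ten trailing markers that could not form a region of their own.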


Association mapping

Single chi-square analyses of the 10-Mb region identify 125 SNPs with p < 0.05, none of which reach significance after Bonferroni correction (0.05/2293). The initial screen by CHROMSCAN divides the 18q21 bin into 18 nonoverlapping regions. Although the most significant SNP (msSNP, rs3745064) occurs in region 6, the next msSNP in region 11 is deceptively close in terms of significance, and several other regions contain suggestive SNPs (Table 1). In contrast, the composite likelihood approach, which models association across all markers in a region, identifies region 6 as the only significant region (p = 0.01259). The intensive screen of region 6 identified a large increase in significance between 100 and 1000 replicates, which is attributed to the relationship between number of replicates and significance, while the small decrease in significance between 1000 and 5000 replicates suggests that convergence has been achieved (Table 2). These analyses estimate a causal locus (S) at 53308 kb.
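The arithmetic behind the Bonferroni comparison above is easily checked. The sketch below computes the per-SNP threshold for 2293 tests and, for orientation, the number of SNPs expected to reach p < 0.05 by chance if the tests were independent, which they are not because of linkage disequilibrium. Function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of tests with p < alpha under the null, assuming independence."""
    return alpha * n_tests


threshold = bonferroni_threshold(0.05, 2293)        # about 2.18e-05
chance_hits = expected_false_positives(0.05, 2293)  # about 114.65
```

On these figures the 125 nominally significant SNPs are close to the number expected by chance alone, which illustrates why single-SNP p-values are uninformative here.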

Table 1. Regions screened with 100 replicates

Table 2. Intensive screening of region 6

The CHROMSCAN analysis of region 6 was repeated using the kilobase map so that its performance can be compared with the LDU map. Using a kilobase map requires specification of the exponential decline ε [11]. Two values of ε, corresponding to the 10 Mb interval (0.021) or region 6 alone (0.031), were investigated. Despite the large difference between ε values for the kilobase map, the significance level and location were almost identical. However, the ratios of $\chi^2$ indicate that the kilobase maps have a relative efficiency of 75% compared with an LDU map at 1000 replicates (Table 2).

Because King et al. [12] demonstrated that the risk for rheumatoid arthritis is elevated in females, especially with late onset, we stratified cases into three groups according to sex and age of onset. The effect of this stratification is highly suggestive despite its crudeness (Table 3) and small sample sizes. Females with onset ≤39 account for most of the association. The other two classes give such small chi-square values that they would undoubtedly be assigned to other regions if the partition test had not been restricted to region 6 on the pooled evidence. However, when considering region 6 alone, there is remarkable agreement between point estimates for 'early' and 'late' onset females and those from males. At this time it is impossible to say whether this consistency is caused by imperfectly divided onset groups or a small effect at late age.

Table 3. Stratification by gender and age of onset (5000 replicates)


Choi et al. [13] reported a meta-analysis of four linkage studies with microsatellites in a 10-Mb bin of chromosome 18. The results from this study were reported as p-values without estimates of location or standard errors. Without this information, the power for meta-analysis is reduced because the sum of two $\chi^2_2$ values must be converted back to $\chi^2_1$ and LOD1 instead of weighting estimates of location by their information. Perhaps because of this inefficiency, the combined LOD1 from this meta-analysis is 1.542, well below the conventional value of 3 for asserting significance. The corresponding p-value in large-sample theory is 0.007714, providing strong but inconclusive evidence for localization in the 18q21 region. Despite its limitations, linkage contributes evidence that should not be ignored.
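The correspondence between LOD1 and the large-sample p-value quoted above can be checked numerically. The sketch assumes the usual relation between a lod with 1 degree of freedom and chi-square, chi-square = 2 ln(10) × LOD1, with the p-value taken as the upper tail of chi-square with 1 degree of freedom. Function names are hypothetical.

```python
import math

LN10 = math.log(10.0)


def lod1_to_chi2(lod: float) -> float:
    """Chi-square (1 df) equivalent of a lod score: chi2 = 2 ln(10) * LOD."""
    return 2.0 * LN10 * lod


def chi2_1df_p_value(chi2: float) -> float:
    """Upper-tail probability of chi-square with 1 df, P(X > chi2)."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def lod1_to_p(lod: float) -> float:
    """Large-sample p-value corresponding to a 1-df lod score."""
    return chi2_1df_p_value(lod1_to_chi2(lod))


linkage_p = lod1_to_p(1.542)   # close to the 0.007714 quoted for linkage
```

Under the same convention the conventional threshold of LOD1 = 3 corresponds to a chi-square of about 13.8.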

Joint significance of linkage and association

The simplest meta-analysis is based on n independent samples, the ith of which contributes a $P_i$ value that on the null hypothesis is uniformly distributed. Then $-2 \ln P_i$ would be distributed as $\chi^2_2$, with $\sum_i -2 \ln P_i$ distributed as $\chi^2_{2n}$. This is the only test applicable to data that do not provide an estimate of location $S_i$ and information $K_i$, but it has three disadvantages: first, equal weight is given to samples with different standard errors; second, there is no test of homogeneity; and third, there is no point estimate to become more precise as n increases. As a consequence, much information is lost. Accepting these limitations and assuming accuracy of the P estimates, Table 4 shows that combining pooled association with linkage provides suggestive evidence to assign a gene for rheumatoid arthritis to the 18q21.31 interval. The LOD1 with no Bonferroni correction is 2.676 for linkage and pooled association. When location and information weight are available, the evidence for association is combined by determination of the difference between a total $\chi^2_n$ with n degrees of freedom and a pooled $\chi^2_1$, which tests for heterogeneity with n - 1 degrees of freedom, where [equation missing]. When the stratified association samples are combined in this manner, the heterogeneity test is negligible. As expected, power is increased when pooled with linkage (LOD1 = 3.401, p = 0.000076).
Even with conservative adjustment of the p-value to account for the 18 regions tested by association (18 × 0.000569), and despite strong, although not formally significant, evidence from linkage for at least one causal gene in the 18 regions, the meta-analysis is supportive (LOD1 = 2.327, p = 0.001062). We conclude that evidence for region 6 is probative, with linkage and association both providing critical evidence despite lack of a point estimate and information weight for linkage.
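The simplest meta-analysis described above, in which -2 ln P is summed over independent samples and referred to chi-square with 2n degrees of freedom, can be sketched as follows. The closed-form tail probability for even degrees of freedom avoids any external library. Function names are hypothetical, and the sketch is not intended to reproduce the entries of Table 4, which depend on details not given in the text.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for chi-square with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)**k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def fisher_combined(p_values):
    """Combine independent p-values; returns (statistic, df, combined p-value)."""
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    return statistic, df, chi2_sf_even_df(statistic, df)
```

For example, `fisher_combined([0.05, 0.05])` gives a combined p-value of about 0.0175. Every sample receives the same weight whatever its standard error, which is the first of the three disadvantages listed above.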

Table 4. Meta-analysis of association (5000 replicates) and linkage


This application demonstrates that CHROMSCAN is a powerful approach for gene mapping in complex inheritance, which is applicable to meta-analysis. Obvious extensions include identification of a causal locus and more precise definition of the phenotype associated with it. The 95% confidence interval, given by S ± 1.96 (SE), covers 36 kb between 53296 and 53332 kb and includes the msSNP rs3745064. Although no described genes are within this region, it does include four human mRNAs from GenBank: CR590917, AK021217, AK124558, and BC01314, all to the left of the point estimate (S). Of these, CR590917 appears to be the most interesting because it is expressed within T cells and could therefore conceivably affect risk for rheumatoid arthritis. Finally, geneid [14] and Genscan [15] predict a similar gene, which is the closest annotated sequence to the point estimate (S). However, nothing is known about the function of this gene and its reliability is questionable. The fascinating directions revealed by these findings have yet to be explored. Ultimately, interaction with other contributing loci and environmental factors will be recognized and, more importantly, locus-specific treatment will be found.
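The confidence interval S ± 1.96 (SE) quoted above, and the pooling of location estimates by their information, can be illustrated with the standard inverse-variance form. This is a sketch under the assumption that information K is the reciprocal of the squared standard error, so that the pooled estimate is ΣK_iS_i/ΣK_i with heterogeneity chi-square ΣK_i(S_i - S)² on n - 1 degrees of freedom. It is not necessarily the exact expression used in CHROMSCAN, and the names and numbers are hypothetical.

```python
import math


def pool_locations(estimates, informations):
    """Information-weighted pooling of location estimates.

    Returns (pooled location, standard error, 95% interval, heterogeneity chi2),
    where the heterogeneity chi2 has len(estimates) - 1 degrees of freedom.
    """
    k_total = sum(informations)
    s_pooled = sum(k * s for s, k in zip(estimates, informations)) / k_total
    se = 1.0 / math.sqrt(k_total)
    interval = (s_pooled - 1.96 * se, s_pooled + 1.96 * se)
    heterogeneity = sum(k * (s - s_pooled) ** 2 for s, k in zip(estimates, informations))
    return s_pooled, se, interval, heterogeneity


def se_from_interval_width(width: float) -> float:
    """Standard error implied by the full width of a 95% confidence interval."""
    return width / (2.0 * 1.96)


implied_se_kb = se_from_interval_width(36.0)   # about 9.2 kb for a 36-kb interval
```

On this reading the 36-kb interval corresponds to a standard error of roughly 9.2 kb, and additional samples narrow the interval in proportion to the square root of the total information.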

Recent papers testify to growing interest in meta-analysis, looking backward to linkage rather than forward to association mapping. Rank permutation provides a valid significance test, but the genome search meta-analysis (GSMA) that uses regional assignment with arbitrary weights cannot give a reliable estimate of effect and therefore has low power for estimating point location and detecting heterogeneity [16,17]. Most of the few papers on association mapping assume family data rarely feasible for diseases of late onset and are restricted to single markers without composite likelihood to estimate both location S and its information K. One manuscript presented in GAW15 that used meta-analysis without those estimates failed to detect the strong signal on chromosome 18q demonstrated by composite likelihood [18].

Competing interests

The author(s) declare that they have no competing interests.


This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci.


  1. Morton NE: Sequential tests for the detection of linkage.

    Am J Hum Genet 1955, 7:277-318.

  2. Levinson DF, Levinson MD, Segurado R, Lewis CM: Genome scan meta-analysis of schizophrenia and bipolar disorder. Part I: Methods and power analysis.

    Am J Hum Genet 2003, 73:17-33.

  3. Morton NE, Maniatis N, Zhang W, Ennis S, Collins A: Genome scanning by composite likelihood.

    Am J Hum Genet 2007, 80:19-28.

  4. CHROMSCAN []

  5. Amos CI, Chen WV, Lee A, Li W, Kern M, Lundsten R, Batliwalla F, Wener M, Remmers E, Kastner DA, Criswell LA, Seldin MF, Gregersen PK: High-density SNP analysis of 642 Caucasian families with rheumatoid arthritis identifies two new linkage regions in 11p12 and 2q33.

    Genes Immun 2006, 7:277-286.

  6. Gomes I, Collins A, Lonjou C, Thomas NS, Wilkinson J, Watson M, Morton N: Hardy-Weinberg quality control.

    Ann Hum Genet 1999, 63:535-538.

  7. Maniatis N, Collins A, Xu CF, McCarthy LC, Hewett DR, Tapper W, Ennis S, Ke X, Morton NE: The first linkage disequilibrium (LD) maps: delineation of hot and cold blocks by diplotype analysis.

    Proc Natl Acad Sci USA 2002, 99:2228-2233.

  8. Collins A, Morton NE: Mapping a disease locus by allelic association.

    Proc Natl Acad Sci USA 1998, 95:1741-1745.

  9. Maniatis N, Morton NE, Gibson J, Xu CF, Hosking LK, Collins A: The optimal measure of linkage disequilibrium reduces error in association mapping of affection status.

    Hum Mol Genet 2005, 14:145-153.

  10. Morton NE, Zhang W, Taillon-Miller P, Ennis S, Kwok PY, Collins A: The optimal measure of allelic association.

    Proc Natl Acad Sci USA 2001, 98:5217-5221.

  11. Lau W, Kuo TY, Tapper W, Cox S, Collins A: Exploiting large scale computing to construct high resolution linkage disequilibrium maps of the human genome.

    Bioinformatics 2007, 23:517-519.

  12. King RA, Rotter JI, Motulsky AG: The Genetic Basis of Common Disease. New York: Oxford University Press; 1992:598-599.

  13. Choi SJ, Rho YH, Ji JD, Song GG, Lee YH: Genome scan meta-analysis of rheumatoid arthritis.

    Rheumatology 2006, 45:166-170.

  14. Blanco E, Parra G, Guigó R: Using geneid to identify genes. In Current Protocols in Bioinformatics. Edited by Baxevanis AD, Davison DB. New York: John Wiley & Sons Inc; 2002:1-26.

  15. Burge C, Karlin S: Prediction of complete gene structures in human genomic DNA.

    J Mol Biol 1997, 268:78-94.

  16. Zintzaras E, Kitsios G: Identification of chromosomal regions linked to premature myocardial infarction: a meta-analysis of whole-genome searches.

    J Hum Genet 2006, 51:1015-1021.

  17. Lewis CM, Levinson DF: Testing for genetic heterogeneity in the genome search meta-analysis method.

    Genet Epidemiol 2006, 30:348-355.

  18. Segurado R, Hamshere ML, Glaser B, Nikolov I, Moskvina V, Holmans P: Combining linkage datasets for meta-analysis and mega-analysis: the GAW15 rheumatoid arthritis data set.

    BMC Genet 2007, xx:xxx.