Abstract
Background
We have conducted a genomewide association study on the Genetic Analysis Workshop (GAW) 16 rheumatoid arthritis data using a multilocus score test based on wavelet transform proposed recently by the authors. The waveletbased test automatically adjusts for the amount of noise suppressed from the data. The power of the test is also increased by using the genetic information contained in the spatial ordering of singlenucleotide polymorphisms on a chromosome.
Results
After adjusting for the effect of population stratification, the test identified some previously discovered rheumatoid arthritis susceptibility loci (HLADRB1 and rs3761847) as well as some loci (rs2076530 and rs3130340) known to have association with sarcoidosis and bone mineral density. It was previously reported that patients with rheumatoid arthritis have elevated prevalence of sarcoidosis and have reduced bone mass.
Conclusion
This new test provides a useful tool in genomewide association studies.
Background
In genomewide association studies, the first choice of tests is usually a singlemarker test. If a singlenucleotide polymorphism (SNP) has a strong association with a disease, singlemarker tests should have higher power than multilocus tests. Multilocus tests can achieve higher power if several SNPs are associated with the disease. However, the potential high power of multilocus tests could be diminished as an increased number of markers results in an increased number of degrees of freedom. Therefore, reducing the number of degrees of freedom is essential to increasing the power of multilocus tests. Different strategies were introduced to reduce the number of degrees of freedom. Tests based on haplotype sharing (the longest continuous interval of matching alleles between haplotypes) effectively reduce the number of degrees of freedom [1,2]. Another popular approach is to apply principal components analysis (PCReg) [3]. A score test based on the Fourier transform [4] is another attempt to reduce the degree of freedom, thereby increasing power. Instead of using the genotype data, weighted Fourier transform coefficients of the genotype data are used to form a score test.
Many multilocus association tests are not affected by permuting spatial order of SNPs; thus, they do not use the information contained in the ordering of SNPs. For example, the results of logistic regression will not change if the order of SNPs are permuted. The same is true for the test obtained by fitting a regression function with one SNP followed by Bonferroni correction to find the global pvalue. PCReg is also invariant under the permutation of SNPs. We provide a proof of the above claims as follows (See Wang and Abbott [3] for notations in the following proof.). Let G be a matrix of coded genotypes and suppose it is centered. The variancecovariance matrix of genotypes is A = G^{T}G/(n  1) = VDV^{1}, where D is a diagonal matrix with eigenvalues of A as diagonal entries, and the columns of V are the eigenvectors. The regression model is y = GV_{1}b + ε, where V_{1 }is the first several columns of V. After permutation of SNPs, the columns of G are also permuted, and G becomes GP, where P is a permutation matrix. Since (GP)^{T }(GP) = (P^{T}V) D(P^{T}V)^{1}, the eigenvalues are not changed by the permutation, and the matrix of eigenvectors becomes P^{T}V. The regression model remains unchanged. For the tests based on shared haplotype length, changing the ordering of SNPs could mean a shortened shared length, and the association between the disease and the SNPs could disappear. Therefore, the ordering of SNPs contains important genetic information, and ignoring it could lower the power of association tests.
We recently proposed a score test based on wavelet transform [5,6], which is used in this report. This test has three advantages. First, it uses the wavelet transformation of genotype data instead of the genotype data. The wavelet transform is designed to handle unsmooth noisy signals. Genetic data are usually unsmooth and can be dealt with by the wavelet transform naturally. Second, it uses an empirical Bayes thresholding [7]. It was proved by Johnston and Silverman that this thresholding effectively suppresses noise from data [7]. Therefore, it increases the power of the test. The exact amount of noise being reduced depends on the specific problem. Third, because our method views multilocus genotypes of an individual as a discretized function, the spatial ordering of the SNPs on a chromosome is taken into consideration, which increases the power of the test.
Methods
There is a total of 2,062 individuals consisting of 868 cases and 1,194 controls in the North American Rheumatoid Arthritis Consortium (NARAC) data for Genetic Analysis Workshop (GAW) 16. These individuals were genotyped on the 550 k Illumina SNP chip. We analyzed 22 autosomal chromosomes in this report. SNPs satisfying one of the following criterion were excluded: missing genotype rate >0.05, or minor allele frequency < 0.05, or having pvalues < 0.00001 in the HardyWeinberg equilibrium test. About 12.8% of SNPs were removed. Missing genotypes were imputed by fastPHASE [8]. We adopted a strategy of a moving window of eight SNPs without overlapping (the first window contains SNPs 18, the next window contains SNPs 916, etc.) on the chromosome. We have tried windows of sizes 16 and 32 SNPs and there was not much difference. Overlapping windows were also tried and again, the results were similar. Computational burden increases when large windows or overlapping windows were used. The k^{th }window is from the (8(k 1)+ 1)^{th }SNP to the 8k^{th }SNPs. Let G_{k }be the corresponding eight columns of the genotype matrix G. Different ways of coding SNPs do make a difference. The SNPs in G_{k }were recoded to maximize the number of positive pairwise correlations. This removes any ambiguity in the coding of genotypes. In our simulation studies, the recoding increased the power of the test while keeping the type I error rate in check.
We applied EIGENSTRAT [9] to remove the effects of population stratification. The top three eigenvectors obtained by EIGENSTRAT were used to adjust genotypes and phenotypes as suggested by Price et al. [9]. Let G be the genotype matrix, C be the matrix of the principal component vectors, and Y be the vector of phenotypes. The adjusted genotype matrix is = G  CC^{T}G and the adjusted phenotype vector is = Y  CC^{T}Y. Before the adjustment, the inflation factor [10] λ_{GC }= 1.399, which reduced to 1.025 after the adjustment. To avoid a possible bias, all chromosomes except 6 and 8 were included to produce the eigenvectors, which were used to adjust genotypes on chromosomes other than 6. Chromosome 6 was used to produce eigenvectors to adjust genotypes on chromosome 6.
Consider the k^{th }window and the corresponding matrix of adjusted genotype . Applying wavelet transform, thresholding, and inverse wavelet transform consecutively on the i^{th }row of resulted in the modified genotype (x_{i1}, x_{i2}, ..., x_{im}) of the i^{th }individual. Subtract the mean of the j^{th }column of X = (x_{ij}) from x_{ij }such that the mean of each column of X = 0. Let = (y_{1}, y_{2}, ..., y_{n}) be the adjusted phenotype of n individuals. Assume a generalized linear model [11] f(E(y_{i})) = α + X_{i}β between X and Y, where f is a link function. Let be the score statistic for the markers. The variance of U_{j }under the null hypothesis can be estimated by The global score statistic of this window is
The global pvalues of the waveletbased test were obtained as follows. We randomly permuted the phenotypes (casecontrol status) 5,000 times, and received 5,000 sets of permuted phenotypes. Adjust each set with the principal component vectors, and they are still called permuted phenotypes. At each window of eight SNPs, the value of T was calculated 5,001 times, using the adjusted phenotypes and 5,000 sets of permuted phenotypes, respectively. In M windows along the chromosomes, we have 5000 M values of T with permuted phenotypes, denoted by a 5,000 × M matrix (T_{ij}), where T_{ij }is the absolute value of T at the j^{th }window using the i^{th }set of permuted phenotypes. Let m_{i }= max_{j }T_{ij }be the maximum absolute value of T at M windows using the i^{th }set of permuted phenotypes. Let T_{j }be the absolute value of T at the j^{th }window using the adjusted phenotypes. The global pvalue of the waveletbased test at the j^{th }window is the proportion of m_{i }> T_{j}: global pvalue of T_{j }= #{m_{i}m_{i }> T_{j}}/5,000.
Results and discussion
After correcting for population stratification, significant signals were only found on chromosomes 6 and 9. Four windows on chromosome 6 attracted our attention. The first window (rs9268005, rs3130340, rs3115553, rs9268132, rs926070, rs6935269, rs7775397, rs17422797) contains rs3130340, which was identified to have association with bone mineral density and fractures [12]. It has been reported that a large proportion of men with rheumatoid arthritis (RA) had reduced bone mass [13]. The pvalue at this window is 0.0074. The second window (rs4424066, rs12529049, rs3117099, rs3117098, rs3817973, rs1980493, rs2076530, rs4248166) contains rs2076530, which is associated with sarcoidosis [14]. Increased prevalence of sarcoidosis among RA patients has been reported [15]. The association between sarcoidosis, RA, and rs2076530 is an interesting phenomenon. The association between rs2076530 and RA was investigated before, but whether it is a causal SNP for RA or its effect is merely a carryover effect of nearby haplotypes [16] merits further investigation. The third window (rs2395182, rs3129890, rs9268832, rs6903608, rs2395185, rs477515, rs2516049, rs2858870) contains HLADRB1, which has long been identified as a major genetic risk contributor to RA [17]. The last window (rs9275224, rs5000634, rs6457617, rs2647012, rs9357152, rs10484561, rs9275313, rs1794282) contains rs6457617, which has also been reported as being associated with RA [18]. The pvalues of the test at the above three windows are less than 0.0002 (it is 0 after 5,000 permutations). The most significant window on chromosome 9 (rs1953126, rs10985073, rs3761847, rs10985095, rs10985097, rs2900180, rs12235400, rs10985112) which contains rs3761847 and it was identified as a risk locus for RA [19], has a pvalue 0.53.
We applied the waveletbased test on a moving window of eight SNPs with overlapping (the first window contains SNPs 18, the second window contains SNPs 29, etc.) on a 550kb region of chromosome 6 for fine mapping. The results are shown in Figure 1. We have tried other window sizes (16 and 32) and found similar results (not shown). Comparisons of the waveletbased test using overlapping and nonoverlapping windows showed no significant differences. In Figure 2, we compared the waveletbased test with Armitage χ^{2 }test. If pvalue of the waveletbased test is 0 after 5,000 permutations, we set the pvalue as 0.0002 in order to calculate log(pvalue) in Figures 1 and 2, which makes 0.0002 the smallest possible pvalue in this study. Some significant loci (HLADRB1, rs2076530, rs6457617) were identified by both tests. However, the Armitage χ^{2 }test was not significant around rs3130340 (pvalue after Bonferroni correction was 1), while the pvalue of the waveletbased test at the window containing the SNP was 0.0074, after correcting for multiple testing.
Figure 1. pValues on 22 chromosomes, and comparison of windows. The plot on the left contains the pvalues of the waveletbased test on 22 chromosomes. The horizontal line indicates 5% significance level. The plots on the right are a comparison of overlapping windows (top) and nonoverlapping windows (bottom) for a 550kb region on chromosome 6.
Figure 2. Comparison with single marker test. Comparisons of the waveletbased test and the Armitage χ^{2 }test on chromosome 6. The plots on the left are for the whole chromosome 6, and the plots on the right are fine mapping results for a 550kb region of chromosome 6. The triangles indicate the positions (from left to right) of rs3130340 (associated with bone mineral density and RA), rs2076530 (associated with sarcoidosis and RA), HLADRB1 (Note: no data from 32.54 Mb to 32.67 Mb), and rs6457617 (associated with RA).
Conclusion
A waveletbased multilocus score test was applied in a genomewide association study on RA data followed by fine mapping of regions identified in our genomewide association study. Several statistically significant risk loci for RA were identified after adjustment for population stratification. Some windows contain genes and/or SNPs (HLADRB1, rs3761847, and rs6457617) previously associated with RA. Some windows contain SNPs (rs2076530, rs3130340) previously associated with sarcoidosis [14] or bone mineral density and fractures [12], all of which have been reported as being related with RA [13,15].
List of abbreviations used
NARAC: North American Rheumatoid Arthritis Consortium; PCReg: Principal components analysis; RA: Rheumatoid arthritis; SNP: Singlenucleotide polymorphism
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
RJ and JD both contributed in development of the statistical test, provided simulation strategies, and drafted the manuscript. RJ also participated and guided the numerical calculations. YD carried out part of the programming work. All authors read and approved the manuscript.
Acknowledgements
This research was partially supported by a National Institutes of Health grant GM06994001A2. The authors thank the reviewers for their helpful suggestions which greatly improved the paper.
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/17536561/3?issue=S7.
References

Tzeng JY, Devlin B, Roeder K, Wasserman I: On the identification of disease mutations by the analysis of haplotype similarity and goodness of fit.
Am J Hum Genet 2003, 72:891902. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Zhang S, Sha Q, Chen HS, Dong J, Jiang R: Transmission/disequilibrium test based on haplotype sharing for tightly linked markers.
Am J Hum Genet 2003, 73:566579. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Wang K, Abbott D: A principal components regression approach to multilocus genetic association studies.
Genet Epidemiol 2008, 32:108118. PubMed Abstract  Publisher Full Text

Wang T, Elston RC: Improved power by use of a weighted score test for linkage disequilibrium mapping.
Am J Hum Genet 2007, 80:353360. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Jiang R, Dong J, Dai Y: A wavelet based method in association [abstract 96].

Improving Power in Geneticassociation Studies Via Wavelet Transformation
BMC Genetics 2009, 10:53. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Johnston IM, Silverman BW: Empirical Bayes selection of wavelet thresholds.
Ann Stat 2005, 33:17001752. Publisher Full Text

Scheet P, Stephens M: A fast and flexible statistical model for largescale population genotype data: applications to inferring missing genotypes and haplotypic phase.
Am J Hum Genet 2006, 78:629644. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D: Principal components analysis corrects for stratification in genomewide association studies.
Nat Genet 2006, 38:904909. PubMed Abstract  Publisher Full Text

Devlin B, Roeder K: Genomic control for association studies.
Biometrics 1999, 55:9971004. PubMed Abstract  Publisher Full Text

Schaid DJ, Rowland CM, Tines DE, Jacbson RM, Poland GA: Score tests for association between traits and haplotypes when linkage is ambiguous.
Am J Hum Genet 2002, 70:425434. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Styrkarsdottir U, Halldorsson BV, Gretarsdottir S, Gudbjartsson DF, Walters GB, Ingvarsson T, Jonsdottir T, Saemundsdottir J, Center JR, Nguyen TV, Bagger Y, Gulcher JR, Eisman JA, Christiansen C, Sigurdsson G, Kong A, Thorsteinsdottir U, Stefansson K: Multiple genetic loci for bone mineral density and fractures.
N Engl J Med 2008, 358:23552365. PubMed Abstract  Publisher Full Text

Tengstrand B, Hafstrom I: Bone mineral density in men with rheumatoid arthritis is associated with erosive disease and sulfasalazine treatment but not with sex hormones.
J Rheumatol 2002, 29:22992305. PubMed Abstract  Publisher Full Text

Valentonyte R, Hampe J, Huse K, Rosenstiel P, Albrecht M, Stenzel A, Nagy M, Gaede KI, Franke A, Haesler R, Koch A, Lengauer T, Seegert D, Reiling N, Ehlers S, Schwinger E, Platzer M, Krawczak M, MullerQuernheim J, Schurmann M, Schreiber S: Sarcoidosis is associated with a truncating spliice site mutation in BTNL2.
Nat Genet 2005, 37:357364. PubMed Abstract  Publisher Full Text

Kucera RF: A possible association of rheumatoid arthritis and sarcoidosis.
Chest 1989, 95:604606. PubMed Abstract  Publisher Full Text

Orozco G, Eerligh P, Sánchez E, Zhernakova S, Roep BO, GonzálezGay MA, LópezNevot MA, Callejas JL, Hidalgo C, PascualSalcedo D, Balsa A, GonzálezEscribano MF, Koeleman BP, Martín J: Analysis of a functional BTNL2 polymorphism in type 1 diabetes, rheumatoid arthritis, and systemic lupus erythematosus.
Hum Immunol 2005, 66:12351241. PubMed Abstract  Publisher Full Text

Stastny P: Association of the Bcell alloantigen DRw4 with rheumatoid arthritis.
N Engl J Med 1978, 298:869871. PubMed Abstract

Julià A, Ballina J, Cañete JD, Balsa A, TorneroMolina J, Naranjo A, AlperiLópez M, Erra A, PascualSalcedo D, Barceló P, Camps J, Marsal S: Genomewide association study rheumatoid arthritis in the Spanish population: KLF12 as a risk locus for rheumatoid arthritis susceptibility.
Arthritis Rheum 2008, 58:22752286. PubMed Abstract  Publisher Full Text

Plenge RM, Seielstad M, Padyukov L, Lee AT, Remmers EF, Ding B, Liew A, Khalili H, Chandrasekaran A, Davies LR, Li W, Tan AK, Bonnard C, Ong RT, Thalamuthu A, Pettersson S, Liu C, Tian C, Chen WV, Carulli JP, Beckman EM, Altshuler D, Alfredsson L, Criswell LA, Amos CI, Seldin MF, Kastner DL, Klareskog L, Gregersen PK: TRAF1C5 as a risk locus for rheumatoid arthritisa genomewide study.
N Engl J Med 2007, 357:11991209. PubMed Abstract  Publisher Full Text  PubMed Central Full Text