Abstract
Background
Epistasis is recognized as a fundamental part of the genetic architecture of individuals. Several computational approaches have been developed to model genegene interactions in casecontrol studies, however, none of them is suitable for timedependent analysis. Herein we introduce the Survival Dimensionality Reduction (SDR) algorithm, a nonparametric method specifically designed to detect epistasis in lifetime datasets.
Results
The algorithm requires neither specification about the underlying survival distribution nor about the underlying interaction model and proved satisfactorily powerful to detect a set of causative genes in synthetic epistatic lifetime datasets with a limited number of samples and high degree of rightcensorship (up to 70%). The SDR method was then applied to a series of 386 Dutch patients with active rheumatoid arthritis that were treated with antiTNF biological agents. Among a set of 39 candidate genes, none of which showed a detectable marginal effect on antiTNF responses, the SDR algorithm did find that the rs1801274 SNP in the FcγRIIa gene and the rs10954213 SNP in the IRF5 gene nonlinearly interact to predict clinical remission after antiTNF biologicals.
Conclusions
Simulation studies and application in a realworld setting support the capability of the SDR algorithm to model epistatic interactions in candidategenes studies in presence of rightcensored data.
Availability: http://sourceforge.net/projects/sdrproject/ webcite
Background
The complex nature of human disease has long been recognized and, with the exception of a limited number of examples which follow the rules of mendelian inheritance patterns, common disease results from the poorly understood interaction of genetic and environmental factors [1,2]. At the same time, genegene interactions that do not result in linearity between genotype and phenotype (epistasis), may involve several genes at time, dramatically increasing the complexity of the phenomenon. Epistasis can either be defined from a biological point of view as deviations from the simple inheritance patterns observed by Mendel [3] or, from a mathematical point of view, as deviations from additivity in a linear statistical model [4].
The study of statistical epistasis by traditional parametric models is challenging and hindered by several limitations. These include, the problem of the sparseness of data into the multidimensional space [5], the loss of power when adjusting for multiple testing to decrease type I error [6,7], the loss of power in presence of multicollinearity [8] or genetic heterogeneity [1]. To address these issues, several nonparameteric multilocus methods, essentially based on machinelearning techniques, have been developed and/or applied to genetic association studies with positive results [9]. The application of data mining algorithms to detect nonlinear highorder interactions in the context of survival analysis is more complex and thus far limited to a few examples [1012]. However, the effective ability of these algorithms to model genegene interactions and their power to detect epistasis in survival analysis has yet to be determined.
At least two points in modelling nonlinear interactions in survival analysis should be taken into account. The first, is the proper way to handle censored data, that is those cases for whom the outcome has not yet happened at the end of the observation time (survival time) or who did not have the event until the end of study (including lost cases and missing data), which are commonly referred to as rightcensored cases [13]. The second, is the optimal performance measure to be used in assessing a learned model in survival analysis. In this paper we present an extension of the multifactor dimensionality reduction (MDR) algorithm [14,15], to detect and characterize epistatic interactions in the context of survival analysis which was specifically designed to address the abovementioned issues. Censored data were directly handled by estimating individual multilocus cells survival functions by the KaplanMeier method [16]. Multilocus genotypes were then pooled into highrisk and lowrisk groups whose predictive accuracy was evaluated by the Brier score for censored samples proposed by Graf et al [17].
The power of the method we propose was at first evaluated in lifetime simulated datasets with epistatic effects which belonged to the most common survival distributions and with different degrees of rightcensorship. The method was then applied to identification of singlenucleotide polymorphisms (SNPs) associated with responses to antitumor necrosis factor (TNF) agents in patients with rheumatoid arthritis (RA) and active disease.
The notion of pharmacogenetics is not anew in RA and several candidategene studies have demonstrated a geneticallybased individual variability to treatment with methotrexate or antiTNF therapy [1820]. However, there is no consensus at present as to whether pharmacogenomics will allow prediction of antiTNF therapy efficacy in RA. So far pharmacogenomics studies in RA have produced conflicting results and population stratification and linkage disequilibrium have been cited as potential causes for the inability to replicate results of genetic association studies [21]. Yet, as demonstrated by Greene et al [22] when main effects fail to replicate, genegene interaction analysis should also be considered as a potential source of variance.
Methods
Description of the survival dimensionality reduction (SDR) algorithm
The core of the SDR algorithm is the classification procedure used to label as "highrisk" or "lowrisk" the multilocus cells that result from genegene interaction. This procedure will be used both for feature selection and for model validation as described in the forthcoming sections.
SDR assignments and evaluation
The SDR procedure for classification is illustrated in Figure 1 and it involves 5 steps.
Figure 1. Survival Dimensionality Reduction (SDR) process. Step 1, survival estimates are calculated in the whole population by the KaplanMeier method. Step 2, survival estimates are computed in each multidimensional cell. Step 3, timepoint differences in survival estimates between the whole population and each multidimensional cell are calculated. Step 4, timepoint differences are normalized to 1 to take into account negative values and then averaged via the geometric mean (GM). Step 5, individual data from multidimensional cells with GM ≤ 1 ("highrisk") are pooled and compared via log ranktest statistic, to individual data from multidimensional cells with GM > 1 ("lowrisk").
Step 1: We firstly calculate by the KaplanMeier method [16] the survival estimates Ŝ(t) for the whole population in a dataset or dataset partition:
where, n_{i }is the number of cases "at risk" of an event prior to t_{i}, and δ_{i}, is the number of events at time t_{i}.
Step 2: We then select n discrete variables from the dataset and represent all the possible multidimensional cells resulting from their interaction. For each multidimensional cell, survival estimates at each time interval Ŝ_{c}(t_{i}) are calculated as described above.
Step 3: The difference D_{c}(t_{i}) between the multilocus cell survival estimates and the whole population survival estimates, is calculated for each time interval t_{i}:
Step 4: All the D_{c}(t_{i}) for each multilocus cells are then averaged. As the productlimit estimator is de facto a geometric progression, D_{c}(t_{i}) are averaged via the geometric mean (GM) rather than via the algebraic mean. As it is impossible to calculate the geometric with zero or negative data point values, these should be transformed to a meaningful equivalent positive number; being 1 < D_{c}(t_{i}) < 1, transformation is made adding 1 to any D_{c}(t_{i}) value. Considering a finite number n of time intervals and denoting t_{n }as the survival time at the n^{th }timeinterval we thus have:
Step 5: Cells with GM_{c}(t_{n}) ≤ 1 are classified as "highrisk" and cells with GM_{c}(t_{n}) > 1 are classified as "lowrisk". Examples from highrisk cells are pooled into one group and those from lowrisk cells into another.
Once dimensionality has been reduced to one dimension, SDR predictions may be evaluated via different fitness measure. Herein, we employed the Brier score for censored data. The Brier score is a metric widely used for predicting the inaccuracy of a model and in its modified version proposed by Graf et al [17], it can incorporate censored samples. The Brier score BS(t) for censored samples for a given t > 0, is defined as:
where Ĝ(t) denotes the KaplanMeier estimate of the censoring distribution G(t) which is based on the observations (t_{i}, 1  δ_{i}) and I stands for the indicator function. BS(t) depends on time t, hence it makes sense to use the integrated Brier score (IBS) as an overall measure for the prediction of the model at all times:
The lower the IBS the less inaccurate or, conversely, the more precise the prediction is.
Feature selection and model validation (kfold CrossValidation)
From a lifetime dataset, relevant features are extracted and validated via the kfold crossvalidation method, as described by Ritchie and coworkers [14].
For feature selection, the dataset is equally partitioned in k mutuallyexclusive testing sets; one k set is retained for model validation whilst the remaining k1 parts of the dataset are used as a training set. This process is then repeated ktimes with each of the k testing samples never included in the feature selection process. In every training set, all the possible combinations of nvariables and the multilocus cells that result from their interactions are represented into the multidimensional space. SDR models are then iteratively built for each combination of nvariables and training IBS scores are calculated. For each ncombination of variables, the k training IBS are then averaged and the ncombination yielding the lowest mean IBS is selected and considered for model validation.
In the model validation phase, SDR assignments for the best ncombination of variables are determined in every training set. On the basis of training assignments, the instances from the corresponding k testing sets are labelled as "highrisk" or "lowrisk". From these labels, a crossvalidated IBS is then calculated; for this purpose, instead of mathematically averaging the k testing IBS, we merged the testing instances in a metaanalysisbased on individual patient datafashion [23]. Let T_{1}, T_{2,  }T_{k }be the k testing sets, the individual patients data together by their assigned labels are merged sort to produce a larger T_{M }testing set. The IBS (henceforth labelled as metaIBS) is computed in the T_{M }set and the ncombination yielding the lowest metaIBS is then chosen as the final model. This way both feature selection and model validation are used to determine the best epistatic model in lifetime datasets. Working on the merged dataset (T_{M}), we: 1) still ensure the independence of the testing sets, as testing assignments are calculated during the feature selection phase; 2) reduce the bias in the calculation of the BS(t), as the Ĝ(t) and G(t) weights used in the BS(t) formula would otherwise be unreliably estimated in testing sets of limited size; 3) avoid to solely rely on a measure of central tendency (e.g. the mean) to estimate the predictive accuracy of our model, utterly ignoring any measure of variance.
Data simulation and power calculation
Simulated epistatic datasets were modelled upon five different survival distributions, described by the logisticexponential equation [24]; exponential (EXP), bathtubshaped failure rate (BT), upsidedown bathtubshaped failure rate (UBT), decreasing failure rate (DFR) and increasing failure rate (IFR):
where, S(t) is the logisticexponential survival distribution, t is the survival time, λ is a positive scale parameter and κ is a positive shape parameter and θ is a ≥ 0 parameter that shifts the distribution to the left. λ, κ and θ were adjusted so that the cumulative prevalence of the event at the end of the observation time t_{n }or K(t_{n}), was equal to an arbitrary value of 0.750 (see Additional file 1, Table S1). For data simulation t_{n }was set to 5 time units.
Additional file 1. Epistatic models and simulation specifics. The file contains the settings used to generate the five survival distributions upon which epistatic models were modelled. For each epistatic model, ttimepoint and cumulative multilocus genotype penetrances are reported, along with the timepoint and the cumulative broadsense heritability and prevalence of the event.
Format: DOC Size: 669KB Download file
This file can be viewed with: Microsoft Word Viewer
According to Culverhouse et al [25], we then generated different epistatic models for two biallelic SNPs A and B, both in HardyWeinberg equilibrium (HWE) and with minor allele frequency (MAF) = 0.2, so that K(t_{n}) = K_{A }= K_{B}, where K_{A }and K_{B }are the marginal penetrances for SNP A and SNP B. As these models were adjusted to fit the cumulative prevalence at t_{n}, their broadsense heritability (H^{2}) was considered to be a cumulative estimate of H^{2 }at t_{n }or H^{2}(t_{n}). For data simulation H^{2}(t_{n}) was set to 0.10, 0.15, 0.20 or 0.25.
Timepoint multilocus genotype penetrances henceforth labelled as (Genotype)t_{I }, were adjusted so that the cumulative prevalence at each timepoint K(t_{i}) would fit the survival distribution. We assumed that the relation between SNP A and SNP B is purely epistatic at each timeinterval t_{i}, that is K(t_{i}) = K_{A}(t_{i}) = K_{B}(t_{i}) with 0 < t_{i }≤ t_{n}, and that the twolocus model is always proportional at the different t_{i}. Hence, applying the productlimit estimation, we obtain:
where Kt_{i }is the timepoint prevalence at t_{i }and; t_{i }≤ t_{n}.
From Kt_{i }we can calculate (Genotype)t_{i }from the cumulative multilocus genotype penetrances [Genotype(t_{n})] previously used to compute K(t_{n}) and H^{2}(t_{n}):
Similarly, timepoint cumulative estimated multilocus genotype penetrances [Genotype(t_{i})] are proportional to K(t_{i}). These penetrances can be used to derive the timepoint cumulative estimated H^{2 }or H^{2}(t_{i}).
Once (Genotype)t_{i }values had been calculated, a population of 65000 individuals was built considering all the timeintervals 0 < t_{i }≤ t_{n}. Herein, from 5 survival distributions with 3 different H^{2}(t_{n}) and two t_{n }we obtained 40 populations where the outcome was related to the epistatic interaction between SNP A and SNP B (see Additional file 1). To each of these populations 13 unrelated SNPs in HWE, with MAF ranging from 0.1 to 0.5 were added. An additional 5% censoring/year was also added to account for hypothetical nonevent related causes of withdrawal from observation.
From the simulated populations we finally randomly draw 100 samples of 200 cases and 200 controls (e.g. 50% censorship) or 120 cases and 280 controls (e.g. 70% censorship). A total of 4,000 datasets were then generated for simulation. Power was estimated as the number of times SDR correctly identified the two functional SNPs out of 100 datasets/model/degreeofcensorship. Datasets can be obtained upon request from the authors.
Application of the SDR algorithm to the RA dataset
The SDR algorithm was then tested in a realworld dataset which consists of previously unpublished data about 386 Dutch patients with (1) a diagnosis of RA according to ACR criteria [26], (2) a disease activity score (DAS28) >3.2 [27] and (3) previous treatment with at least two other antirheumatics including methotrexate (MTX) at an optimal dose (maximum dose of 25 mg/week) or intolerance for MTX, that underwent treatment with antiTNFα agents. These patients were extracted from the Dutch Rheumatoid Arthritis Monitoring (DREAM) registry [28] and genotyped for 39 candidate SNPs and evaluated every 3 months to ascertain whether they had reached a clinical remission, defined as a DAS28 ≤ 2.6 [27]. Genotyping details can be found in: Pavy et al [29], Coenen et al [30], Toonen et al [31] and Alizadeh et al [32]. The choice of the studied SNPs was motivated by results from previous association and/or pharmacogenomics studies in RA [1820,33]. A detailed list of the analysed genetic variants is provided in Additional file 2, Table S1. The knearestneighbour method was used to impute genotypes with missing data <10% [34]. The opensource Orange data mining software (available at: http://www.ailab.si/orange webcite) was used for imputation. Overall the rightcensorship of this dataset was 68%; 5fold crossvalidation was used for the SDR analysis. An empirical Pvalue for the SDR results was calculated by performing 100fold permutation testing [35]. A whole SDR analysis, up to the 3^{rd }dimension was conducted in the permutated datasets during the permutation procedure.
Additional file 2. Genotype frequencies and univariate analysis in the rheumatoid arthritis (RA) dataset. The file lists in tabular form the single nucleotide polymorphisms (SNPs) included in the RA casecontrol study. It also reports the associations by the logrank test statistics under the dominant and recessive model between the studied SNPs and the occurrence of clinical remission after therapy with antiTNF agents.
Format: DOC Size: 493KB Download file
This file can be viewed with: Microsoft Word Viewer
For all the analyses a modified version of the freely available SDR algorithm written in Python http://sourceforge.net/projects/sdrproject/ webcite was used.
Results
Simulated datasets
The penetrance functions for the simulated datasets are reported in Additional file 1; as it can be observed, timepoint H^{2 }across the different models were consistently low, with a median of 0.017 (interquartile range [IQR]: 0,011  0.025). Power for the SDR algorithm to correctly identify the causative pair of SNPs in the 20 simulated survival models with 2 different degrees of censorship is reported in Table 1. Overall, the median power across all the datasets was 87.5% (IQR, 62.25%  95%). The relationship between H^{2}(t_{n}) and power followed a direct logistic distribution as shown in Figure 2, (overall, R^{2 }= 0.846; 50% censoring, R^{2 }= 0.886; 70% censoring, R^{2 }= 0.870). Moreover, the power resulted to be independent from the survival distribution the sample was withdrawn from whilst it was inversely related to the degree of censorship. The H^{2}(t_{n}) values we employed correspond to mildtomoderate hazard ratios (HR) for highrisk vs lowrisk combinations in the different epistatic models: HR = 1.38 for H^{2}(t_{n}) = 0.10, HR = 1.5 for H^{2}(t_{n}) = 0.15, HR = 1.58 for H^{2}(t_{n}) = 0.20 and HR = 1.69 for H^{2}(t_{n}) = 0.25. Altogether these results suggest that SDR has a satisfactory power to identify a pair of causative genes in purely epistatic lifetime models with mildtomoderate effect size and for which the proportionality of hazards holds.
Table 1. Power for the Survival Dimensionality Reduction (SDR) algorithm in models with cumulative prevalence K(t_{n}) = 0.750
Figure 2. Trend of the power for the survival dimensionality reduction algorithm in the different models. Relationship between power and cumulative broadsense heritability (H^{2}) according to the different degrees of censorship of the sample datasets (squares = 70%; triangles = 50%). The line represents the power estimated by the logistic function (R^{2 }= 0.846).
RA dataset
The RA datasets comprises 386 antiTNFαtreated patients, aging 45 ± 13.7 (mean ± standard deviation) at the onset of disease; 78.3% of patients tested positive for the rheumatoid factor and 262 (67.9%) were females. Onehundredfortyfive patients (37.6%) were treated with adalimumab, 201 (52.1%) with infliximab and 40 (10.4%) with etanercept; overall 346 patients (89.6%) were treated with antiTNF antibodies (e.g. adalimumab and infliximab). DAS28 at the beginning of therapy was 5.64 ± 1.09. Clinical remission, based on DAS28, was observed in 123 cases (31.8%).
Details about the genotyped SNPs along with their frequencies in the studied population are reported in Additional file 2, Table S1; all the SNPs were in HWE. None of the single SNPs showed a statistically significant association with response to antiTNF agents, either under a dominant or recessive model, as illustrated in Additional file 2, Table S2 (logrankassociated p values with 1 degree of freedom, corrected for the number of comparison by Bonferroni adjustment >0.05).
The SDR algorithm sorted out twoway interaction model, involving the rs1801274 (Fc gamma receptor 2a, FcγRIIa) and the rs10954213 (interferon regulatory factor 5, IRF5) SNPs, as the most predictive for responses to antiTNF therapy in patients with active RA. Table 2 shows the full analysis conducted by the SDR algorithm in the RA datasets. As expected, the model overfits in the training population as the number of SNPs included in the model increases. Yet, crossvalidation prevented overfitting as the minimum metaIBS was observed for the 2way interaction, that was thus chosen as the best epistatic model. This model was significant at the 0.05 threshold after 100fold permutation testing. Figure 3a summarizes the multilocus cells for the rs1801274 × rs10954213 interaction along with SDR "high" and "lowrisk" assignments (e.g. "responders" and "not responders" to therapy). This interaction had the typical nonlinear behaviour of epistatic model. Plotting this SDR assignments we can observe that patients labelled as "responders" achieved earlier and higher rates of clinical remission after antiTNF therapy compared to patients labelled as "nonresponders" (figure 3b) [36].
Table 2. Survival dimensionality reduction (SDR) model for the rheumatoid arthritis (RA) dataset.
Figure 3. Survival dimensionality reduction (SDR) model for the rheumatoid arthritis dataset. a. Multidimensional matrix for the single nucleotide polymorphisms (SNPs) interaction; GM, geometric mean of the differences; n, number of cases; e, percentage of events. b. KaplanMeier estimated survival patterns associated with "high" and "lowrisk" assignments. P values are those associated with a chisquare distribution with 1 degree of freedom.
Discussion
In the present paper we introduce SDR, an algorithm specifically conceived to detect nonlinear genegene interactions in presence of rightcensored data. The need for such a bioinformatics tool comes from the observation that several studies in the medical field deal with the loss of data during the period of study, and that methods that do not take into account censored data give upwardly biased estimates of failure and/or suffer from information loss due to reduced sample size. Cox regression [37], the most popular statistical technique used to analyse timetoevent multivariate model, is not adequate to detect nonlinearities either. Indeed, to properly model interaction in Cox regression, the user should have a priori knowledge of the variable relationships and may need to enter nonlinear transforms of the predictors, but this is often a trial and error approach. Also, the number of polynomial terms needed to model complex interactions is inflated as the number of predictors grows, increasing standard errors and, thus, type I error [6,7].
Using simulated epistatic lifetime datasets, we demonstrated that the SDR algorithm retains a fully satisfactory power to sort out a set of causative genes with mildtomoderate epistatic effect size from a pool of candidate genes. These results have been accomplished by the intrinsic properties of the SDR methodology. Firstly, SDR is nonparametric, in the sense that it is not necessary to make a priori assumptions about the underlying interaction model. Also, SDR requires no assumptions concerning the nature or shape of the underlying survival distribution. Secondly, SDR performs well even in datasets where the rightcensorship rate is high (up to 70% of cases in our simulation). Notably, when we run MDR ignoring censorship on the same simulated datasets we observed a dramatic reduction in the power to detect the causative pair of genes, that ranged from 5% to more than 90% (results not shown). Beside power, an additional advantage of SDR is that, combining crossvalidation with permutation testing, the chance of falsepositive findings is minimized [14]. Yet, to generate the simulated datasets we required the hazards among multilocus cells to be proportional along the lifetime distribution. Hence, further simulations are needed to ascertain whether SDR is suitable for situations where this assumption does not hold, as for instance in case of additive hazard models. Similarly, further simulations are needed to establish SDR performance in larger datasets, in presence of genetic heterogeneity, linkage disequilibrium, or different ranges of MAF.
SDR is, de facto, an extension of the MDR algorithm optimized to analyse lifetime distributions, hence it suffers from similar limitations [14] and it shares some peculiarities with the latter. Namely, the power of SDR is influenced by the epistatic effect size, which is strictly related to the (cumulative) heritability of the model [38]. Moreover, as with MDR, the biological significance of SDR models may be difficult to interpret due to the nonlinear distribution of highrisk and lowrisk cells across the multidimensional space [39]. Finally, SDR cannot make predictions when multilocus cells contain no data and GM estimates may be upwardly or downwardly inflated when multilocus cells contain few data.
Having demonstrated the capability of SDR to detect genegene interactions in lifetime datasets, we applied the algorithm to a population of patients with active RA to identify epistatic interactions that may affect timerelated responses to antiTNF biological agents. We did show that among a set of 39 candidategene loci, none of which had a detectable marginal effect on the outcome variable, the nonlinear interaction between the rs1801274 (FcγRIIa) and the rs10954213 (IRF5) SNPs significantly predicted the responses to antiTNF therapy.
Whilst it is difficult to dissect the biological meaning of statistical epistatic models [39], it should be noted that several lines of evidence support a role for Fcγ receptors and IRF5 in rheumatoid arthritis and TNF driven processes. Associations between IRF5 polymorphisms and RA have been described in different populations [40] and a recent genomewide association study (GWAS) showed that the rs4728142 variant in the IRF5 gene, in tight linkage disequilibrium with rs10954213, is strongly associated with RA susceptibility [41]. Of interest, functional experiments demonstrated that the rs10954213 SNP significantly alters IRF5 mRNA expression [42]. The association between IRF5 gene and RA may be linked, at least in part, to the ability of IRF5 to regulate the secretion of proinflammatory cytokines. Indeed, Takaoka et al [43] using mouse models deficient in the IRF5 gene, showed that IRF5 is generally involved downstream of the tolllike receptor (TLR)MyD88 signalling pathway for gene induction of TNFα and other cytokines relevant to the pathogenesis of RA. Of interest, the use of antiTNF agents was shown to decrease TLRs expression on different cellular types [44,45]. Similarly to IRF5, the FcγRIIa is involved in TNFα production in the rheumatoid synovia, as observed by Clavel and coworkers [46]. The interaction between the genetic variants of IRF5 and Fcγ receptors could thus influence TNFα production and/or availability, affecting the clinical response to antiTNF agents. Additionally, as postulated by Cañete at al [47], polymorphisms of the FcγRIIa may alter the clearance rate of antiTNF antibodies modulating plasma concentrations and consequently their biological effect in subjects with active RA. Theoretically, this effect should not restricted only to antiTNF antibodies, as also antiTNFα receptors (e.g. etanercept) contain a Fc portion of IgG_{1 }capable of binding to FCγ receptors to produce biological effects, such as antibodydependent cellmediated cytotoxicity [48].
Conclusions
Herein we introduced SDR, an innovative algorithm to detect epistasis in lifetime datasets. Simulation studies and application in a realworld setting, demonstrate the capability of SDR to detect non linear genegene interactions in studies aimed at evaluating the effect of candidate genes on timedependent outcomes. Further studies are necessary to evaluate its applicability in largescale datasets as well.
References

ThorntonWells TA, Moore JH, Haines JL: Dissecting trait heterogeneity: a comparison of three clustering methods applied to genotypic data.
BMC Bioinformatics 2006, 7:20421. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Moore JH: The ubiquitous nature of epistasis in determining susceptibility to common human diseases.
Hum. Hered 2003, 56:7382. PubMed Abstract  Publisher Full Text

Bateson W: Mendel's Principles of Heredity. Cambridge, UK: Cambridge University Press; 1909.

Fisher RA: The correlations between relatives on the supposition of Mendelian inheritance.
Transactions of the Royal Society of Edinburgh 1918, 52:399433.

Bellman R: Adaptive Control Processes. Princeton NJ: Princeton University Press; 1961.

Concato J, Feinstein AR, Holford TR: The risk of determining risk with multivariable models.

Benjamini Y, Drai D, Elmer G, Kafkafi N, Golani I: Controlling the false discovery rate in behavior genetics research.
Behav. Brain. Res 2001, 125:279284.
Bodmer W, Bonilla C. Common and rare variants in multifactorial susceptibility to common diseases. Nat. Genet. 2008, 40: 695701
PubMed Abstract  Publisher Full Text 
Moore JH, Williams SM: New strategies for identifying genegene interactions in hypertension.
Ann. Med 2002, 34:8895. PubMed Abstract  Publisher Full Text

Heidema AG, Boer JM, Nagelkerke N, Mariman EC, van der ADL, Feskens EJ: The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases.

Kronek LP, Reddy A: Logical analysis of survival data: prognostic survival models by detecting highdegree interactions in rightcensored data.
Bioinformatics 2008, 24:24853. Publisher Full Text

Ripley BD, Ripley RM: Neural networks as statistical methods in survival analysis. In Artificial Neural Networks: Prospects for Medicine. Edited by Dybowski R, Gant V. Landes Biosciences Publishers; 1998.

Kalbfleisch JD, Prentice RL: The Statistical Analysis of Failure Time Data. New York: Wiley; 1980.

Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF, Moore JH: Multifactordimensionality reduction reveals highorder interactions among estrogenmetabolism genes in sporadic breast cancer.
Am. J. Hum. Genet 2001, 69:13847. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Hahn LW, Ritchie MD, Moore JH: Multifactor dimensionality reduction software for detecting genegene and geneenvironment interactions.
Bioinformatics 2003, 19:37682. PubMed Abstract  Publisher Full Text

Kaplan E, Meier P: Nonparametric estimation from incomplete observations.
J. Am. Stat. Assoc 1958, 53:45781. Publisher Full Text

Graf E, Schmoor C, Sauerbrei W, Schumacher M: Assessment and comparison of prognostic classification schemes for survival data.
Stat Med 1999, 18:252945. PubMed Abstract  Publisher Full Text

Bansard C, Lequerré T, Daveau M, Boyer O, Tron F, Salierm JP, Vittecoqm O, LeLoët X: Can rheumatoid arthritis responsiveness to methotrexate and biologics be predicted?
Rheumatology (Oxford) 2009, 11:10218. Publisher Full Text

Ranganathan P: An update on pharmacogenomics in rheumatoid arthritis with a focus on TNFblocking agents.
Curr. Opin. Mol. Ther 2008, 10:5627. PubMed Abstract

RegoPérez I, FernándezMoreno M, Blanco FJ: Gene polymorphisms and pharmacogenetics in rheumatoid arthritis.
Curr. Genomics 2008, 9:38193. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn KA: A comprehensive review of genetic association studies.
Genet. Med 2002, 4:4561. PubMed Abstract  Publisher Full Text

Greene CS, Penrod NM, Williams SM, Moore JH: Failure to replicate a genetic association may provide important clues about genetic architecture.
PLoS One 2009, 4:e5639. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Simmonds MC, Higgins JP, Stewart LA, Tierney JF, Clarke MJ, Thompson SG: Metaanalysis of individual patient data from randomized trials: a review of methods used in practice.
Clin. Trials 2005, 2:20917. PubMed Abstract  Publisher Full Text

Leemis LM, Lam Y: The LogisticExponential Survival Distribution.
Naval Research Logistics 2008, 55:252264. Publisher Full Text

Culverhouse R, Suarez BK, Lin J, Reich T: A perspective on epistasis: limits of models displaying no main effect.
Am J Hum Genet 2002, 70:46171. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Arnett FC, Edworthy SM, Bloch DA, Mcshane DJ, Fries JF, Cooper NS, Healey LA, Kaplan SR, Liang MH, Luthra HS, Medsger TA, Mitchell DM, Neustadt DH, Pinals RS, Schaller JG, Sharp JT, Wilder RL, Hunder GG: The American Rheumatism Association revised criteria for the classification of rheumatoid arthritis.
Arthritis Rheum 1987, 31:31524.
1988
Publisher Full Text 
Prevoo ML, van't Hof MA, Kuper HH, van Leeuwen MA, van de Putte LB, van Riel PL: Modified disease activity scores that include twentyeightjoint counts. Development and validation in a prospective longitudinal study of patients with rheumatoid arthritis.
Arthritis Rheum 1995, 38:448. PubMed Abstract  Publisher Full Text

Kievit W, Fransen J, Oerlemans AJ, Kuper HH, van der Laar MA, de Rooij DJ, De Gendt CM, Ronday KH, Jansen TL, van Oijen PC, Brus HL, Adang EM, van Riel PL: The efficacy of antiTNF in rheumatoid arthritis, a comparison between randomised controlled trials and clinical practice.
Ann. Rheum. Dis 2007, 66:14738. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Pavy S, Toonen EJ, MiceliRichard C, Barrera P, van Riel PL, Criswell LA, Mariette X, Coenen M: TNF alpha 308 G > A polymorphism is not associated with response to TNFalphablockers in Caucasian patients with rheumatoid arthritis: systematic review and metaanalysis.
Ann. Rheum. Dis 2009, in press. PubMed Abstract  Publisher Full Text

Coenen MJ, Trynka G, Heskamp S, Franke B, van Diemen CC, Smolonska J, van Leeuwen M, Brouwer E, Boezen MH, Postma DS, Platteel M, Zanen P, Lammers JW, Groen HJ, Mali WP, Mulder CJ, Tack GJ, Verbeek WH, Wolters VM, Houwen RH, Mearin ML, van Heel DA, Radstake TR, van Riel PL, Wijmenga C, Barrera P, Zhernakova A: Common and different genetic background for rheumatoid arthritis and celiac disease.
Hum. Mol. Genet 2009, 18:4195203. PubMed Abstract  Publisher Full Text

Toonen EJ, Coenen MJ, Kievit W, Fransen J, Eijsbouts AM, Scheffer H, Radstake TR, Creemers MC, de Rooij DJ, van Riel PL, Franke B, Barrera P: The tumour necrosis factor receptor superfamily member 1b 676T > G polymorphism in relation to response to infliximab and adalimumab treatment and disease severity in rheumatoid arthritis.
Ann. Rheum. Dis 2008, 67:11747. PubMed Abstract  Publisher Full Text

Alizadeh BZ, Valdigem G, Coenen MJ, Zhernakova A, Franke B, Monsuur A, van Riel PL, Barrera P, Radstake TR, Roep BO, Wijmenga C, Koeleman BP: Association analysis of functional variants of the FcgRIIa and FcgRIIIa genes with type 1 diabetes, celiac disease and rheumatoid arthritis.
Hum. Mol. Genet 2007, 16:25529.
Altman DG, Royston P: What do we mean by validating a prognostic model? Stat. Med. 2000, 19:453473.
PubMed Abstract  Publisher Full Text 
Coenen MJ, Gregersen PK: Rheumatoid arthritis: a view of the current genetic landscape.
Genes Immun 2009, 10:10111. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB: Missing value estimation methods for DNA microarrays.
Bioinformatics 2001, 17:520525. PubMed Abstract  Publisher Full Text

Good P: Permutation Tests: a Practical Guide to Resampling Methods for Testing Hypotheses. Springer, New York; 2000.

Peto R, Peto J: Asymptotically Efficient Rank Invariant Test Procedures. Journal of the Royal Statistical Society.
Series A (General) 1972, 135:185207. Publisher Full Text

Cox DR: Regression Models and Lifetables (with discussion).
Journal of the Royal Statistical Society B 1972, 24:187220.

Edwards TL, Lewis K, Velez DR, Dudek S, Ritchie MD: Exploring the performance of Multifactor Dimensionality Reduction in large scale SNP studies and in the presence of genetic heterogeneity among epistatic disease models.
Hum Hered 2009, 67:18392. PubMed Abstract  Publisher Full Text

Moore JH, Williams SM: Traversing the conceptual divide between biological and statistical epistasis: systems biology and a more modern synthesis.
BioEssays 2005, 27:637646. PubMed Abstract  Publisher Full Text

Han SW, Lee WK, Kwon KT, Lee BK, Nam EJ, Kim GW: Association of polymorphisms in interferon regulatory factor 5 gene with rheumatoid arthritis: a metaanalysis.
J. Rheumatol 2009, 36:6937. PubMed Abstract  Publisher Full Text

Stahl EA, Raychaudhuri S, Remmers EF, Xie G, Eyre S, Thomson BP, Li Y, Kurreeman FA, Zhernakova A, Hinks A, Guiducci C, Chen R, Alfredsson L, Amos CI, Ardlie KG, BIRAC Consortium, Barton A, Bowes J, Brouwer E, Burtt NP, Catanese JJ, Coblyn J, Coenen MJ, Costenbader KH, Criswell LA, Crusius JB, Cui J, de Bakker PI, De Jager PL, Ding B, Emery P, Flynn E, Harrison P, Hocking LJ, Huizinga TW, Kastner DL, Ke X, Lee AT, Liu X, Martin P, Morgan AW, Padyukov L, Posthumus MD, Radstake TR, Reid DM, Seielstad M, Seldin MF, Shadick NA, Steer S, Tak PP, Thomson W, van der Helmvan Mil AH, van der HorstBruinsma IE, van der Schoot CE, van Riel PL, Weinblatt ME, Wilson AG, Wolbink GJ, Wordsworth BP, YEAR Consortium, Wijmenga C, Karlson EW, Toes RE, de Vries N, Begovich AB, Worthington J, Siminovitch KA, Gregersen PK, Klareskog L, Plenge RM: Genomewide association study metaanalysis identifies seven new rheumatoid arthritis risk loci.
Nat Genet 2010, 42:50814. PubMed Abstract  Publisher Full Text

Graham RR, Kyogoku C, Sigurdsson S, Vlasova IA, Davies LR, Baechler EC, Plenge RM, Koeuth T, Ortmann WA, Hom G, Bauer JW, Gillett C, Burtt N, Cunninghame Graham DS, Onofrio R, Petri M, Gunnarsson I, Svenungsson E, Rönnblom L, Nordmark G, Gregersen PK, Moser K, Gaffney PM, Criswell LA, Vyse TJ, Syvänen AC, Bohjanen PR, Daly MJ, Behrens TW, Altshuler D: Three functional variants of IFN regulatory factor 5 (IRF5) define risk and protective haplotypes for human lupus.
Proc Natl Acad Sci USA 2007, 104:675863. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Takaoka A, Yanai H, Kondo S, Duncan G, Negishi H, Mizutani T, Kano S, Honda K, Ohba Y, Mak TW, Taniguchi T: Integral role of IRF5 in the gene induction programme activated by Tolllike receptors.
Nature 2005, 10:2439. Publisher Full Text

De Rycke L, Vandooren B, Kruithof E, De Keyser F, Veys EM, Baeten D: Tumor necrosis factor alpha blockade treatment downmodulates the increased systemic and local expression of Tolllike receptor 2 and Tolllike receptor 4 in spondylarthropathy.
Arthritis Rheum 2005, 52:214658. PubMed Abstract  Publisher Full Text

Netea MG, Radstake T, Joosten LA, van der Meer JW, Barrera P, Kullberg BJ: Salmonella septicemia in rheumatoid arthritis patients receiving antitumor necrosis factor therapy: association with decreased interferongamma production and Tolllike receptor 4 expression.
Arthritis Rheum 2003, 48:18537. PubMed Abstract  Publisher Full Text

Clavel C, Nogueira L, Laurent L, Iobagiu C, Vincent C, Sebbag M, Serre G: Induction of macrophage secretion of tumor necrosis factor alpha through Fcgamma receptor IIa engagement by rheumatoid arthritisspecific autoantibodies to citrullinated proteins complexed with fibrinogen.
Arthritis Rheum 2008, 58:67888. PubMed Abstract  Publisher Full Text

Cañete JD, Suárez B, Hernández MV, Sanmartí R, Rego I, Celis R, Moll C, Pinto JA, Blanco FJ, Lozano F: Influence of variants of Fc gamma receptors IIA and IIIA on the American College of Rheumatology and European League Against Rheumatism responses to antitumour necrosis factor alpha therapy in rheumatoid arthritis.
Ann. Rheum. Dis 2009, 68:154752. PubMed Abstract  Publisher Full Text

Mitoma H, Horiuchi T, Tsukamoto H, Tamimoto Y, Kimoto Y, Uchino A, To K, Harashima S, Hatta N, Harada M: Mechanisms for cytotoxic effects of antitumor necrosis factor agents on transmembrane tumor necrosis factor alphaexpressing cells: comparison among infliximab, etanercept, and adalimumab.
Arthritis Rheum 2008, 58:124857. PubMed Abstract  Publisher Full Text