Abstract
Background
Meta-analysis is a popular methodology in several fields of medical research, including genetic association studies. However, the methods used for meta-analysis of association studies that report haplotypes have not been studied in detail. In this work, methods for performing meta-analysis of haplotype association studies are summarized, compared and presented in a unified framework along with an empirical evaluation of the literature.
Results
We present multivariate methods that use summary-based data as well as methods that use binary and count data in a generalized linear mixed model framework (logistic regression, multinomial regression and Poisson regression). The methods presented here avoid the inflation of the type I error rate that could result from the traditional approach of comparing a haplotype against the remaining ones, and they can be fitted using standard software. Moreover, formal global tests are presented for assessing the statistical significance of the overall association. Although the methods presented here assume that the haplotypes are directly observed, they can be easily extended to allow for uncertainty in the inferred haplotypes by weighting the haplotypes by their probability.
Conclusions
An empirical evaluation of the published literature and a comparison against the meta-analyses that use single nucleotide polymorphisms suggest that the studies reporting meta-analysis of haplotypes include approximately half as many studies and produce significant results twice as often. We show that this excess of statistically significant results stems from the suboptimal method of analysis used and that, in approximately half of the cases, the statistical significance is refuted if the data are properly reanalyzed. Illustrative examples of code are given in Stata, and it is anticipated that the methods developed in this work will be widely applied in the meta-analysis of haplotype association studies.
Background
The continuously increasing number of published gene-disease association studies has made it imperative to collect and synthesize the available data [1,2]. The statistical procedure with which data from multiple studies are synthesized is known as meta-analysis [3-5]. In meta-analysis, a set of original studies is synthesized and the potential heterogeneity is explored using formal statistical methods [3,4,6,7]. In the medical literature, meta-analysis was initially applied in the field of randomized clinical trials [8,9], but nowadays it is considered a valuable tool for the combination of observational studies [10], as well as for genetic association studies, for which specialized methodology has been developed [5,11-18].
Most of the genetic association studies (and hence the meta-analyses derived from them) are performed using single markers, usually Single Nucleotide Polymorphisms (SNPs). However, the SNP that is under investigation is not always the true susceptibility allele. Instead, it may be a polymorphism which is in Linkage Disequilibrium (LD) with the unknown disease-causing locus [19]. In such cases, the single marker tests may be underpowered, depending on the degree of LD and the allele frequencies [20]. Haplotypes, which are combinations of closely linked alleles on a chromosome, are therefore important in the study of the genetic basis of diseases and are extensively used [21,22]. The importance of studying haplotypes ranges from elucidating the exact biological role played by neighbouring amino acids in the protein structure, to providing information about ancient ancestral chromosome segments that harbour alleles influencing human traits [23]. Moreover, haplotype association methods are considered to be more powerful than single marker analyses [24,25], even though this is questioned by some researchers [26].
A major problem in haplotype analyses is that, in order for the analysis to be performed, we need to reconstruct or infer the haplotypes, usually with an approach based on missing data imputation [27-29]. This uncertainty in imputing the haplotypes poses some problems in the analysis [30] that are discussed later in this work. Nevertheless, studies that investigate the association of haplotypes with diseases are increasingly being published (Figure 1), at an even faster rate after 2003, when the HapMap project was initiated [31]. This exponential increase follows the general pattern of gene-disease association studies [1,2,32] and, naturally, the obvious extension would be to use meta-analysis in order to increase the power of individual studies and to resolve the sources of heterogeneity and inconsistency.
Figure 1. A graphical representation of the increasing number of published haplotype-association studies. A search was performed in PubMed using the terms "haplotype" and "association" from 1997 to 2009. Even though the reference list may include review articles, methodological papers or even irrelevant works, the trend is obvious, especially after 2003 when the HapMap project was presented. The search was conducted during December 2009 and thus the count for 2009 may be an underestimate.
This work has two primary goals. First, to perform a detailed literature search and an empirical evaluation of the published studies that report meta-analyses of haplotype associations; and second, to present a concise overview of the statistical methods that could and should be used in such meta-analyses. These two important issues were not previously studied in the literature and the findings are interesting. Even though the methods presented in this work could be derived in a straightforward manner by extending previous works on multivariate meta-analysis [33-37], the majority of the published meta-analyses did not use optimal methods for analyzing the data. Moreover, in several circumstances the results of some studies are shown to be severely flawed. The manuscript is organized as follows: Initially, the commonly used methods for haplotype analysis for a single study are reviewed in order to establish notation. Afterwards, the methods of meta-analysis are presented. In particular, we present the standard method of univariate meta-analysis and its limitations, which leads to a more powerful multivariate approach based on summary data. Accordingly, a general framework based on generalized linear mixed models (GLMMs) is presented and the approaches based on logistic regression, multinomial logistic regression and Poisson regression are discussed. We also discuss continuous traits and details of the implementation of the models. Finally, we present the results of the empirical evaluation of the literature and compare the results reported in these analyses with the ones obtained using the methods developed here.
Methods
Methods for haplotype association
Let us assume that we have n biallelic markers that form a haplotype. If the alleles in position m (m = 1, 2, ..., n) are denoted by A^{m} and B^{m}, the number of possible haplotypes would be r = 2^{n}. In a case-control study, a cross-tabulation of haplotypes by disease status that ignores the individuals and counts only the total number of haplotypes observed in the analysis would result in data arranged in the form of a 2 × r contingency table (Table 1). This cross-tabulation is somewhat simplistic since it assumes a multiplicative (codominant) model of inheritance [38]. However, it is the most commonly reported form of haplotype data and thus it is more suitable for meta-analysis of published studies, as we will discuss later. Assuming a binomial sampling scheme where fixed numbers of cases and controls are sampled independently, we can model the structure of the table using logistic regression methods where the status (case/control) is the dependent variable and the haplotypes are treated as covariates. This corresponds to the so-called "prospective likelihood", the likelihood based on the probability of the disease given the exposure. Thus, we denote by π_{j} = P(y_{j} = 1) the underlying risk (i.e. the probability of being a case) of a person carrying a single copy of the j^{th} haplotype. A reasonable choice would be to consider the most common haplotype (i.e. h_{1}) as the reference category and create r-1 dummy variables taking values z_{j} = 1 for haplotype j and 0 otherwise. This model can be formulated as:
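Writing β_{0} for the intercept (a symbol assumed here for illustration), one form of this model that is consistent with the definitions above is:

```latex
\operatorname{logit}(\pi_j) = \log\frac{\pi_j}{1-\pi_j} = \beta_0 + \sum_{l=2}^{r} \beta_l z_l \tag{1}
```

For the reference haplotype all dummy variables are zero, so that logit(π_{1}) = β_{0}, whereas for haplotype j the linear predictor reduces to β_{0} + β_{j}.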
Table 1. Cross-tabulation of haplotypes by disease status
This model was proposed initially by Wallenstein and coworkers and as we already mentioned, assumes a multiplicative genetic model of inheritance [38]. Moreover, the haplotypes are assumed known quantities, which may not always be the case (see below).
Alternatively, assuming a multinomial sampling scheme where the total sample size is considered fixed, a multinomial logistic regression model would be appropriate, where the different haplotypes would be the dependent variables. This corresponds to the well-known "retrospective likelihood" (i.e. the likelihood based on the probability of exposure given disease status) applicable in case-control studies. In this case, the haplotypes are treated as dependent variables and the case/control status as the predictor in a multinomial (polytomous or polychotomous) logistic regression [39]:
By observing that the linear predictor becomes:
it is easy to understand that the β_{j} coefficients obtained by fitting the model are estimates of the log-Odds Ratios (i.e. for comparing h_{j} vs. h_{1}), equivalent to the respective coefficients of the model in Eq. (1). Obviously, β_{1} = 0 for identifiability since haplotype j = 1 (i.e. h_{1}) is used as the reference category. This model was first used for haplotype analysis by Chen and Kao [40].
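For reference, the multinomial model and its linear predictor can be sketched as follows, under the assumption that α_{j} denotes the haplotype-specific intercepts and y the case/control indicator, with α_{1} = β_{1} = 0 for the reference haplotype:

```latex
\begin{align}
P(h = j \mid y) &= \frac{\exp(\alpha_j + \beta_j y)}{\sum_{l=1}^{r} \exp(\alpha_l + \beta_l y)} \tag{2} \\
\log\frac{P(h = j \mid y)}{P(h = 1 \mid y)} &= \alpha_j + \beta_j y \tag{3}
\end{align}
```

Evaluating Eq. (3) at y = 1 and y = 0 and subtracting leaves β_{j}, which is the logOR of h_{j} against h_{1}.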
Lastly, assuming that the observed counts are realizations of a Poisson random variable, one can fit log-linear models (Poisson regression), where the dependent variable is the counts and thus, the studies, the type of haplotypes and the case/control status are treated as independent variables. Log-linear models are widely used for haplotype analysis, for instance, for detecting LD [41,42] and for haplotype-disease association [43,44]. This model can be formulated in terms of a Poisson regression model in the context of generalized linear models, as:
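A sketch of this log-linear model for a single study, assuming that μ_{jy} denotes the expected count of haplotype j among subjects with status y (y = 1 for cases, 0 for controls) and that α_{1} = β_{1} = 0 for the reference haplotype:

```latex
\log \mu_{jy} = \alpha_0 + \alpha_j + \beta_0 y + \beta_j y, \qquad j = 1, \dots, r, \quad y \in \{0, 1\} \tag{4}
```

Under this parameterization log(μ_{j1}/μ_{j0}) = β_{0} + β_{j} and log(μ_{jy}/μ_{1y}) = α_{j} + β_{j}y, so the haplotype by disease terms β_{j} are again logORs against h_{1}.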
This is the standard saturated model for describing the 2 × r contingency table of haplotypes by disease. The β_{j}'s are the coefficients that correspond to the haplotype by disease interaction and are equivalent to those obtained by fitting the models in Eq. (1) and (2). It is easily verified that the coefficients α's and β's are identical across the three models. The overall hypothesis for association (β = 0) can be tested by performing a multivariate Wald test using the estimated covariance matrix, cov(β). Then, the test statistic (score) U = β'cov(β)^{-1}β will have asymptotically a χ^{2} distribution on r-1 degrees of freedom. Alternatively, a likelihood ratio test comparing the saturated model against the model with no interaction can be performed. Similar tests can be performed for the models in Eq. (1) and (2).
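The global Wald test can be illustrated numerically for a single 2 × r table. The sketch below is not the author's code: it assumes that no cell is zero and uses the large-sample covariance of the logORs in a saturated model, where each pair of logORs shares the term contributed by the reference haplotype. The function name and the example counts are hypothetical.

```python
import math

def global_wald(cases, controls):
    """Global Wald statistic for one 2 x r haplotype-by-status table.

    cases, controls: haplotype counts with the reference haplotype first
    (all counts must be positive). Returns (beta, U, df): the r-1 log-ORs
    against the reference, U = beta' cov(beta)^-1 beta, and its degrees
    of freedom.
    """
    c = 1.0 / cases[0] + 1.0 / controls[0]    # shared reference-cell term
    beta, d = [], []
    for a, b in zip(cases[1:], controls[1:]):
        beta.append(math.log(a / b) - math.log(cases[0] / controls[0]))
        d.append(1.0 / a + 1.0 / b)            # haplotype-specific term
    # cov(beta) = diag(d) + c * 11', so the quadratic form follows from the
    # Sherman-Morrison identity without an explicit matrix inverse.
    s_bb = sum(bj * bj / dj for bj, dj in zip(beta, d))
    s_b = sum(bj / dj for bj, dj in zip(beta, d))
    s_1 = sum(1.0 / dj for dj in d)
    U = s_bb - c * s_b ** 2 / (1.0 + c * s_1)
    return beta, U, len(beta)

# Hypothetical counts for three haplotypes (reference first)
beta, U, df = global_wald([120, 60, 20], [150, 40, 10])
```

The statistic U would then be referred to a χ^{2} distribution on df = r-1 degrees of freedom.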
Whatever the assumed sampling scheme that gave rise to the data of Table 1 may be, it is well known that the results of fitting each one of the three models are nearly identical [45]. For instance, it has been shown that maximum likelihood estimates obtained from the "retrospective" likelihood are the same as those obtained from the "prospective" likelihood [46,47]. The equivalence of logistic regression and Poisson modelling has also been exploited in the past for deriving methods for detecting gene-environment interactions [48].
The methods discussed above are simple applications of the generalized linear model extending the analysis of single markers to haplotypes and assume that i) the haplotype risk follows a multiplicative model of inheritance, ii) the haplotype phase is known and iii) the population is in Hardy-Weinberg Equilibrium (HWE). The genetic model of inheritance can be handled simply by using in the analysis the so-called haplogenotypes or diplotypes, instead of the genotypes. This is easily performed with all the previously presented methods by using the pairwise combinations of haplotypes (h_{1}h_{1}, h_{1}h_{2} and so on). In case-control association studies, however, with the exception of some cases where direct genotyping of the haplotypes is applicable (e.g. [38]), the haplotypes (and the haplogenotypes) are usually not known, but are inferred from the data using statistical methods for missing data, usually with an EM or EM-like algorithm [27-29]. Thus, treating them as known quantities has been shown to be problematic [30]. More advanced methods have been developed in order to account for these limitations, for instance weighting the haplotypes by their probability [49,50]. Score methods based on the prospective likelihood [51] or the retrospective likelihood [52] have also been developed, as well as methods for allowing for gene-environment interaction [53]. A comparison of methods has shown that the approaches are roughly comparable when the haplotype effect on disease odds follows a multiplicative model. However, for dominant and recessive models, the retrospective-likelihood method has increased efficiency with respect to the prospective methods [54]. Graphical models have been proposed by Thomas [55] and log-linear models by Baker [56]. Lin and coworkers extended the previously presented methods by including various sampling schemes in a unified framework [57].
Even though a large body of the genetic epidemiology literature is dedicated to such methods, their application in meta-analysis is problematic since in most cases the original data are not available to the analyst. Thus, in the following sections where the methods for meta-analysis are summarized we also assume that the haplotypes are known. An extension when the posterior probabilities of haplotypes are given from the output of the haplotype inference software would then be straightforward.
Methods for meta-analysis of haplotype association
In this section the methods for meta-analysis are presented. Initially we will discuss simple methods using summary data, whereas in the next subsection more advanced methods that use generalized linear models on grouped or Individual Patients Data (IPD) are presented.
Meta-analysis using summary data
A commonly used approach that is based on traditional methods and uses solely summary data is to consider separately the effect of the j^{th} haplotype against the r-1 remaining ones. That is, for each study i (i = 1,2,...,k) we will compute a log-Odds Ratio (logOR):
with an asymptotic variance given by:
In this notation, n_{ic0} and n_{ic1} are the counts of the remaining haplotypes (excluding haplotype j) for controls and cases of the i^{th} study respectively, given by:
In a standard univariate random-effects model we assume that the logarithm of the OR of each study i is distributed normally as:
Thus, the combined logarithm of the Odds Ratio (logOR) would be given by:
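The quantities introduced in this subsection can be collected in one place as follows. The symbols n_{ij0} and n_{ij1} (counts of haplotype j in controls and cases of study i), y_{ji}, s_{ji}^{2} and w_{ji} are assumptions chosen to agree with the notation n_{ic0}, n_{ic1} used above; the forms themselves are the usual ones for a 2 × 2 table and for inverse-variance pooling.

```latex
\begin{align}
y_{ji} &= \log\frac{n_{ij1}\, n_{ic0}}{n_{ij0}\, n_{ic1}} \tag{5} \\
s_{ji}^{2} &= \frac{1}{n_{ij0}} + \frac{1}{n_{ij1}} + \frac{1}{n_{ic0}} + \frac{1}{n_{ic1}} \tag{6} \\
n_{ic0} &= \sum_{l \neq j} n_{il0}, \qquad n_{ic1} = \sum_{l \neq j} n_{il1} \tag{7} \\
y_{ji} &\sim N\!\left(\beta_j,\; s_{ji}^{2} + \tau^{2}\right) \tag{8} \\
\hat{\beta}_j &= \frac{\sum_{i=1}^{k} w_{ji}\, y_{ji}}{\sum_{i=1}^{k} w_{ji}}, \qquad w_{ji} = \frac{1}{s_{ji}^{2} + \tau^{2}} \tag{9}
\end{align}
```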
The between-studies variance (τ^{2}) could be easily computed by the non-iterative method of moments proposed by DerSimonian and Laird [58], even though there are several alternatives that use iterative procedures (e.g. Maximum Likelihood (ML) or Restricted Maximum Likelihood (REML) [33]). Setting τ^{2} = 0 in Eq. (9) yields the well-known fixed-effects estimator with inverse variance weights.
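The univariate "1 vs. others" approach with the DerSimonian and Laird estimator can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used in this work: the function names and the counts are hypothetical, all cells are assumed to be non-zero, and at least two studies are required.

```python
import math

def one_vs_others(study, j):
    """Log-OR and variance for haplotype j against all remaining haplotypes.

    study: (cases, controls), two lists of haplotype counts for one study.
    """
    cases, controls = study
    a, b = cases[j], controls[j]                  # haplotype j: cases, controls
    c, d = sum(cases) - a, sum(controls) - b      # remaining haplotypes
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def dersimonian_laird(y, v):
    """Random-effects pooled estimate with the method-of-moments tau^2."""
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    denom = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / denom)
    w_star = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return pooled, 1.0 / sum(w_star), tau2

# Hypothetical counts for three studies with three haplotypes each
studies = [([120, 60, 20], [150, 40, 10]),
           ([200, 90, 35], [210, 70, 30]),
           ([80, 45, 15], [100, 30, 12])]
y, v = zip(*(one_vs_others(s, 1) for s in studies))
pooled, var, tau2 = dersimonian_laird(y, v)
```

When the estimated τ^{2} is truncated at zero, the pooled value coincides with the fixed-effects estimate.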
This approach is very easily implemented, intuitive and can be performed in a standard univariate meta-analysis framework. In the results section we will see that several already published meta-analyses used this method. However, the method has some drawbacks. The most important is that it is prone to an increased type I error rate due to multiple comparisons. Multiple comparisons constitute an important problem in haplotype analysis, especially as the number of haplotypes increases [59,60]. The model implied by Eqs. (5)-(8) is conceptually similar to collapsing the genotypes in a single-marker analysis, an approach that has been shown to increase the power as well as the type I error rate [61]. Thus, this approach can be justified only when there is strong prior knowledge concerning a particular haplotype and this haplotype is the only one that is being tested.
To overcome the multiple comparisons problem, a straightforward alternative would be to extend the model in a multivariate framework, modelling simultaneously the logORs derived from comparing haplotypes j = 2,3,...,r against a reference haplotype (j = 1). Following the general framework for multivariate meta-analysis [37,62], we denote by y_{i} the vector containing the r-1 different estimates, and by β the vector of the overall means given by:
These logORs similarly to Eq. (5) will be given by:
with an asymptotic variance given by:
In the multivariate random-effects meta-analysis, we assume that y_{i} is distributed following a multivariate normal distribution around the true means β, according to the marginal model:
In the above model, we denote by C_{i} the within-studies covariance matrix:
and by Σ the between-studies covariance matrix, given by:
The diagonal elements of C_{i} are the study-specific estimates of the variance that are assumed known, whereas the off-diagonal elements correspond to the pairwise within-studies covariances, for instance ρ_{w23}s_{2i}s_{3i} = cov(y_{2i}, y_{3i}). Since the logORs derived for each haplotype are compared against the same reference category, their pairwise covariances will be given [12] by:
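The pieces of the multivariate summary-data model described above can be written together as follows. This is a sketch in which n_{ij0} and n_{ij1} (counts of haplotype j in controls and cases of study i) are assumed symbols, with j = 1 denoting the reference haplotype:

```latex
\begin{align}
y_i &= (y_{2i}, y_{3i}, \dots, y_{ri})', \qquad \beta = (\beta_2, \beta_3, \dots, \beta_r)' \tag{10} \\
y_{ji} &= \log\frac{n_{ij1}\, n_{i10}}{n_{ij0}\, n_{i11}} \tag{11} \\
s_{ji}^{2} &= \frac{1}{n_{ij0}} + \frac{1}{n_{ij1}} + \frac{1}{n_{i10}} + \frac{1}{n_{i11}} \tag{12} \\
y_i &\sim MVN\left(\beta,\; C_i + \Sigma\right) \tag{13} \\
C_i &= \begin{pmatrix}
s_{2i}^{2} & \rho_{w23}\, s_{2i} s_{3i} & \cdots & \rho_{w2r}\, s_{2i} s_{ri} \\
\rho_{w23}\, s_{2i} s_{3i} & s_{3i}^{2} & \cdots & \rho_{w3r}\, s_{3i} s_{ri} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{w2r}\, s_{2i} s_{ri} & \rho_{w3r}\, s_{3i} s_{ri} & \cdots & s_{ri}^{2}
\end{pmatrix} \tag{14} \\
\Sigma &= \begin{pmatrix}
\tau_2^{2} & \rho_{B23}\, \tau_2 \tau_3 & \cdots & \rho_{B2r}\, \tau_2 \tau_r \\
\rho_{B23}\, \tau_2 \tau_3 & \tau_3^{2} & \cdots & \rho_{B3r}\, \tau_3 \tau_r \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{B2r}\, \tau_2 \tau_r & \rho_{B3r}\, \tau_3 \tau_r & \cdots & \tau_r^{2}
\end{pmatrix} \tag{15} \\
\operatorname{cov}(y_{ji}, y_{j'i}) &= \frac{1}{n_{i10}} + \frac{1}{n_{i11}}, \qquad j \neq j' \tag{16}
\end{align}
```

The covariance in Eq. (16) contains only the reference-haplotype cells because these are the only counts shared by two different contrasts.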
We should mention that from standard normal theory it is known that the multivariate test for β = 0, based on β'cov(β)^{-1}β, could yield significant results even if all the r-1 univariate Wald tests are non-significant. Thus, the multivariate test should be performed initially, and only if a significant result is found can we proceed by collapsing the haplotypes and performing a standard univariate meta-analysis.
The model can be fitted in any statistical package capable of fitting random-effects weighted regression models with an arbitrary covariance matrix, such as SAS (using PROC MIXED or PROC NLMIXED), R (using lme) or Stata (using mvmeta). In this work, we used mvmeta, which performs inferences based on either Maximum Likelihood (ML) or Restricted Maximum Likelihood (REML), by direct maximization of the approximate likelihood using a Newton-Raphson algorithm [63]. Alternatively, mvmeta can also implement the multivariate version of DerSimonian and Laird's method of moments [64]. The last option, being non-iterative, is very attractive in the case of a large number of haplotypes and/or a large number of studies. A major disadvantage of the methods proposed in this section is the assumption of normality that is employed and the need for correction when there are rare haplotypes (i.e. adding a pseudo-count of 0.5 to the haplotypes with zero counts). These limitations are overcome by using the methods discussed in the next section.
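As an illustration of the data preparation that precedes such a fit, the following sketch computes, for one study, the vector of logORs against the reference haplotype and the within-study covariance matrix, adding the pseudo-count of 0.5 only to cells with a zero count. The function name is hypothetical, and the resulting y and C would still have to be passed to a multivariate routine such as those mentioned above.

```python
import math

def study_logors(cases, controls):
    """Log-ORs versus the reference haplotype and their covariance matrix.

    cases, controls: haplotype counts for one study, reference first.
    Returns (y, C): the r-1 log-ORs and the (r-1) x (r-1) within-study
    covariance matrix as a list of rows.
    """
    # Continuity correction: 0.5 is added only to cells with a zero count
    a = [x if x > 0 else 0.5 for x in cases]
    b = [x if x > 0 else 0.5 for x in controls]
    shared = 1.0 / a[0] + 1.0 / b[0]   # covariance induced by the common reference
    idx = range(1, len(a))
    y = [math.log(a[j] * b[0] / (b[j] * a[0])) for j in idx]
    C = [[shared + (1.0 / a[j] + 1.0 / b[j] if j == l else 0.0) for l in idx]
         for j in idx]
    return y, C

# Hypothetical study in which the third haplotype is absent among cases
y, C = study_logors([100, 50, 0], [100, 25, 10])
```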
Meta-analysis using binary data
In this section, methods that use directly the binary nature of the data within a generalized linear mixed model (GLMM) are presented. These methods are usually termed IPD methods [33-37], although in many real-life applications individual data may not be literally available. Instead, extending the models described for a single study, only summary counts of individuals carrying the respective haplotypes will normally be used.
Logistic regression
Using the prospective likelihood we can extend the logistic regression model of Eq. (1) in order to incorporate study-specific effects and perform a stratified analysis (fixed effects meta-analysis). To do so, we need to introduce k-1 dummy variables d_{i} (taking values equal to zero or one) with coefficients β_{0i} that are indicators of the study-specific fixed-effects. Thus, the model is a straightforward extension to the model described previously for meta-analysis of genetic association studies for single nucleotide polymorphisms [16] and is formulated as:
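With π_{ij} denoting the risk for a carrier of haplotype j in study i and β_{0} the intercept (both symbols are assumptions made here for illustration), a form of the stratified model consistent with the description above is:

```latex
\operatorname{logit}(\pi_{ij}) = \beta_0 + \sum_{i'=2}^{k} \beta_{0i'}\, d_{i'} + \sum_{l=2}^{r} \beta_l z_l \tag{17}
```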
Here, the β_{j} obtained by fitting the model are the overall estimates of the logORs (i.e. for comparing h_{j} vs. h_{1}). An overall test for the association of haplotypes with disease can be performed if we denote by β the vector of the estimated coefficients and by cov(β) its estimated variance-covariance matrix. Then, the test statistic U = β'cov(β)^{-1}β will have asymptotically a χ^{2} distribution (U~χ^{2}_{r-1}) [65]. This model has been used in several meta-analyses of haplotype association studies [66-69] (see, in the results section, the empirical evaluation of the literature). This fixed effects model assumes homogeneity of ORs between studies. This assumption can be tested by adding the interaction between the study effect and the haplotypes into the model:
This is the analogue of Cochran's test for heterogeneity in the univariate meta-analysis. The hypothesis can be tested by performing a multivariate Wald test, where the null hypothesis is:
The test statistic can be constructed analogously to the one used for β. If we denote by γ the vector of the estimated coefficients, by V the estimated variance-covariance matrix and by Rγ = r the vector of the (r-1)(k-1) linear hypotheses, then the statistic:
will have asymptotically a χ^{2} distribution [65].
Moreover, the value of W could be used in order to calculate a modified version of the overall inconsistency index I^{2 }[70]:
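For orientation, the Wald statistic for a set of linear hypotheses and an inconsistency index built from it in the usual way can be sketched as below. The vector of hypothesized values is written in bold to keep it distinct from the number of haplotypes r, and the exact form of the modified index is an assumption based on the standard definition of I^{2} with (r-1)(k-1) degrees of freedom:

```latex
\begin{align}
W &= (R\hat{\gamma} - \mathbf{r})' \left( R V R' \right)^{-1} (R\hat{\gamma} - \mathbf{r}) \tag{20} \\
I^{2} &= \max\left\{ 0,\; \frac{W - (r-1)(k-1)}{W} \right\} \tag{21}
\end{align}
```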
This measure is quite useful, since it enables us to summarize the overall heterogeneity, instead of having to look at multiple indices of heterogeneity arising from multiple haplotype contrasts.
In order to account for an additive component of heterogeneity and perform a random-effects logistic regression allowing the haplotype effects to vary between studies, the most suitable way is to introduce a set of study-specific random coefficients, representing the deviation of each study's true effect from the overall mean effect for each haplotype. Thus, the model becomes:
In this model, the random terms β_{i }are distributed as:
where
The between-studies variances and covariances have the same interpretation as the ones obtained by the summary-data methods of Eq. (13) and (15).
Multinomial logistic regression
Alternatively, the model may be parameterized assuming a multinomial sampling scheme utilizing the retrospective likelihood. In this case, an extension of the model of Eq. (2), which incorporates fixed study effects, would be:
The linear predictor in the above model becomes:
Similar to the model based on the prospective likelihood, the variables d_{i} are indicators of the study-specific fixed-effects. An overall test for the association of haplotypes with disease (β = 0) can be performed similarly to the logistic regression model (U). Introducing the study by disease interaction terms can form a test for homogeneity of ORs across the k studies:
The statistics for heterogeneity (W) as well as the I^{2} index derived from it are identical to the ones presented in Eq. (19)-(21).
A random-effects extension to the model can be formulated if, in the above model, we introduce a haplotype-specific random coefficient β_{ij} (for haplotypes j = 2,3,...,r), in which case the linear predictor becomes [71]:
and the model is completely specified as a random effects multivariate meta-analysis, with random terms β_{i} distributed as β_{i}~MVN(0,Σ). The interpretation of the variances and covariances of the random terms is identical to the ones presented in Eq. (13). A version of this model has been used previously for meta-analysis of genetic association studies involving single nucleotide polymorphisms [12], but to the author's knowledge it has never been used for meta-analysis of haplotypes.
Poisson regression
Lastly, we can extend the log-linear model of Eq. (4) in order to perform a fixed effects meta-analysis allowing for the study-specific effects. The major difference compared to the previous approaches lies in the structure of the log-linear model and the interpretation of the main effects and interactions. Having in mind that we want to model a 2 × r × k contingency table, the appropriate choice would be to include in the model of Eq. (4) the study-specific main effects as well as the two-way interactions (study × disease and study × haplotype). Thus, we would have a model containing all the main effects as well as all the two-way interactions, a model known as the "no three-factor interaction model" [45]:
In this model, the coefficients α_{j}, α_{ij}, β_{0}, β_{0j} and β_{j} correspond to the ones obtained by fitting the models in Eq. (17) and Eq. (15). The overall test for the association of the haplotypes with the disease (β = 0) is known in the context of log-linear models as the test of "partial association" [72,73]. The model in Eq. (29) assumes homogeneity of ORs across studies. Thus, in order to test this assumption we need to include additional terms for the three-way interaction (study × disease × haplotype). This is accomplished by fitting the saturated model:
The test with the null hypothesis H_{0}: γ = 0 (γ_{ij} = 0, i = 2,3,...,k, j = 2,3,...,r) is identical to the ones obtained by fitting the models in Eq. (18) and (27). The three-way interaction model and its interpretation in terms of testing the homogeneity of ORs has been discussed in detail in the past [45,74-76]. Log-linear models have been employed in several meta-analyses of haplotype association [77,78] (see the results section). However, even though the procedure is not described in detail, it is apparent from the results reported that in these analyses the log-linear model was not applied in an appropriate manner. Although the authors stated that they performed stratification by study, they probably included only the main effect of the study and not the interaction terms with both haplotypes and disease. As we will see in the results section, when the correct model is applied, the originally drawn conclusions are compromised.
In analogy to the models in Eq. (22) and (28), a random coefficient for the disease by haplotype interaction can be applied in order to perform a random-effects meta-analysis:
with random terms β_{i }distributed similarly as β_{i}~MVN(0,Σ). Similarly to the multinomial logistic regression model, the interpretation of the variances and covariances of the random terms is identical to the ones presented in Eq. (12).
Continuous traits
The methods discussed so far assume we are dealing with a binary trait, usually in a case-control setting. However, continuous traits are not uncommon in genetic association studies and these can be easily accommodated using a linear model (linear regression). For instance, denoting by y_{ij} the continuous trait for a person carrying the j^{th} haplotype in the i^{th} study, the model would be:
The homogeneity of haplotype effects across studies can be subsequently checked using a model with a haplotype x study interaction term:
Finally, a random effects model could be formulated using a linear mixed model:
with random terms β_{i }distributed similarly as β_{i}~MVN(0,Σ). Similarly to the previously described models, the interpretation of the variances and covariances of the random terms is identical to the ones presented in Eq. (12). In case where individual data are not available, the above models could be easily fitted using summary data (mean values and standard deviations) per haplotype.
Implementation
The models presented in this section can be easily fitted in Stata using gllamm, or in SAS using PROC NLMIXED. These models are expected to perform better than the models presented in the previous section in cases where the normality assumption for logORs does not hold. Furthermore, a major advantage of these models is that they can directly be used for pooled meta-analyses performed under large collaborative projects. This is why these models are usually termed Individual Patients Data (IPD) methods [36]. However, a disadvantage is that these methods are computationally intensive, especially when the number of haplotypes is large.
A sometimes useful simplification can be made in Eq. (15) if we assume that the between-studies variances are equal [34]. In such a case, by letting τ = τ_{2} = τ_{3} = ... = τ_{r}, Σ reduces to:
Another approximation would be to impose a single between-studies correlation, but allow for different between-studies variances [79]:
In this work however, we chose to use a different approximation that can be obtained if the number of random effects is reduced by decomposing the random terms using factor loadings such as: τ_{2}^{2 }= λ_{2}^{2}τ^{2}, τ_{3}^{2 }= λ_{3}^{2}τ^{2}, ..., τ_{r}^{2 }= λ_{r}^{2}τ^{2}, and letting λ_{2 }= 1 for identification. Thus, the covariance matrix becomes now:
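Written out with λ = (1, λ_{3}, ..., λ_{r})' (the vector symbol is introduced here only for compactness), the factor-loading structure described above gives variances λ_{j}^{2}τ^{2} on the diagonal and products λ_{j}λ_{j'}τ^{2} off the diagonal:

```latex
\Sigma = \tau^{2}\, \lambda \lambda' = \tau^{2}
\begin{pmatrix}
1 & \lambda_3 & \cdots & \lambda_r \\
\lambda_3 & \lambda_3^{2} & \cdots & \lambda_3 \lambda_r \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_r & \lambda_3 \lambda_r & \cdots & \lambda_r^{2}
\end{pmatrix} \tag{37}
```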
This approximation is conceptually similar to the one used previously for the so-called "genetic model-free approach" for meta-analysis of genetic association studies [14,80], even though the motivation was different. The model imposes a single between-studies variance τ^{2} and is thus much faster, since the factor loadings λ_{j} with j = 3,4,...,r are treated as fixed-effects parameters. By observing also the off-diagonal elements of the covariance matrix in Eq. (37), we can see that the model restricts the between-studies correlations (ρ_{Bjj'}) to be equal to ±1 (depending on the sign of λ_{j}λ_{j'}). Nevertheless, the between-studies correlations are usually poorly estimated, especially when the number of studies is small (<20), and in such cases they are usually estimated to be equal to ±1 [81,82]. Thus, this approach seems to be a good compromise between speed and precision and we expect it to perform well. Using this approach, the computational complexity as well as the execution time is reduced drastically, but the obtained estimates agree up to the fourth decimal place in most of the experiments conducted.
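The restriction of the between-studies correlations to ±1 can be checked numerically. The sketch below builds the covariance matrix implied by the factor loadings and derives the correlations from it; the function names and the values of τ^{2} and of the loadings are hypothetical.

```python
import math

def factor_loading_sigma(tau2, loadings):
    """Between-studies covariance tau^2 * lambda lambda' for haplotypes 2..r.

    loadings: [lambda_3, ..., lambda_r]; lambda_2 is fixed at 1.
    """
    lam = [1.0] + list(loadings)
    return [[tau2 * lj * ll for ll in lam] for lj in lam]

def correlations(sigma):
    """Correlation matrix implied by a covariance matrix."""
    n = len(sigma)
    sd = [math.sqrt(sigma[j][j]) for j in range(n)]
    return [[sigma[j][l] / (sd[j] * sd[l]) for l in range(n)] for j in range(n)]

# Hypothetical values: tau^2 = 0.04, lambda_3 = 0.5, lambda_4 = -1.5
sigma = factor_loading_sigma(0.04, [0.5, -1.5])
rho = correlations(sigma)
```

Every off-diagonal entry of rho is +1 or -1 according to the sign of the product of the two loadings, which reflects the fact that the matrix has rank one.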
A final comment has to be made concerning the identifiability of the models presented in the previous sections, especially when it comes to the log-linear models, which are the ones that contain the largest number of parameters. Concerning the fixed effects methods, the number of parameters of the saturated model of Eq. (30) is equal to 2rk, a number that is equal to the number of observations [45]. For the model of Eq. (29), the number of freely estimated parameters is equal to rk + r + k - 1, which is obviously smaller than 2rk (since r > 1 and k > 1). The random effects model of Eq. (31) has a total number of parameters equal to rk + r + k - 1 + r(r-1)/2, since we need to estimate additionally r(r-1)/2 elements of the covariance matrix (the variances and the covariances of the random effects). Thus, in order for the model to be identifiable we need to ensure that rk + r + k - 1 + r(r-1)/2 ≤ 2rk, which is accomplished if k ≥ 1 + r/2. Intuitively, we need a relatively larger number of studies compared to the number of haplotypes. If, on the other hand, we fit the model of Eq. (31) using Eq. (35) for restricting the covariances, we only need rk + r + k parameters, and when we use Eq. (36) or Eq. (37) we need to estimate rk + 2r + k - 1 parameters, numbers which are both smaller than 2rk. Nevertheless, for practical applications, we will normally use the logistic regression model of Eq. (22) coupled with the parameterization of Eq. (37), and thus identifiability issues will never arise in practice.
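The counting argument can be verified mechanically. The following sketch simply encodes the parameter counts stated above for r haplotypes and k studies and checks the condition for the unstructured random-effects model; the function names are hypothetical.

```python
def parameter_counts(r, k):
    """Parameter counts for the log-linear models with r haplotypes, k studies."""
    fixed = r * k + r + k - 1                  # no three-factor interaction model
    return {
        "cells": 2 * r * k,                    # observations in the 2 x r x k table
        "fixed": fixed,
        # r(r-1)/2 extra variances and covariances of the random effects
        "random_unstructured": fixed + r * (r - 1) // 2,
    }

def unstructured_is_identifiable(r, k):
    """True when the unstructured random-effects model has no more
    parameters than there are cells, i.e. when k >= 1 + r/2."""
    counts = parameter_counts(r, k)
    return counts["random_unstructured"] <= counts["cells"]

# Four haplotypes need at least three studies under this criterion
ok_with_three = unstructured_is_identifiable(4, 3)
ok_with_two = unstructured_is_identifiable(4, 2)
```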
In Additional file 1, Stata programs for fitting the models developed in this section are presented. The models were fitted using the gllamm module for Stata [83,84]. gllamm uses numerical integration by adaptive quadrature in order to integrate out the latent variables and obtain the marginal log-likelihood. Afterwards, the log-likelihood is maximized by Newton-Raphson using numerical first and second derivatives.
Additional file 1. Stata code for fitting the methods described in the manuscript. The commands should be run within a Stata do-file.
Results
We initially performed a literature search to identify studies that report meta-analyses of haplotype associations. The initial search in PUBMED using the term "haplotype" combined with "meta-analysis" or "collaborative analysis" or "pooled analysis" yielded 282 studies. Of these, 35 studies could have been identified using solely the terms "collaborative analysis" or "pooled analysis" and "haplotype". After careful screening, 207 studies were excluded as irrelevant (they were not meta-analyses of haplotypes) and 36 studies were excluded for various other reasons (family-based studies, meta-analyses of SNPs with the term "haplotype" appearing in the abstract, haplotype analyses in which the term "meta-analysis" appeared in the abstract, etc.). This left 39 published papers containing data for 43 associations. Some studies reported different sets of haplotypes from the same gene (Auburn et al, 2008; Zintzaras et al, 2009), haplotypes from different genes (Thakkinstian et al, 2008), or distinct outcomes measured on different subsets of patients (Kavvoura et al, 2007) and were thus included twice, whereas from studies that reported different outcomes measured on the same set of individuals we kept only one. There were also some pairs of studies that evaluated the same association, and from these we kept only the larger one. Ten of the 39 published papers could have been identified using solely the terms "collaborative analysis" or "pooled analysis" coupled with the term "haplotype". The 43 studies and their characteristics are presented in Table 2.
Table 2. List of the 43 meta-analyses that were used in the empirical evaluation
The average number of polymorphisms included in the haplotypes was 3.19 (SD = 1.37, median = 3, range from 2 to 7), whereas the average sample size was 5,017.81 (SD = 4,703.24, median = 3,004, range from 348 to 23,309). The average number of included studies was 5.14 (SD = 3.06, median = 4, range from 2 to 13). Twenty-seven studies (62.79%) were conducted in a collaborative setting, whereas sixteen (37.21%) were performed using data derived from the literature. Twenty-seven of the meta-analyses (62.79%) reported significant results. The majority (22 studies, 51.16%) were analysed under the "1 vs. others" approach using standard summary-based meta-analysis techniques (with fixed or random effects), 11 studies (25.58%) were analysed by pooling the data inappropriately, 6 studies (13.95%) did not report the method or did not perform pooling at all, and 4 analyses (9.30%) were performed using a fixed effects logistic regression model. Only 13 studies (30.23%) reported the complete data that suffice for the analysis to be replicated (Tables 2 and 3).
There was only weak evidence that collaborative meta-analyses contained a larger number of studies than literature-based ones (5.67 vs. 4.25), had a larger sample size (5,651 vs. 3,948) and produced significant results more frequently (66.67% vs. 56.25%); none of these differences reached statistical significance (p-values equal to 0.144, 0.256 and 0.506 respectively). The average number of included polymorphisms was also comparable (3.26 vs. 3.06, p-value = 0.654). The thirteen meta-analyses that reported complete data did not differ significantly from the remaining ones in terms of the number of included studies (4.46 vs. 5.43, p-value = 0.345), the number of SNPs in the haplotypes (3.08 vs. 3.23, p-value = 0.735) and the proportion of significant findings (69.23% vs. 60%, p-value = 0.576). The proportion of collaborative analyses was higher, even though this difference did not reach statistical significance (76.92% vs. 56.57%, p-value = 0.216). There was, however, moderate evidence that the total sample size included in the meta-analyses that reported complete data was smaller compared to the meta-analyses that did not (3,040.31 vs. 5,874.73, p-value = 0.069). We also compared this database against a database of 55 representative meta-analyses of genetic association studies of SNPs that was used previously in several empirical evaluations [85-89]. The mean sample size was approximately equal (5,017 vs. 4,829, p-value = 0.844), but the number of included studies was nearly halved in the meta-analyses of haplotypes (5.14 vs. 10.53, p-value < 10^{-4}), whereas the proportion of meta-analyses with significant results was twice as large (62.8% vs. 27.27%, p-value = 0.0003).
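As an illustration of how such a comparison of proportions can be checked, the sketch below applies a two-sided pooled two-proportion z-test to the proportions of significant meta-analyses in the two databases. The text does not state which test produced the reported p-values, so the choice of test is an assumption, as is the count of 15 significant SNP meta-analyses, which is back-calculated from 27.27% of 55. The result is therefore not expected to reproduce the reported p-value exactly.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal:
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# 27 of 43 haplotype meta-analyses were significant (stated in the text);
# 15 of 55 SNP meta-analyses is back-calculated from 27.27% (an assumption).
z, p_value = two_proportion_z_test(27, 43, 15, 55)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```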
The thirteen studies that reported the data necessary for the analysis to be replicated were subsequently used in order to apply the methods proposed in this work. We used all the methods described in the methods section except for the simpler approach of comparing one haplotype vs. the others, i.e. Eq. (5). The results are reported in Table 3, where we list the p-values for the tests for the overall association (β = 0). For the fixed effects IPD methods we additionally report the p-value of the overall test for the heterogeneity (γ = 0). Concerning the results obtained using the IPD methods, we report only the ones obtained from the logistic regression method of Eq. (22) using the parameterization of Eq. (37), which is easier to fit, even though the multinomial logistic regression and the Poisson regression methods would yield similar results. As expected, when the heterogeneity is low (in 8 out of the 13 studies), the random effects methods coincide with their fixed effects counterparts. In general, the methods that use summary data yield slightly different estimates for the ORs compared to the methods that use IPD when there are rare haplotypes (i.e. small counts) or when the total number of subjects is low (data not shown). In 2 out of the 13 studies the multivariate Wald tests for the overall association (β = 0) produce marginally different results compared to the univariate ones.
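For readers unfamiliar with the overall test, the sketch below computes a multivariate Wald statistic W = β' V⁻¹ β for the simplest non-trivial case of three haplotypes, i.e. two pooled log odds ratios against a reference haplotype. Under β = 0, W is referred to a chi-square distribution with 2 degrees of freedom, whose upper tail probability is exp(-W/2). The function name and the numbers in the usage note are hypothetical and are not taken from Table 3.

```python
import math


def wald_test_2df(beta, cov):
    """Multivariate Wald test of H0: beta = 0 for two pooled log odds ratios.

    beta : (b1, b2), pooled log-ORs of two haplotypes vs. the reference.
    cov  : ((v11, v12), (v21, v22)), their estimated covariance matrix.
    """
    b1, b2 = beta
    (v11, v12), (v21, v22) = cov
    det = v11 * v22 - v12 * v21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # W = beta' V^{-1} beta, using the closed-form inverse of a 2 x 2 matrix.
    w = (b1 * b1 * v22 - b1 * b2 * (v12 + v21) + b2 * b2 * v11) / det
    # The chi-square survival function with 2 degrees of freedom is exp(-w / 2).
    return w, math.exp(-w / 2)
```

With hypothetical estimates, `wald_test_2df((0.2, -0.1), ((0.01, 0.002), (0.002, 0.01)))` returns the statistic together with its p-value. For more than three haplotypes the same quadratic form applies, with a general matrix inverse and r - 1 degrees of freedom.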
Table 3. The results obtained using the methods described in this work on the 13 studies that reported complete data that suffice for the analysis to be replicated
The subsequent reanalysis and its comparison with the initial reports yielded some important findings. Concerning the four studies that initially reported no significant association [90-93], the methods presented in this work largely support the initial conclusions. Three of the nine studies (33.33%) that reported statistically significant results [94,95] yielded results that are in complete agreement with the initial reports (the meta-analysis of Kavvoura and co-workers reported results for two outcomes and was counted twice). The most important finding, however, was that 4 out of the 9 studies (44.44%) [78,96-98] yielded results that contradict the initial reports. Two additional studies [68,99] produced marginally significant results, as judged by the disagreement between the multivariate and univariate Wald tests (Table 3).
The reasons for these discrepancies deserve further investigation. For instance, in the collaborative meta-analysis for the association of CAPN10 haplotypes with Type 2 Diabetes mellitus [97], the authors report a marginally significant OR of 1.09 (1.00, 1.18) for the "121" haplotype and similar results for two haplogenotypes that include this haplotype. Similar results were previously reported in a literature-based meta-analysis [100]. However, these estimates have been derived using the "1 vs. others" approach, which, although more powerful, is known to suffer from an increased type I error rate; thus it seems that these estimates are the result of a multiple testing procedure. For the meta-analysis concerning the association of ITGAV haplotypes with Rheumatoid Arthritis [98], as well as the association of G30/G72 haplotypes with schizophrenia [96], the authors did not explicitly state how the pooling of estimates was performed, but the methods presented in this work clearly suggest that there is not enough evidence supporting the claimed associations. Finally, in the case of the meta-analysis for the association of VDR polymorphisms with osteoporosis, in which the authors claimed to use a log-linear model [78], the initially drawn conclusions are not supported. It seems that the authors did not use a correctly specified model that contains all the main effects as well as all the two-way interactions (i.e. the "no three-factor interaction model"). This probably resulted in performing a meta-analysis essentially without stratifying by study. Given that the heterogeneity in this dataset is large, it is no surprise that the originally drawn conclusions are compromised after the reanalysis, which strongly indicates that there is no evidence to support a significant association. Concerning the two datasets for which we observed disagreement between the multivariate and univariate Wald tests, i.e. the association of CX3CR1 haplotypes with CAD [99] and the association of VEGF haplotypes with ALS [68], there were different reasons for the discrepancies. In the meta-analysis of CX3CR1 haplotypes (which was originally performed using the "1 vs. others" approach) the small discrepancies could be attributed to the marginal statistical significance (p-values = 0.06-0.09) and the existence of a rare haplotype. In the case of the VEGF meta-analysis, the authors initially used a fixed-effects logistic regression model analogous to Eq. (17); however, the moderate heterogeneity produced slight discrepancies in the results of the multivariate Wald test under the random effects model (Table 3).
Discussion
Although the studies reporting haplotypes comprise a small fraction of genetic association studies, their number is steadily growing and so there is a need for developing formal methods for combining them in a meta-analysis. In this work, a comprehensive framework for the meta-analysis of haplotype association studies was presented and an empirical evaluation has been performed for the first time in the literature.
The methods proposed in this work extend previous work on the meta-analysis of genetic association studies [12,16] in order to handle multiple haplotypes. These works, in turn, are based on the previously described large corpus of methods for multivariate meta-analysis [33,36,37,62,101-103]. We proposed summary-data-based methods as well as methods for IPD. Although the former are very easily implemented, the latter provide some very useful insights. Viewing the meta-analysis data as a 2 × r × k contingency table [45] allowed us to develop methods based on logistic regression, multinomial logistic regression and Poisson regression. Although logistic regression methods have long been used for meta-analysis of IPD [33,36,37], multinomial logistic regression has only been used for meta-analysis of genetic association studies under the retrospective likelihood [12,80]. Most importantly, Poisson regression models have been used in entirely different contexts, such as survival analysis [104] and meta-analysis of follow-up studies with varying duration [105]. Thus, an important advancement of this work is the extension of the commonly used approach for analyzing haplotype data [43,44] to the meta-analysis setting, describing appropriately specified models and presenting them in a unified framework (i.e. the contingency table analysis).
The empirical evaluation of the published literature suggests that studies reporting meta-analyses of haplotypes do not systematically differ from the meta-analyses of genetic association using SNPs in terms of the average sample size, but include approximately half as many studies and produce significant results twice as often. The meta-analyses that reported the complete data did not significantly differ from the remaining studies in terms of the number of included studies, the number of SNPs included in the haplotypes, the proportion of significant findings or the proportion of collaborative analyses. There was, however, moderate evidence that the total sample size included in the meta-analyses that reported complete data was smaller compared to the meta-analyses that did not.
The application of the methods proposed in this work to the studies that reported the complete data made clear that approximately half of the significant findings are attributable to the method of analysis used by the primary authors and suffer from an inflated type I error rate. Indeed, for four of the nine studies that reported significant results, these were clearly refuted by the multivariate methodology. Three of these studies used the "1 vs. others" approach, which, although more powerful, is known to suffer from an increased type I error rate [61], whereas the results of the fourth study were based on a misspecified log-linear model. Two additional studies produced marginally insignificant results (i.e. the multivariate Wald test contradicted the univariate one), mainly due to the existence of rare haplotypes or heterogeneity that had not been accounted for in the initial analysis.
All the models presented here assume that the haplotypes are directly observed. However, as we have already discussed, the haplotypes are usually inferred and thus, treating them as known quantities may be problematic [30]. The general framework presented in this work can be easily extended in order to account for this uncertainty, simply by weighting the inferred haplotypes by their probability [49,50]. However, this will probably be problematic in many real-life applications, except when dealing with a collaborative analysis, since a meta-analyst will rarely have access to the individual genotype data needed to estimate the haplotypes and their posterior probabilities. If combined genotypes are available for all studies, the meta-analyst may try to reconstruct the haplotypes with a method of their choice and perform the analysis using the posterior probabilities as weights. Moreover, if individual genotype data are available (from the literature or in a collaborative setting), the framework can be extended to allow the haplotype risk to follow models of inheritance other than the multiplicative one (i.e. estimating the risk of haplogenotypes), or to include patient-level covariates.
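One simple way to read the weighting idea is sketched below: each individual contributes to the haplotype counts of a study in proportion to the posterior probability of every haplotype pair compatible with the observed genotype, and the resulting expected counts take the place of observed counts. This is an illustrative assumption about how the weights could be aggregated, not the procedure of references [49,50]; the function name, the haplotype labels and the probabilities are hypothetical.

```python
from collections import defaultdict


def expected_haplotype_counts(individuals):
    """Sum posterior haplotype-pair probabilities into expected counts.

    individuals: list of dicts mapping a haplotype pair (h1, h2) to the
    posterior probability of that pair given the individual's genotype.
    """
    counts = defaultdict(float)
    for posterior in individuals:
        total = sum(posterior.values())
        if abs(total - 1.0) > 1e-8:
            raise ValueError("posterior probabilities must sum to 1")
        for (h1, h2), prob in posterior.items():
            # Each individual carries two haplotypes; every compatible
            # pair adds its probability to both of its members.
            counts[h1] += prob
            counts[h2] += prob
    return dict(counts)


# Hypothetical two-SNP example: the first individual is phase-known,
# the second is a double heterozygote with ambiguous phase.
cases = [
    {("11", "12"): 1.0},
    {("11", "22"): 0.7, ("12", "21"): 0.3},
]
case_counts = expected_haplotype_counts(cases)
```

The expected counts always sum to twice the number of individuals, so they can be tabulated per study and per case-control status in the same 2 × r × k layout used for directly observed haplotypes.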
The methods proposed in this work clearly outperform the traditional naïve method of meta-analysis of haplotypes, which simply consists of contrasting each haplotype against the remaining ones. The advantage is expected to be more pronounced as the number of possible haplotypes increases, since the type I error rate then also increases due to multiple comparisons [59,60]. Collapsing the haplotypes and performing a univariate analysis may potentially be more powerful in several situations [61]. However, in genetic association studies, even though we are interested in small genetic effects, we are also concerned about the probability of false findings [106,107]. Thus, the multivariate methodology seems to be a reliable alternative.
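The effect of multiple comparisons can be illustrated with a back-of-the-envelope calculation. The sketch below computes the probability of at least one false positive when r haplotype-specific tests are each carried out at level α, under the simplifying assumption that the tests are independent. The "1 vs. others" contrasts are in fact correlated, so the numbers only indicate the direction and rough size of the inflation; the function name is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


# Inflation of the nominal 5% level as the number of tested haplotypes grows.
for r in range(2, 8):
    print(r, round(familywise_error_rate(0.05, r), 3))
```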
Conclusions
We presented multivariate methods that use summary-based data as well as methods that use binary and count data in a generalized linear mixed model framework (logistic regression, multinomial regression and Poisson regression). The methods presented here are easily implemented using standard software such as Stata, R or SAS, making them easy to apply even for non-experts. In Additional file 1, Stata code for fitting the models described in this work is given, and we expect that these methods will be widely used in the future.
Authors' contributions
PGB conceived the study, performed the analyses and wrote the manuscript.
Acknowledgements
The author would like to thank the two anonymous reviewers for their valuable comments that improved the quality of the manuscript.
References

Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K: A comprehensive review of genetic association studies.
Genet Med 2002, 4(2):45-61.

Becker KG, Barnes KC, Bright TJ, Wang SA: The genetic association database.
Nat Genet 2004, 36(5):431-432.

Normand SL: Meta-analysis: formulating, evaluating, combining, and reporting.
Stat Med 1999, 18(3):321-359.

Petiti DB: Meta-analysis Decision Analysis and Cost-Effectiveness Analysis. Volume 24. Oxford University Press; 1994.

Trikalinos TA, Salanti G, Zintzaras E, Ioannidis JP: Meta-analysis methods.
Adv Genet 2008, 60:311-334.

Greenland S: Meta-analysis. In Modern Epidemiology. Edited by Rothman KJ, Greenland S. Lippincott Williams & Wilkins; 1998:643-673.

Chalmers TC, Berrier J, Sacks HS, Levin H, Reitman D, Nagalingam R: Meta-analysis of clinical trials as a scientific discipline. II: Replicate variability and comparison of studies that agree and disagree.
Stat Med 1987, 6(7):733-744.

Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC: Meta-analyses of randomized controlled trials.
N Engl J Med 1987, 316(8):450-455.

Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, Moher D, Becker BJ, Sipe TA, Thacker SB: Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group.
JAMA 2000, 283(15):2008-2012.

Salanti G, Higgins JP, Trikalinos TA, Ioannidis JP: Bayesian meta-analysis and meta-regression for gene-disease associations and deviations from Hardy-Weinberg equilibrium.
Stat Med 2007, 26(3):553-567.

Bagos PG: A unification of multivariate methods for meta-analysis of genetic association studies.
Stat Appl Genet Mol Biol 2008, 7: Article 31.

Thakkinstian A, McElduff P, D'Este C, Duffy D, Attia J: A method for meta-analysis of molecular association studies.
Stat Med 2005, 24(9):1291-1306.

Minelli C, Thompson JR, Abrams KR, Thakkinstian A, Attia J: The choice of a genetic model in the meta-analysis of molecular association studies.
Int J Epidemiol 2005, 34(6):1319-1328.

Minelli C, Thompson JR, Tobin MD, Abrams KR: An integrated approach to the meta-analysis of genetic association studies using Mendelian randomization.
Am J Epidemiol 2004, 160(5):445-452.

Bagos PG, Nikolopoulos GK: A method for meta-analysis of case-control genetic association studies using logistic regression.
Stat Appl Genet Mol Biol 2007, 6: Article 17.

Salanti G, Higgins JP: Meta-analysis of genetic association studies under different inheritance models using data reported as merged genotypes.
Stat Med 2008, 27(5):764-777.

Salanti G, Higgins JP, White IR: Bayesian synthesis of epidemiological evidence with different combinations of exposure groups: application to a gene-gene-environment interaction.
Stat Med 2006, 25(24):4147-4163.

Zondervan KT, Cardon LR: The complex interplay among factors that influence allelic association.
Nat Rev Genet 2004, 5(2):89-100.

Kaplan N, Morris R: Issues concerning association studies for fine mapping a susceptibility gene for a complex disease.
Genet Epidemiol 2001, 20(4):432-457.

Liu N, Zhang K, Zhao H: Haplotype-association analysis.
Adv Genet 2008, 60:335-405.

Schaid DJ: Evaluating associations of haplotypes with traits.
Genet Epidemiol 2004, 27(4):348-364.

Clark AG: The role of haplotypes in candidate gene studies.
Genet Epidemiol 2004, 27(4):321-333.

Morris RW, Kaplan NL: On the advantage of haplotype analysis in the presence of multiple disease susceptibility alleles.
Genet Epidemiol 2002, 23(3):221-233.

Akey J, Jin L, Xiong M: Haplotypes vs single marker linkage disequilibrium tests: what do we gain?
Eur J Hum Genet 2001, 9(4):291-300.

Levenstien MA, Ott J, Gordon D: Are molecular haplotypes worth the time and expense? A cost-effective method for applying molecular haplotypes.
PLoS Genet 2006, 2(8):e127.

Marchini J, Cutler D, Patterson N, Stephens M, Eskin E, Halperin E, Lin S, Qin ZS, Munro HM, Abecasis GR, et al.: A comparison of phasing algorithms for trios and unrelated individuals.
Am J Hum Genet 2006, 78(3):437-450.

Xu H, Wu X, Spitz MR, Shete S: Comparison of haplotype inference methods using genotypic data from unrelated individuals.
Hum Hered 2004, 58(2):63-68.

Niu T: Algorithms for inferring haplotypes.
Genet Epidemiol 2004, 27(4):334-347.

Lin DY, Huang BE: The use of inferred haplotypes in downstream analyses.
Am J Hum Genet 2007, 80(3):577-579.

HapMap: The International HapMap Project.
Nature 2003, 426(6968):789-796.

Attia J, Thakkinstian A, D'Este C: Meta-analyses of molecular association studies: methodologic lessons for genetic epidemiology.
J Clin Epidemiol 2003, 56(4):297-303.

Thompson SG, Sharp SJ: Explaining heterogeneity in meta-analysis: a comparison of methods.
Stat Med 1999, 18(20):2693-2708.

Higgins JP, Whitehead A: Borrowing strength from external trials in a meta-analysis.
Stat Med 1996, 15(24):2733-2749.

Higgins JP, Whitehead A, Turner RM, Omar RZ, Thompson SG: Meta-analysis of continuous outcome data from individual patients.
Stat Med 2001, 20(15):2219-2241.

Turner RM, Omar RZ, Yang M, Goldstein H, Thompson SG: A multilevel model framework for meta-analysis of clinical trials with binary outcomes.
Stat Med 2000, 19(24):3417-3432.

van Houwelingen HC, Arends LR, Stijnen T: Advanced methods in meta-analysis: multivariate approach and meta-regression.
Stat Med 2002, 21(4):589-624.

Wallenstein S, Hodge SE, Weston A: Logistic regression model for analyzing extended haplotype data.
Genet Epidemiol 1998, 15(2):173-181.

McCullagh P, Nelder JA: Generalized Linear Models. London: Chapman & Hall; 1989.

Chen YH, Kao JT: Multinomial logistic regression approach to haplotype association analysis in population-based case-control studies.
BMC Genet 2006, 7:43.

Haber M: Log-Linear Models for Linked Loci.
Biometrics 1984, 40(1):189-198.

Weir BS, Wilson SR: Log-linear models for linked loci.
Biometrics 1986, 42(3):665-670.

Tiret L, Amouyel P, Rakotovao R, Cambien F, Ducimetiere P: Testing for association between disease and linked marker loci: a log-linear-model analysis.
Am J Hum Genet 1991, 48(5):926-934.

Mander AP: Haplotype analysis in population-based association studies.

Agresti A: Categorical Data Analysis. 2nd edition. John Wiley & Sons; 2002.

Chen HY: A note on the prospective analysis of outcome-dependent samples.
J Roy Soc B 2003, 65(2):575-584.

Prentice RL, Pyke R: Logistic disease incidence models and case-control studies.
Biometrika 1979, 66(3):403-411.

Umbach DM, Weinberg CR: Designing and analysing case-control studies to exploit independence of genotype and exposure.
Stat Med 1997, 16(15):1731-1743.

French B, Lumley T, Monks SA, Rice KM, Hindorff LA, Reiner AP, Psaty BM: Simple estimates of haplotype relative risks in case-control data.
Genet Epidemiol 2006, 30(6):485-494.

Zaykin DV, Westfall PH, Young SS, Karnoub MA, Wagner MJ, Ehm MG: Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals.
Hum Hered 2002, 53(2):79-91.

Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA: Score tests for association between traits and haplotypes when linkage phase is ambiguous.
Am J Hum Genet 2002, 70(2):425-434.

Epstein MP, Satten GA: Inference on haplotype effects in case-control studies using unphased genotype data.
Am J Hum Genet 2003, 73(6):1316-1329.

Chen X, Li Z: Inference of haplotype effects in case-control studies using unphased genotype and environmental data.
Biom J 2008, 50(2):270-282.

Satten GA, Epstein MP: Comparison of prospective and retrospective methods for haplotype inference in case-control studies.
Genet Epidemiol 2004, 27(3):192-201.

Thomas A: Characterizing allelic associations from unphased diploid data by graphical modeling.
Genet Epidemiol 2005, 29(1):23-35.

Baker SG: A simple log-linear model for haplotype effects in a case-control study involving two unphased genotypes.
Stat Appl Genet Mol Biol 2005, 4: Article 14.

Lin DY, Zeng D, Millikan R: Maximum likelihood estimation of haplotype effects and haplotype-environment interactions in association studies.
Genet Epidemiol 2005, 29(4):299-312.

DerSimonian R, Laird N: Meta-analysis in clinical trials.
Controlled Clinical Trials 1986, 7:177-188.

Becker T, Cichon S, Jonson E, Knapp M: Multiple testing in the context of haplotype analysis revisited: application to case-control data.
Ann Hum Genet 2005, 69(Pt 6):747-756.

Becker T, Knapp M: A powerful strategy to account for multiple testing in the context of haplotype analysis.
Am J Hum Genet 2004, 75(4):561-570.

Matthews AG, Haynes C, Liu C, Ott J: Collapsing SNP genotypes in case-control genome-wide association studies increases the type I error rate and power.
Stat Appl Genet Mol Biol 2008, 7(1): Article 23.

Berkey CS, Hoaglin DC, Antczak-Bouckoms A, Mosteller F, Colditz GA: Meta-analysis of multiple outcomes by regression with random effects.
Stat Med 1998, 17(22):2537-2550.

Jackson D, White IR, Thompson SG: Extending DerSimonian and Laird's methodology to perform multivariate random effects meta-analyses.
Stat Med 29(12):1282-1297.

Judge GG, Griffiths WE, Hill RC, Lutkepohl H, Lee TC: The Theory and Practice of Econometrics. 2nd edition. New York: John Wiley & Sons; 1985.

Berndt SI, Potter JD, Hazra A, Yeager M, Thomas G, Makar KW, Welch R, Cross AJ, Huang WY, Schoen RE, et al.: Pooled analysis of genetic variation at chromosome 8q24 and colorectal neoplasia risk.
Hum Mol Genet 2008, 17(17):2665-2672.

Setiawan VW, Doherty JA, Shu XO, Akbari MR, Chen C, De Vivo I, Demichele A, Garcia-Closas M, Goodman MT, Haiman CA, et al.: Two estrogen-related variants in CYP19A1 and endometrial cancer risk: a pooled analysis in the Epidemiology of Endometrial Cancer Consortium.
Cancer Epidemiol Biomarkers Prev 2009, 18(1):242-247.

Lambrechts D, Storkebaum E, Morimoto M, Del-Favero J, Desmet F, Marklund SL, Wyns S, Thijs V, Andersson J, van Marion I, et al.: VEGF is a modifier of amyotrophic lateral sclerosis in mice and humans and protects motoneurons against ischemic death.
Nat Genet 2003, 34(4):383-394.

Uitterlinden AG, Ralston SH, Brandi ML, Carey AH, Grinberg D, Langdahl BL, Lips P, Lorenc R, Obermayer-Pietsch B, Reeve J, et al.: The association between common vitamin D receptor gene variations and osteoporosis: a participant-level meta-analysis.
Ann Intern Med 2006, 145(4):255-264.

Higgins JP, Thompson SG, Deeks JJ, Altman DG: Measuring inconsistency in meta-analyses.
BMJ 2003, 327(7414):557-560.

Skrondal A, Rabe-Hesketh S: Multilevel logistic regression for polytomous data and rankings.
Psychometrika 2003, 68(2):267-287.

Mickey RM, Elashoff R: A generalization of the Mantel-Haenszel estimator of partial association for 2 × J × K tables.
Biometrics 1985, 41(3):623-635.

Heyman ER, Koch GG: Average Partial Association in Three-Way Contingency Tables: A Review and Discussion of Alternative Tests.
International Statistical Review 1978, 46:237-254.

Berrington ADG, Cox DR: Interpretation of interaction: A review.
Ann Appl Stat 2007, 1(2):371-385.

Mickey RM: Assessment of three way interaction in 2 × J × K tables.

Zintzaras E, Koufakis T, Ziakas PD, Rodopoulou P, Giannouli S, Voulgarelis M: A meta-analysis of genotypes and haplotypes of methylenetetrahydrofolate reductase gene polymorphisms in acute lymphoblastic leukemia.
Eur J Epidemiol 2006, 21(7):501-510.

Thakkinstian A, D'Este C, Attia J: Haplotype analysis of VDR gene polymorphisms: a meta-analysis.
Osteoporos Int 2004, 15(9):729-734.

Lu G, Ades AE: Combination of direct and indirect evidence in mixed treatment comparisons.
Stat Med 2004, 23(20):3105-3124.

Minelli C, Thompson JR, Abrams KR, Lambert PC: Bayesian implementation of a genetic model-free approach to the meta-analysis of genetic association studies.
Stat Med 2005, 24(24):3845-3861.

Riley RD, Abrams KR, Lambert PC, Sutton AJ, Thompson JR: An evaluation of bivariate random-effects meta-analysis for the joint synthesis of two correlated outcomes.
Stat Med 2007, 26(1):78-97.

Riley RD, Abrams KR, Sutton AJ, Lambert PC, Thompson JR: Bivariate random-effects meta-analysis and the estimation of between-study correlation.
BMC Med Res Methodol 2007, 7:3.

Rabe-Hesketh S, Skrondal A, Pickles A: Reliable estimation of generalized linear mixed models using adaptive quadrature.

Rabe-Hesketh S, Skrondal A, Pickles A: Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects.
Journal of Econometrics 2005, 128(2):301-323.

Ioannidis JP, Ntzani EE, Trikalinos TA: 'Racial' differences in genetic effects for complex diseases.
Nat Genet 2004, 36(12):13121318. PubMed Abstract  Publisher Full Text

Ioannidis JP, Ntzani EE, Trikalinos TA, ContopoulosIoannidis DG: Replication validity of genetic association studies.
Nat Genet 2001, 29(3):306309. PubMed Abstract  Publisher Full Text

Ioannidis JP, Trikalinos TA: Early extreme contradictory estimates may appear in published research: the Proteus phenomenon in molecular genetics research and randomized trials.
J Clin Epidemiol 2005, 58(6):543549. PubMed Abstract  Publisher Full Text

Ioannidis JP, Trikalinos TA, Ntzani EE, ContopoulosIoannidis DG: Genetic associations in large versus small studies: an empirical assessment.
Lancet 2003, 361(9357):567571. PubMed Abstract  Publisher Full Text

Trikalinos TA, Ntzani EE, Contopoulos-Ioannidis DG, Ioannidis JP: Establishment of genetic associations for complex diseases is independent of early study findings.
Eur J Hum Genet 2004, 12(9):762-769.

Danforth KN, Hayes RB, Rodriguez C, Yu K, Sakoda LC, Huang WY, Chen BE, Chen J, Andriole GL, Calle EE, et al.: Polymorphic variants in PTGS2 and prostate cancer risk: results from two large nested case-control studies.
Carcinogenesis 2008, 29(3):568-572.

Danforth KN, Rodriguez C, Hayes RB, Sakoda LC, Huang WY, Yu K, Calle EE, Jacobs EJ, Chen BE, Andriole GL, et al.: TNF polymorphisms and prostate cancer risk.
Prostate 2008, 68(4):400-407.

Sareneva I, Koskinen LL, Korponay-Szabo IR, Kaukinen K, Kurppa K, Ziberna F, Vatta S, Not T, Ventura A, Adany R, et al.: Linkage and association study of FcgammaR polymorphisms in celiac disease.
Tissue Antigens 2009, 73(1):54-58.

Kedda MA, Duffy DL, Bradley B, O'Hehir RE, Thompson PJ: ADAM33 haplotypes are associated with asthma in a large Australian population.
Eur J Hum Genet 2006, 14(9):1027-1036.

Kavvoura FK, Akamizu T, Awata T, Ban Y, Chistiakov DA, Frydecka I, Ghaderi A, Gough SC, Hiromatsu Y, Ploski R, et al.: Cytotoxic T-lymphocyte associated antigen 4 gene polymorphisms and autoimmune thyroid disease: a meta-analysis.
J Clin Endocrinol Metab 2007, 92(8):3162-3170.

Kehoe PG, Katzov H, Feuk L, Bennet AM, Johansson B, Wiman B, de Faire U, Cairns NJ, Wilcock GK, Brookes AJ, et al.: Haplotypes extending across ACE are associated with Alzheimer's disease.
Hum Mol Genet 2003, 12(8):859-867.

Ma J, Qin W, Wang XY, Guo TW, Bian L, Duan SW, Li XW, Zou FG, Fang YR, Fang JX, et al.: Further evidence for the association between G72/G30 genes and schizophrenia in two ethnically distinct populations.
Mol Psychiatry 2006, 11(5):479-487.

Tsuchiya T, Schwarz PE, Bosque-Plata LD, Geoffrey Hayes M, Dina C, Froguel P, Wayne Towers G, Fischer S, Temelkova-Kurktschiev T, Rietzsch H, et al.: Association of the calpain-10 gene with type 2 diabetes in Europeans: results of pooled and meta-analyses.
Mol Genet Metab 2006, 89(1-2):174-184.

Hollis-Moffatt JE, Rowley KA, Phipps-Green AJ, Merriman ME, Dalbeth N, Gow P, Harrison AA, Highton J, Jones PB, Stamp LK, et al.: The ITGAV rs3738919 variant and susceptibility to rheumatoid arthritis in four Caucasian sample sets.
Arthritis Res Ther 2009, 11(5):R152.

Apostolakis S, Amanatidou V, Papadakis EG, Spandidos DA: Genetic diversity of CX3CR1 gene and coronary artery disease: new insights through a meta-analysis.
Atherosclerosis 2009, 207(1):8-15.

Song Y, Niu T, Manson JE, Kwiatkowski DJ, Liu S: Are variants in the CAPN10 gene related to risk of type 2 diabetes? A quantitative assessment of population and family-based association studies.
Am J Hum Genet 2004, 74(2):208-222.

Berkey CS, Hoaglin DC, Mosteller F, Colditz GA: A random-effects regression model for meta-analysis.
Stat Med 1995, 14(4):395-411.

Thompson SG, Turner RM, Warn DE: Multilevel models for meta-analysis, and their application to absolute risk differences.
Stat Methods Med Res 2001, 10(6):375-392.

van Houwelingen HC, Zwinderman KH, Stijnen T: A bivariate approach to meta-analysis.
Stat Med 1993, 12(24):2273-2284.

Fiocco M, Putter H, van Houwelingen JC: Meta-analysis of pairs of survival curves under heterogeneity: a Poisson correlated gamma-frailty approach.
Stat Med 2009, 28(30):3782-3797.

Bagos PG, Nikolopoulos GK: Mixed-effects Poisson regression models for meta-analysis of follow-up studies with constant or varying durations.
International Journal of Biostatistics 2009, 5: Article 21.

Ioannidis JP: Genetic associations: false or true?
Trends Mol Med 2003, 9(4):135-138.

Ioannidis JP: Why most published research findings are false.
PLoS Med 2005, 2(8):e124.

Weston A, Pan CF, Ksieski HB, Wallenstein S, Berkowitz GS, Tartter PI, Bleiweiss IJ, Brower ST, Senie RT, Wolff MS: p53 haplotype determination in breast cancer.
Cancer Epidemiol Biomarkers Prev 1997, 6(2):105-112.

Nunokawa A, Watanabe Y, Kaneko N, Sugai T, Yazaki S, Arinami T, Ujike H, Inada T, Iwata N, Kunugi H, et al.: The dopamine D3 receptor (DRD3) gene and risk of schizophrenia: case-control studies and an updated meta-analysis.
Schizophr Res 2010, 116(1):61-67.

Moxley G, Meulenbelt I, Chapman K, van Duijn CM, Eline Slagboom P, Neale MC, Smith AJ, Carr AJ, Loughlin J: Interleukin-1 region meta-analysis with osteoarthritis phenotypes.
Osteoarthritis Cartilage 2010, 18(2):200-207.

Evangelou E, Chapman K, Meulenbelt I, Karassa FB, Loughlin J, Carr A, Doherty M, Doherty S, Gomez-Reino JJ, Gonzalez A, et al.: Large-scale analysis of association between GDF5 and FRZB variants and osteoarthritis of the hip, knee, and hand.
Arthritis Rheum 2009, 60(6):1710-1721.

Zintzaras E, Rodopoulou P, Sakellaridis N: Variants of the arachidonate 5-lipoxygenase-activating protein (ALOX5AP) gene and risk of stroke: a HuGE gene-disease association review and meta-analysis.
Am J Epidemiol 2009, 169(5):523-532.

Auburn S, Diakite M, Fry AE, Ghansah A, Campino S, Richardson A, Jallow M, Sisay-Joof F, Pinder M, Griffiths MJ, et al.: Association of the GNAS locus with severe malaria.
Hum Genet 2008, 124(5):499-506.

Shi J, Badner JA, Liu C: PDLIM5 and susceptibility to bipolar disorder: a family-based association study and meta-analysis.
Psychiatr Genet 2008, 18(3):116-121.

Bevan S, Dichgans M, Gschwendtner A, Kuhlenbaumer G, Ringelstein EB, Markus HS: Variation in the PDE4D gene and ischemic stroke risk: a systematic review and meta-analysis on 5200 cases and 6600 controls.
Stroke 2008, 39(7):1966-1971.

Thakkinstian A, Dmitrienko S, Gerbase-Delima M, McDaniel DO, Inigo P, Chow KM, McEvoy M, Ingsathit A, Trevillian P, Barber WH, et al.: Association between cytokine gene polymorphisms and outcomes in renal transplantation: a meta-analysis of individual patient data.
Nephrol Dial Transplant 2008, 23(9):3017-3023.

Schunkert H, Gotz A, Braund P, McGinnis R, Tregouet DA, Mangino M, Linsel-Nitschke P, Cambien F, Hengstenberg C, Stark K, et al.: Repeated replication and a prospective meta-analysis of the association between chromosome 9p21.3 and coronary artery disease.
Circulation 2008, 117(13):1675-1684.

Castano-Rodriguez N, Diaz-Gallo LM, Pineda-Tamayo R, Rojas-Villarraga A, Anaya JM: Meta-analysis of HLA-DRB1 and HLA-DQB1 polymorphisms in Latin American patients with systemic lupus erythematosus.
Autoimmun Rev 2008, 7(4):322-330.

Lyon HN, Florez JC, Bersaglieri T, Saxena R, Winckler W, Almgren P, Lindblad U, Tuomi T, Gaudet D, Zhu X, et al.: Common variants in the ENPP1 gene are not reproducibly associated with diabetes or obesity.
Diabetes 2006, 55(11):3180-3184.

Li D, Collier DA, He L: Meta-analysis shows strong positive association of the neuregulin 1 (NRG1) gene with schizophrenia.
Hum Mol Genet 2006, 15(12):1995-2002.

Talkowski ME, Seltman H, Bassett AS, Brzustowicz LM, Chen X, Chowdari KV, Collier DA, Cordeiro Q, Corvin AP, Deshpande SN, et al.: Evaluation of a susceptibility gene for schizophrenia: genotype based meta-analysis of RGS4 polymorphisms from thirteen independent samples.
Biol Psychiatry 2006, 60(2):152-162.

Thakkinstian A, McEvoy M, Minelli C, Gibson P, Hancox B, Duffy D, Thompson J, Hall I, Kaufman J, Leung TF, et al.: Systematic review and meta-analysis of the association between β2-adrenoceptor polymorphisms and asthma: a HuGE review.
Am J Epidemiol 2005, 162(3):201-211.

Ioannidis JP, Ralston SH, Bennett ST, Brandi ML, Grinberg D, Karassa FB, Langdahl B, van Meurs JB, Mosekilde L, Scollen S, et al.: Differential genetic effects of ESR1 gene polymorphisms on osteoporosis outcomes.
JAMA 2004, 292(17):2105-2114.

Johansson M, McKay JD, Wiklund F, Rinaldi S, Verheus M, van Gils CH, Hallmans G, Balter K, Adami HO, Gronberg H, et al.: Implications for prostate cancer of insulin-like growth factor-I (IGF-I) genetic variation and circulating IGF-I levels.
J Clin Endocrinol Metab 2007, 92(12):4820-4826.

De Gaetano M, Quacquaruccio G, Pezzini A, Latella MC, A DIC, Del Zotto E, Padovani A, Lichy C, Grond-Ginsbach C, Gattone M, et al.: Tissue factor gene polymorphisms and haplotypes and the risk of ischemic vascular events: four studies and a meta-analysis.
J Thromb Haemost 2009, 7(9):1465-1471.

Orozco G, Abelson AK, Gonzalez-Gay MA, Balsa A, Pascual-Salcedo D, Garcia A, Fernandez-Gutierrez B, Petersson I, Pons-Estel B, Eimon A, et al.: Study of functional variants of the BANK1 gene in rheumatoid arthritis.
Arthritis Rheum 2009, 60(2):372-379.

Brunner EJ, Kivimaki M, Witte DR, Lawlor DA, Davey Smith G, Cooper JA, Miller M, Lowe GD, Rumley A, Casas JP, et al.: Inflammation, insulin resistance, and diabetes--Mendelian randomization using CRP haplotypes points upstream.
PLoS Med 2008, 5(8):e155.

Lee KM, Kang D, Clapper ML, Ingelman-Sundberg M, Ono-Kihara M, Kiyohara C, Min S, Lan Q, Le Marchand L, Lin P, et al.: CYP1A1, GSTM1, and GSTT1 polymorphisms, smoking, and lung cancer risk in a pooled analysis among Asian populations.
Cancer Epidemiol Biomarkers Prev 2008, 17(5):1120-1126.

McGrath M, Lee IM, Hankinson SE, Kraft P, Hunter DJ, Buring J, De Vivo I: Androgen receptor polymorphisms and endometrial cancer risk.
Int J Cancer 2006, 118(5):1261-1268.

Huang WY, Olshan AF, Schwartz SM, Berndt SI, Chen C, Llaca V, Chanock SJ, Fraumeni JF Jr, Hayes RB: Selected genetic polymorphisms in MGMT, XRCC1, XPD, and XRCC3 and risk of head and neck cancer: a pooled analysis.
Cancer Epidemiol Biomarkers Prev 2005, 14(7):1747-1753.

Maraganore DM, de Andrade M, Elbaz A, Farrer MJ, Ioannidis JP, Kruger R, Rocca WA, Schneider NK, Lesnick TG, Lincoln SJ, et al.: Collaborative analysis of alpha-synuclein gene promoter variability and Parkinson disease.
JAMA 2006, 296(6):661-670.