Abstract
Background
With the completion of the international HapMap project, many studies have been conducted to investigate the association between complex diseases and haplotype variants. Such haplotypebased association studies, however, often face two difficulties; one is the large number of haplotype configurations in the chromosome region under study, and the other is the ambiguity in haplotype phase when only genotype data are observed. The latter complexity may be handled based on an EM algorithm with family data incorporated, whereas the former can be more problematic, especially when haplotypes of rare frequencies are involved. Here based on family data we propose to cluster long haplotypes of linked SNPs in a biological sense, so that the number of haplotypes can be reduced and the power of statistical tests of association can be increased.
Results
In this paper we employ family genotype data and combine a clustering scheme with a likelihood ratio statistic to test the association between quantitative phenotypes and haplotype variants. Haplotypes are first grouped based on their evolutionary closeness to establish a set containing core haplotypes. Then, we construct for each family the transmission and nontransmission phase in terms of these core haplotypes, taking into account simultaneously the phase ambiguity as weights. The likelihood ratio test (LRT) is next conducted with these weighted and clustered haplotypes to test for association with disease. This combination of evolutionguided haplotype clustering and weighted assignment in LRT is able, via its corecoding system, to incorporate into analysis both haplotype phase ambiguity and transmission uncertainty. Simulation studies show that this proposed procedure is more informative and powerful than three familybased association tests, FAMHAP, FBAT, and an LRT with a group consisting exclusively of rare haplotypes.
Conclusions
The proposed procedure takes into account the uncertainty in phase determination and in transmission, utilizes the evolutionary information contained in haplotypes, reduces the dimension in haplotype space and the degrees of freedom in tests, and performs better in association studies. This evolutionguided clustering procedure is particularly useful for long haplotypes containing linked SNPs, and is applicable to other haplotypebased association tests. This procedure is now implemented in R and is free for download.
Background
Highdensity sets of SNPs, especially haplotypes, have been used widely in genetic research to explore possible association with complex diseases. Haplotypes are considered to be the biological units containing more information about transmission, and thus may be better biomarkers to use in examining the disease susceptible region. However, haplotype phase is often unknown when only genotype data are observed. This linkage phase ambiguity often leads to large degrees of freedom in statistical tests, and may result in estimation of many haplotypes with rare frequencies. Collection of family genotype data may help in determination of haplotype phase if information from other family members can be incorporated and crossreferenced. Additionally, the use of family data can avoid spurious association arising from population admixture. Nevertheless, the statistical analysis of family data may not be straightforward. For instance, the nonparametric transmission disequilibrium test (TDT) and other similar tests utilize the transmitted and nontransmitted alleles (or haplotypes) to detect association. For uncertainty both in transmission and in phase, most procedures adopt an expectationmaximization (EM) algorithm in computation of score statistics or likelihood functions, such as FBAT [1] and FAMHAP [2,3]. These methods infer haplotypes based on nuclear families, where any number of children and configuration of missing genotype data are allowed. FBAT considers score statistics for the number of risk haplotypes among affected offspring; whereas FAMHAP adopts a likelihood approach to estimate haplotype frequencies and to test for association based on nuclear family members or unrelated individuals.
As for addressing the problems that result when a large number of different haplotypes are involved in analysis and when certain haplotypes occur with very low frequency, several approaches have been adopted. Some studies have deleted haplotypes with small estimated frequencies [4,5], and some have combined these rare haplotypes to form a new group. Although these procedures can reduce the total number of parameters and the inflated variation due to small frequencies, they risk information loss due to the arbitrary deletion or grouping of rare haplotypes. In contrast, several procedures have adopted an approach which uses a measure of evolutionary relationship to cluster rare haplotypes in casecontrol studies [610]. These clustering algorithms define a core set of haplotypes that are considered "ancient," where being ancient is approximated by being more frequent [710]. Once the core set is determined, rare haplotypes are then clustered with their ancestors as well as denoted by them. For family data, we want to employ such a clustering scheme, so that the statistical tests can be conducted more efficiently.
Under the assumption of random mating and with the use of transmitted and nontransmitted haplotypes from parents, we construct for family data a core set of haplotypes based on estimates of haplotype frequencies. In the following sections, we start with the notation used in Becker and Knapp [2], conduct the clustering procedure for family data based on frequency estimates from FAMHAP, determine the pair of transmitted and nontransmitted corecoding haplotypes, and introduce the assignment of coding to be used later in our tests of disease association. The uncertainty in phase explanation, in transmission status, and in core representation associated with each genotype will be expressed with weights. Unlike the analysis for casecontrol studies, these weights are essential for further statistical analysis of pedigree data. We next adopt this transformed data in a likelihood ratio test (LRT) of association, and call it LRTC. This LRTC modifies the original test in FAMHAP in terms of the core haplotypes. To evaluate if the proposed clustering approach can identify correctly the true core haplotypes and to demonstrate the advantage of LRTC accrued by inclusion of an evolutionary interpretation, we conduct simulation studies by applying a coalescentbased whole genome simulator GENOME [11]. For the purpose of comparison with other tests, we consider the original LRT in FAMHAP, a score test in FBAT, and a naïve LRT with all rare haplotypes clustered to form a new group (LRTG).
Methods
Notation
Following the same notation used in Becker and Knapp [2], let G_{i }denote the set of unphased genotypes in the ith family, , where the superscripts index the members in the ith family. For example, and are multilocus genotypes of father and mother, respectively, and denotes the genotype of the lth child, where 1 ≤l≤n_{i }and n_{i }is the number of children in the ith family. If haplotypes j and k are compatible with the father's genotype , this relation is denoted by . Similarly, if haplotypes u and v are compatible with the mother's genotype , it is then denoted by . The rest are deduced analogously. Furthermore, if the lth child is the proband in the ith family, then his/her haplotype transmission patterns will be inferred with reference to each sibling's genotype information as well as the parents' information. Therefore, we define as the haplotype explanation set containing all possible haplotype transmission patterns compatible with, G_{i},
where represents the number of transmission patterns of (j, k, u, v) that are compatible with the genotype of the child l'.
It is worth noting that, under the null hypothesis of no association, is the intersection of all for l' = 1,...,n_{i }in the ith family. Consequently, the estimation of haplotype frequency is independent of who the proband is. In contrast, when the gene is associated with the disease, the haplotype explanation set and the estimations of the transmitted and nontransmitted haplotype frequencies will depend on the ascertained proband. Therefore, throughout this paper, we rearrange the order of children such that the first child is always the first affected one in each family. In the following, we use FAMHAP to estimate the configurations of haplotypes and their corresponding frequencies based on likelihoods. In fact, at this stage of computation, other public software programs like FBAT and Transmit [12] can provide stable estimates as well.
Step 1 Clustering haplotypes
With the frequency approximating the "age" of haplotypes, a cladistic clustering approach is conducted based on their evolutionary relation. Similar to the haplotype clustering method for casecontrol studies in Tzeng [6], here we identify first the ancestor (core) and descendent haplotypes based on family data, and then cluster the descendents with their ancestors for dimension reduction.
The selected core haplotypes are the leading c haplotypes with a cumulative sum of frequencies reaching 90%, and are defined as the set with corresponding frequencies
The superscript is the number of generations in the evolutionary tree, and the subscript stands for the order of haplotype frequency from large to small. Similarly, the set of haplotypes with one step mutation from H^{(0) }is denoted as H^{(1)}, and those m steps away are contained in H^{(m)}. For each set H^{(m)}, the corresponding haplotype frequencies are denoted as ∏^{(m)}, where m = 1,...,M and M is the largest distance. Between any two adjacent sets H^{(m) }and H^{(m1)}, an allocation matrix B^{(m) }is defined to represent the probability that a certain haplotype in H^{(m) }is a direct descendant of a haplotype in H^{(m1) }[6,13]. Therefore, the frequencies of the core haplotypes can be revised as
where (∏^{(m)})^{t }is the transpose of ∏^{(m)}. Detailed explanation is provided in Additional file 1.
Additional file 1. Derivation of haplotype frequency estimates and haplotype explanation set for each family based on genotype data.
Format: PDF Size: 18KB Download file
This file can be viewed with: Adobe Acrobat Reader
Note that the above derivations do not require the information of haplotype phase for each family member. In other words, any software which provides estimates of haplotype frequencies can be applied at this stage. However, we prefer FAMHAP because it also computes for each family the compatible transmitted and nontransmitted haplotypes, along with the weights, which will be utilized in the following steps of recoding and testing.
Step 2 Recoding transmission and nontransmission haplotypes for analysis
After determining the core haplotypes with updated frequencies, we begin to rewrite the remaining noncore haplotypes in terms of their corresponding ancestor haplotypes in the core. In other words, all representations including haplotype explanation, phase uncertainty, and transmission stages are rewritten as a function of the core in order to reduce dimensionality. This recoding procedure can be carried out via the allocation matrix B^{(m) }defined earlier. Let matrix Γ^{m }be the product of m matrices B^{(m)},B^{(m1)},... and B^{(1)}, thus the row dimension of Γ^{m }indicates the number of rare haplotypes in H^{(m)}, and the column dimension of Γ^{m }stands for the number of core haplotypes in H^{(0)}. For instance, the ith row in Γ^{m }is
where c this dimensional row vector lists the probabilities that the ith haplotype in H^{(m) }is to be clustered with the ancestor haplotypes in the core. Therefore, in the following the original frequency of a rare haplotype is replaced by , a linearly weighted sum of the modified core haplotype frequencies
The transmission status, along with its probability, haplotype phase ambiguity, and evolution uncertainty, can now be rearranged, before statistical analysis, using equation (2). For instance, under the alternative hypothesis of association, the frequency of the ith haplotype in H^{(m) }for the transmission group Tr becomes , where is the updated core haplotype frequency vector
and is the frequency vector of transmitted haplotypes in H^{(m)}. For the nontransmission group NTr, the calculations of and are carried out in the same way.
Step 3 Likelihood ratio test with clustered haplotypes (LRTC)
We now derive the likelihood ratio test statistic with all parameters rewritten in terms of the core haplotype frequencies. This amounts to rewriting Becker and Knapp's [14] procedure with only (c  1) parameters,
where n is the number of families and is the likelihood of the ith family under the alternative with the first child being affected
The γ_{j.Tr }and γ_{u.Tr }stand for frequencies in the new representation (2) for the transmitted haplotypes (j,u), and γ_{k.NTr }and γ_{v.NTr }are the modified frequencies for the nontransmitted haplotypes (k,v). The function s is the number of transmission patterns compatible with the genotype of the l'th child c_{l'}, as defined in Becker and Knapp's paper. Note that the value of will not differ no matter what haplotypes, core or not, are being investigated. That is, there is no need to recompute s with respect to the core system, and its value derived when specifying the haplotype explanation set in the beginning of this procedure can be used directly here.
Results
To evaluate the performance of the likelihood ratio test with clustered haplotypes in family studies, we conduct simulations to first examine the reconstruction and identification of core haplotypes, and next to evaluate the impact of this clustering scheme on the likelihood ratio test. The results are compared with three familybased association methods, FAMHAP [2], FBAT [15], and finally an LRT using a naïve group composed exclusively of rare haplotypes (LRTG).
Sampling scheme for simulations
The SNP haplotype sequences were first simulated based on a coalescentbased whole genome simulator GENOME [11]. These sequences were generated from a population with an effective size of 10,000, with the number of SNPs assumed to follow a Poisson distribution with mean equal to the product of mutation rate 10^{6}/bp, and with 1,000 base pairs per each fragment. The recombination rate between ten consecutive fragments was assumed to be 10^{4}. Default settings were used for other parameters such as the mutation and migration rates of 10^{6 }and 2.5 × 10^{4}, respectively. This resulted in 100 sequences with 972 SNPs. After deleting alleles with minor allele frequency (MAF) less than 5%, 536 SNPs were left. Haploview [16] was next used to identify haplotype blocks and to extract tag SNPs. Finally, seven blocks were determined. We selected the longest block, the seventh, for use in constructing the data for nuclear families and derived 13 tag SNPs which formed 15 haplotypes in this region. Figure 1 shows the linkage disequilibrium (LD) plot with haploview. A complete plot of LD for all tag SNPs with corresponding haplotype blocks is displayed in Additional file 2. Figures 2 and 3 display the corresponding minor allele frequencies and the haplotype frequencies, respectively.
Figure 1. LD plot. LD plot of the simulated region.
Additional file 2. The complete plot of LD for all tag SNPs.
Format: PDF Size: 453KB Download file
This file can be viewed with: Adobe Acrobat Reader
Figure 2. Minor allele frequencies. Minor allele frequencies of the 13 SNPs consisting of the haplotype region.
Figure 3. Haplotype frequencies. Frequencies of the 15 haplotypes considered in simulation studies.
In the following simulations the number of families N = 200 was considered. For each family, the number of children, in addition to the proband, follows a Poisson distribution with mean 2, and this value was kept the same for this family in each replication. Let one of the marker loci ' A ' denote the liability allele with frequency p fixed at 0.1, 0.25, or 0.5. Let f_{i }be the penetrance function, where i = 0, 1, 2, and let r = f_{1}/f_{0 }= 2, 2.5, or 3 be the relative ratio. The prevalence K was set at 1% under the recessive, additive, and dominant models. These values for simulation settings are listed in Table 1. There were 27 simulation settings considered, and there were 1,000 replications under each setting. For each family, we generated first the haplotypes of the proband, next the parents' nontransmitted haplotypes, and then the haplotypes of other siblings. All haplotype data were transformed to genotypes before analysis.
Table 1. Penetrance values for simulation settings
Identification of core haplotypes and tests of association
Once the family genotype data were simulated, they were used to estimate haplotype compositions, corresponding frequencies, and also the set of core haplotypes. Figure 4 displays the average number of haplotypes identified in the simulation studies under the additive model for various numbers of families (N = 50, 100, and 200) and relative ratios (r = 2 and 3). These numbers are all close to the true value 15. Results under the dominant and recessive models are similar (data not shown here). In Figure 5, we plot the average percentage of identified core haplotypes among the set of ten true core haplotypes. The percentages are high indicating good representation and consistency of this evolutionguided clustering procedure for family data. It can be observed in the figure that the percentage of core haplotypes being identified by our procedure is not much affected by the frequency p of the susceptible allele A, or by the mode of inheritance. For example, under recessive models with r = 2 (the solid pink line in Figure 5), the percentages are 90%, 91%, and 89% for p = 0.1, 0.25, and 0.50, respectively, corresponding to the average numbers 9.4, 9.4, and 9.2 core haplotypes being identified among the 10 true cores. In addition, when p is fixed at 0.5, the percentages are 89%, 90%, and 89% for the additive, dominant, recessive model, respectively, implying robust performance of the evolutionary clustering procedure.
Figure 4. Average number of identified haplotypes under the additive model. The number of haplotypes identified in the simulations under various numbers of families (N), relative ratios (r), and liability allele frequencies (p) under additive models. Red lines are for N = 200 families, blue for N = 100 families, and pink for N = 50. Solid lines are for r = 2 and dashed lines for r = 3.
Figure 5. Average of percentage of identified core haplotypes. The average percentage of identified core haplotypes among the 10 true core haplotypes under different allele frequencies p. Red lines are for N = 200 families, blue for N = 100 families, and pink for N = 50. Solid lines are for r = 2 and dashed lines for r = 3.
After the core haplotypes are determined, the next step is to construct the likelihoods under the null and the alternative hypotheses, respectively, with all haplotype frequencies replaced by the revised core frequencies. This new modified likelihood ratio test with clustered haplotypes, LRTC, is compared with a nonparametric score test in FBAT, an LRT with all rare haplotypes grouped into a single class (LRTG), and an original test in FAMHAP. The resulting powers for N = 200 are displayed in Table 2. Under each setting, the largest power among the three tests is indicated in boldface. In most cases, LRTC is more powerful than the other three tests. In addition, the power gain from LRTC is substantially pronounced when the relative ratio r is large. The power of LRTG is between that of LRTC and the original FAMHAP most of the time, whereas FBAT is always the least powerful, except when the allele frequency p is 0.1 under the recessive model. Type I errors are presented in Table 3, under the null hypothesis that none of the SNPs (with allele frequency ranging from 0.05 to 0.5) nor the haplotypes (with frequency ranging from 0.01 to 0.16) is associated with the disease. Note that the type I errors of LRTC, LRTG and FAMHAP are all around nominal values; while that of FBAT is the smallest.
Table 2. Number are the power of four familybased association tests at 5% significance level with N = 200
Table 3. Type I errors of the four familybased association tests at the 5% significance level
The above Table 2 seems to indicate a poor relationship between the power performance and the allele frequency p = P(A) under the recessive model. For instance, when p = 0.1, the power under the additive and dominant model is satisfactory; while the power under the recessive model is only around 0.05. One possible explanation could be the influence from penetrance, prevalence, and p on P(AD), where P(AD) is the allele frequency among diseased individuals. Because P(AD) can be written as [f_{2 }. p^{2 }+ f_{1 }. p . (1  p)]/K (see Additional file 3 for details), its value is closer to P(A) under the recessive model. For example, p(A) = 0.1 corresponds to P(AD) = 0.11 when r = 2, which indicates that the allele A and disease status D are close to independent. In other words, it is unlikely in this case that the allele A is the susceptible marker and, therefore, the power becomes low. In contrast, under the additive model, P(AD) is 0.21 if r = 2 and p = 0.12. In this case, the allele A may be the susceptible gene and thus the power is larger. More details on the derivation of P(AD) can be seen in Additional file 3 and the comparison between P(AD) and P(A) is displayed in Figure 6. The black solid line in Figure 6 represents the case where the magnitudes of P(AD) and P(A) are equal.
Additional file 3. Derivation of P(AD).
Format: PDF Size: 13KB Download file
This file can be viewed with: Adobe Acrobat Reader
Figure 6. Plots of P(AD) versus p under three genetic models. The values of P(AD) correspond to different p 's under additive model (red line), dominant model (solid blue line), and recessive model (dashed blue line). The left panel is for r = 2 (Left) and the right for r = 3 (Right). The straight black line indicates when magnitudes of P(AD) and p are equal.
Performance evaluation under population admixture
Another issue of note concerns the effect of population stratification on the power of LRTC. To investigate this effect, we performed further simulation studies with data generated from two populations via GENOME. Similar to the procedures described above, we extracted tag SNPs with Haploview and selected a block with 13 tagSNPs to construct genotype data for N = 200 families with 100 families from each population. The liability allele frequency p was set at 0.085, 0.270 and 0.530, corresponding to allele frequencies (0.09, 0.43, 0.51) for the first population and (0.08, 0.11, 0.55) for the second. The number of children, penetrance function, relative ratio, genetic models and prevalence were the same as previously defined. Table 4 lists the power of the four association tests under the population admixture data, where LRTC is the most powerful and FBAT has the smallest power. This result demonstrates that the effect of population admixture on LRTC is much smaller than it is on either FAMHAP or TRANSMIT, whereas FBAT and the HAPTDT option in FAMHAP [14,17] can escape only partially from this admixture problem [18], especially when the mode of inheritance is not additive. In addition, the newly developed feature HAPTDT in FAMHAP does not allow missing data, which makes it less appealing than the original feature in FAMHAP where missing genotype data is allowed for family members.
Table 4. Numbers are the power of four familybased association tests for population stratification data at the 5% significance level with N = 200
Discussion
In this paper, we have constructed a familybased association test using clustered haplotypes. The four key steps are: (1) to determine the core set on the basis of haplotype frequencies, (2) to perform the clustering procedure based on a haplotype cladogram, (3) to represent the rare haplotype frequencies in terms of the revised core frequencies, and (4) to incorporate the phase ambiguity, transmission uncertainty, and corerepresentation variability via likelihood weights. Our simulations show that both haplotype reconstruction and core identification perform well with more than 91% accuracy for cases where number of families N ≥ 100. In addition, the results show great improvement in test power, as compared to the original FAMHAP, FBAT, and LRTG. Our proposed procedure is useful for long haplotypes containing many SNPs in LD. If the SNPs are not in LD, then it is not appropriate to consider haplotypes as the unit for analysis. In addition, if the haplotypes are of short length, then both the dimensionality and phase ambiguity will not be hard to handle. We have also embedded this clustering algorithm in the likelihood ratio test under FAMHAP, and this program can be downloaded freely from the author's website http://homepage.ntu.edu.tw/~ckhsiao/download(en).html webcite.
One issue that merits discussion concerns the number of haplotypes in the core set. As a rule of thumb, we selected the several leading haplotypes with 0.9 cumulative frequency. This choice is somewhat arbitrary. In fact, there is a tradeoff between the increase in information (represented via frequency) and the reduction in dimensionality. A possible alternative, depending on the sample size and number of dimensions under consideration, would be to use Shannon's information with a penalty function. This criterion works by finding l haplotypes such that Shannon's net information reaches its maximum. This criterion, however, is sensitive with respect to sample size. When sample size gets large, the penalty decreases faster than the entropy term, and thus results in inclusion of all haplotypes in the core set, even those with small frequencies. In other words, this criterion does not effectively reduce dimensionality when large sample size prevails.
There are several potential applications for the association test presented here. In many association tests, the chisquare approximation can be poor due to the existence of many haplotypes and/or rare haplotypes. Our clustering procedure may improve the performance of such statistical methods. Although we only demonstrate its impact on the likelihood ratio test, we believe other tests would benefit from this clustering procedure as well. For instance, after the haplotype phase and transmission status are identified and recoded via the core for each family, the tests in TRANSMIT or FBAT or other kinds of haplotype inference [1922] can be modified. Another application is to use the clustered haplotypes in regression analysis to incorporate the environmental influence on quantitative traits [23,24]. Because FAMHAP provides for each family and its individual family members the list of all possible haplotype explanations with corresponding likelihoods, this information can be utilized in further analyses. In addition, it would be interesting to extend this clustering approach to data containing both independent and related individuals. A mixture of familybased and populationbased data are sometimes considered in metagenetic analysis to enhance the power of an association study. Applying this clustering technique can further reduce the dimension of parameters and achieve a larger power in detection of genetic association with common diseases. Finally, we would like to point out that we consider in this paper only simulations from one single population or from a mixture of two populations. Though the results look promising, other scenarios are warrant for further investigation.
Conclusions
For family genotype data, we consider an evolutionguided clustering tool that clusters rare haplotypes in order to achieve dimensional reduction, and a parametric likelihood ratio test that accounts for the uncertainty associated with transmission phase. This procedure is able to preserve biological information and to improve statistical testing power. Simulation studies of long haplotypes with SNPs in LD show that the proposed likelihood ratio test with clustered haplotypes (LRTC) outperforms FAMHAP, FBAT, and a naïve LRTG.
Authors' contributions
MHL implemented the algorithm, performed the simulations, and drafted the manuscript. JYT suggested the evolution idea in familybased association study and revised the manuscript. SYH interpreted the concept of likelihood ratio tests and revised the manuscript. CKH conceived the research, supervised the study and finalized the manuscript. All authors read and approved the final manuscript.
Acknowledgements
MHL was supported in part by NSC 992118M113001. JYT was supported by NIH grants R01MH84022 and P01CA142538. SYH, and CKH were supported in part by NSC 972314B002040MY3.
References

Horvath S, Xu X, Lake SL, Silverman EK, Weiss ST, Laird NM: Familybased tests for association haplotypes with general phenotype data: application to asthma genetics.
Genet Epidemiol 2004, 26:6169. PubMed Abstract  Publisher Full Text

Becker T, Knapp M: Maximumlikelihood estimation of haplotype frequencies in nuclear families.
Genet Epidemiol 2004, 27:2132. PubMed Abstract  Publisher Full Text

Herold C, Becker T: Genetic association analysis with FAMHAP: a major program update.
Bioinformatics 2009, 25:134136. PubMed Abstract  Publisher Full Text

Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA: Score tests for association between traits and haplotypes when linkage phase is ambiguous.
Am J Hum Genet 2002, 70:425434. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Huang BE, Amos CI, Lin DY: Detecting haplotype effects in genomewide association studies.
Genet Epidemiol 2007, 31(8):803812. PubMed Abstract  Publisher Full Text

Tzeng JY: Evolutionarybased grouping of haplotypes in association analysis.
Genet Epidemiol 2005, 28:220231. PubMed Abstract  Publisher Full Text

Crandall KA, Templeton AR: Empirical tests of some predictions from coalescent theory with applications to intraspecific phylogeny reconstruction.
Genetics 1993, 134:959969. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Slatkin M, Rannala B: Estimating the age of alleles by use of intraallelic variability.
Am J Hum Genet 1997, 60:447458. PubMed Abstract  PubMed Central Full Text

Seltman H, Roeder K, Devlin B: Transmission/disequilibrium test meets measured haplotype analysis: familybased association analysis guided by evolution of haplotypes.
Am J Hum Genet 2001, 68:12501263. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Seltman H, Roeder K, Devlin B: Evolutionarybased association analysis using haplotype data.
Genet Epidemiol 2003, 25:4858. PubMed Abstract  Publisher Full Text

Liang L, Zöllner S, Abecasis GR: Genome: a rapid coalescentbased whole genome simulator.
Bioinformatics 2007, 23:15651567. PubMed Abstract  Publisher Full Text

Clayton D: A generalization of the transmission/disequilibrium test for uncertainhaplotype transmission.
Am J Hum Genet 1999, 65:11701177. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Tzeng JY, Wang CH, Kao JT, Hsiao CK: Regressionbased association analysis with clustered haplotypes using genotypes.
Am J Hum Gene 2006, 78:231242. Publisher Full Text

Knapp M, Becker T: Familiybased association analysis with tightly linked markers.
Hum Hered 2003, 56:29. PubMed Abstract  Publisher Full Text

Laird NM, Horvath S, Xu X: Implementing a unified approach to familybased tests of association.
Genet Epidemiol 2000, 19(Suppl 1):S36S42. PubMed Abstract

Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps.
Bioinformatics 2005, 21:263265. PubMed Abstract  Publisher Full Text

Zhao H, Zhang S, Merikangas KR, Trixler M, Wildenauer DB, Sun F, Kidd KK: Transmission/disequilibrium tests using multiple tightly linked markers.
Am J Hum Genet 2000, 67:936946. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Zheng G, Freidlin B, Gastwirth JL: Robust TDTtype candidategene association tests.
Ann Hum Genet 2002, 66:145155. PubMed Abstract  Publisher Full Text

Li C, Boehnke M: Haplotype association analysis for late onset diseases using nuclear family data.
Genet Epidemiol 2006, 30:220230. PubMed Abstract  Publisher Full Text

Ding XD, Simianer H, Zhang Q: A new method for haplotype inference including fullsib information.
Genetics 2007, 177:19291940. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Dudbridge F: Likelihoodbased association analysis for nuclear families and unrelated subjects with missing genotype data.
Hum Hered 2008, 66:8798. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Chen Y, Li X, Li J: A novel approach for haplotypebased association analysis using family data.
BMC Bioinformatics 2010, 11(Suppl 1):S45. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Li J, Zhou Y, Elston RC: Haplotypebased quantitative trait mapping using a clustering algorithm.
BMC Bioinformatics 2006, 7:258. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Cai Z, Sabaa H, Wang Y, Goebel R, Wang Z, Xu J, Stothard P, Lin G: Most parsimonious haplotype allele sharing determination.
BMC Bioinformatics 2009, 10:115. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text