Abstract
Background
Dense marker maps require statistical methods for QTL fine mapping that remain fast and efficient with a large number of markers. In this study, the simulated dataset for the XIIth QTLMAS workshop was analyzed using a set of QTL fine-mapping tools.
Methods
The QTL fine-mapping strategy was based on the use of statistical methods combining linkage and linkage disequilibrium analysis. Variance component-based linkage analysis provided confidence intervals for the QTL. Within these regions, two additional analyses combining both linkage analysis and linkage disequilibrium information were applied. The first method estimated identity-by-descent probabilities among base haplotypes that were used to group them in different clusters. The second method constructed haplotype groups based on identity-by-state probabilities.
Results
Two QTL explaining 9.4 and 3.3% of the genetic variance were found with high significance on chromosome 1 at positions 19.5 and 76.6 cM. On chromosome 2, two QTL were also detected at positions 26.0 and 53.2 cM, explaining 9.0 and 7.8% of total genetic variance, respectively. The QTL detected on chromosome 3 at position 11.9 cM (5% of variance) was less important. The QTL with the highest effect (37% of variance) was detected on chromosome 4 at position 3.1 cM and another QTL (13.6% of variance) was detected on chromosome 5 at position 93.9 cM.
Conclusion
The proposed strategy for fine-mapping of QTL, combining linkage and linkage disequilibrium analysis, detected the most important QTL with an additive effect in a short period, but it should be extended in the future in order to fine-map linked and epistatic QTL.
Background
High-throughput SNP analysis and SNP microarrays now give the opportunity to genotype many animals for hundreds of SNP per chromosome. Thanks to these techniques, marker density is no longer a limiting factor in QTL fine-mapping studies. However, these dense marker maps require statistical methods that work fast and efficiently with a large number of markers.
The purpose of this paper was to present a strategy for QTL fine-mapping and its corresponding results on the XIIth QTLMAS workshop simulated dataset.
Methods
The QTL fine-mapping strategy was mostly based on the use of statistical methods combining linkage (LA) and linkage disequilibrium analysis (LDLA) described by Druet et al. [1].
Linkage analysis
First, a variance component-based (VC) linkage analysis [2] was performed at each marker position with the following model:
$$y = \mathbf{1}\mu + Zu + Z_{v}v + e$$
where y is a vector containing the phenotypic values for bulls, μ is the mean, u is a vector of random polygenic effects, v is a vector of random gametic effects and e is a vector of random residual terms. Z and Z_{v} are known design matrices relating phenotypes to random polygenic and gametic effects, respectively.
The (co)variance structure was:
$$\mathrm{Var}\begin{pmatrix} u \\ v \\ e \end{pmatrix} = \begin{pmatrix} A\sigma_u^2 & 0 & 0 \\ 0 & G_{v}\sigma_v^2 & 0 \\ 0 & 0 & R \end{pmatrix}$$
where R is a diagonal matrix containing the residual variance ($\sigma_e^2$), A is the additive relationship matrix and $\sigma_u^2$ is the polygenic variance. G_{v} is the relationship matrix among QTL allelic effects estimated from pedigree relationships and marker information [3] and $\sigma_v^2$ is the gametic variance. As in Pong-Wong et al. [4], the method for calculating the gametic matrix used the closest informative bracket instead of estimating probabilities-of-descent of a gamete (PDQ) from parent to offspring by integration over all possible haplotypes. Rules to compute the PDQ using the closest informative bracket can be found in Table 1 in Pong-Wong et al. [4]. The variances of paternal and maternal alleles were assumed to be equal and a single parameter was estimated ($\sigma_v^2$). The variance associated with the QTL (QTL allelic variance) was then twice $\sigma_v^2$. The proportion of total genetic variance due to the QTL was $2\sigma_v^2 / (2\sigma_v^2 + \sigma_u^2)$.
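As a small worked illustration of the last step, the share of genetic variance attributed to the QTL can be computed from the two estimated variance components. The sketch below assumes that total genetic variance is the sum of the polygenic variance and the QTL allelic variance (twice the gametic variance); the function name and the numeric values are hypothetical and chosen only for illustration.

```python
def qtl_variance_share(sigma2_v: float, sigma2_u: float) -> float:
    """Proportion of total genetic variance attributed to the QTL.

    sigma2_v: gametic variance (single parameter for paternal and maternal alleles)
    sigma2_u: polygenic variance
    Assumption: total genetic variance = polygenic variance + QTL allelic variance.
    """
    if sigma2_v < 0 or sigma2_u < 0:
        raise ValueError("variance components must be non-negative")
    qtl_variance = 2.0 * sigma2_v  # QTL allelic variance is twice the gametic variance
    total = qtl_variance + sigma2_u
    if total == 0:
        raise ValueError("total genetic variance is zero")
    return qtl_variance / total


# Hypothetical variance components, not estimates from the workshop dataset.
share = qtl_variance_share(sigma2_v=0.05, sigma2_u=0.90)
print(f"{share:.3f}")  # prints 0.100
```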
Table 1. Position (and confidence interval) of the QTL inferred in the different chromosomes with LA, LDLA HAP3 and LDLA IBD10 models and percentage of genetic variance explained by the QTL at the selected position
Genetic parameters were estimated by maximizing likelihoods with an AI-REML approach. The BLUPF90 software [5] was modified by Druet et al. [1] to incorporate relationship matrices among QTL allelic effects.
The likelihood ratio test statistic considered variance components as parameters and was used to test whether a QTL was present at the studied position [2]:
$$LRT = -2\ln\left[\frac{L(H_{0})}{L(H_{1})}\right]$$
where L(H_{0}) and L(H_{1}) are the values of the likelihood functions estimated by REML under the polygenic model with no QTL fitted and under the model with a QTL, respectively. The distribution of the test statistic is a mixture of zero and a 1-d.f. chi-square for a single position [6]. For the analyses where a significant QTL was detected, a 2-LOD drop-off support interval was constructed for the position of the QTL, i.e., the interval surrounding the QTL peak where the log-likelihood exceeds ln L_{max} − 2ln(10), where ln L_{max} is the natural logarithm of the maximum likelihood [7].
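The test and the support interval described above can be sketched in a few lines. This is a minimal illustration, not the authors' program: the function names are hypothetical, the log-likelihood profile is made up, and the equal mixing weights (0.5 for the point mass at zero and 0.5 for the 1-d.f. chi-square) are an assumption of this sketch, since the text only states that the distribution is a mixture.

```python
import math


def lrt_statistic(loglik_h0: float, loglik_h1: float) -> float:
    """LRT = -2 * [ln L(H0) - ln L(H1)], floored at zero."""
    return max(0.0, -2.0 * (loglik_h0 - loglik_h1))


def mixture_p_value(lrt: float) -> float:
    """P-value under a mixture of a point mass at zero and a 1-d.f. chi-square.

    Assumption: equal mixing weights of 0.5.
    """
    if lrt <= 0.0:
        return 1.0
    # Survival function of a 1-d.f. chi-square: P(X > x) = erfc(sqrt(x / 2))
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))


def lod_drop_interval(positions, logliks, drop_lod=2.0):
    """Bounds of the contiguous region around the peak where
    ln L > ln L_max - drop_lod * ln(10)."""
    peak = max(range(len(logliks)), key=logliks.__getitem__)
    threshold = logliks[peak] - drop_lod * math.log(10.0)
    left = peak
    while left > 0 and logliks[left - 1] > threshold:
        left -= 1
    right = peak
    while right < len(logliks) - 1 and logliks[right + 1] > threshold:
        right += 1
    return positions[left], positions[right]


# Hypothetical log-likelihood profile along a chromosome (positions in cM).
positions = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
logliks = [-110.0, -104.0, -100.0, -102.0, -106.0, -112.0]
print(lod_drop_interval(positions, logliks))  # prints (11.0, 13.0)
print(mixture_p_value(lrt_statistic(-110.0, -100.0)))
```

The interval is reported at the resolution of the tested positions; interpolating between positions would be a refinement beyond this sketch.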
Combined linkage disequilibrium and linkage analysis (LDLA)
Linkage analysis gives an interval region for the QTL. Within this region, QTL fine mapping with LDLA was applied based on an approach derived from the method proposed by Meuwissen and Goddard [8]. It consists of a VC mapping method that includes information from linkage disequilibrium between base haplotypes in the construction of the relationship matrix among estimated QTL allelic effects (see above). Chromosomes of founders were considered as base haplotypes. At each tested position the following procedure was applied:
1. PDQ probabilities were computed to determine to which base haplotype an inherited chromosome corresponded. Rules to compute the PDQ using the closest informative bracket [4] were the same as those used in linkage analysis. LD information was not taken into account at this step.
2. Identity-by-descent (IBD) probabilities (Φ_{p}) were estimated among each pair of base haplotypes conditionally on the identity-by-state (IBS) status of the neighboring markers using windows of 10 flanking markers [7].
3. Base haplotypes were grouped with a clustering algorithm with SAS® proc CLUST using (1 − Φ_{p}) as a distance measure. Base haplotypes were grouped if Φ_{p} exceeded 0.50 [1]. Indeed, Ytournel (personal communication) showed that most haplotypes were IBD as soon as their estimated IBD probability exceeded 0.5. Chromosomes were also grouped within the clusters if i) the two chromosomes of a sire were grouped in the same cluster (the paternally inherited chromosomes of all his sons were then grouped in this cluster) or ii) a chromosome could be assigned to a base haplotype with a probability larger than 0.95 (it was then grouped in the corresponding cluster).
4. A model similar to the linkage analysis model was then applied:
$$y = \mathbf{1}\mu + Zu + Z_{h}h + e$$
where h is a vector of random QTL effects corresponding to the haplotype clusters and Z_{h} is a design matrix relating phenotypes to corresponding haplotype clusters. This model is referred to as IBD10.
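The grouping rule in step 3 can be illustrated with a short sketch. The analysis itself used a SAS clustering procedure whose linkage method is not stated here, so the single-linkage merging below (implemented with a union-find structure) is an assumption made for illustration, as are the function name and the example matrix of IBD probabilities.

```python
def cluster_haplotypes(phi, threshold=0.5):
    """Group base haplotypes whose pairwise IBD probability exceeds `threshold`.

    phi: symmetric square matrix (list of lists) of IBD probabilities.
    Assumption: single-linkage merging, i.e. two haplotypes end up in the same
    cluster if they are connected by a chain of pairs with phi > threshold.
    """
    n = len(phi)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # phi > threshold is the same as distance (1 - phi) < 1 - threshold
            if phi[i][j] > threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical IBD probabilities among four base haplotypes.
phi = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.6],
    [0.2, 0.1, 0.6, 1.0],
]
print(cluster_haplotypes(phi))  # prints [[0, 1], [2, 3]]
```

Each resulting cluster would correspond to one level of the random effect h in the IBD10 model.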
In addition, a similar model with the following new rules was applied: 1) all chromosomes were considered as base haplotypes; 2) smaller marker windows were used (3 markers) in order to obtain a limited number of clusters and 3) haplotype groups were no longer constructed based on IBD probabilities but on IBS status (haplotypes that were IBS for all markers were grouped together). This method evaluates whether an effect can be associated with a short haplotype covering a small region. This model is referred to as HAP3.
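The IBS grouping used by HAP3 amounts to grouping haplotypes by their allele string over a 3-marker window. The sketch below is illustrative only: the function name, the allele coding and the convention that `start` is the index of the first marker in the window are assumptions, since the text does not specify how the window is placed relative to the tested position.

```python
from collections import defaultdict


def group_by_ibs(haplotypes, start, window=3):
    """Group haplotypes that are identical by state over `window` consecutive markers.

    haplotypes: dict mapping a haplotype id to a sequence of allele codes.
    start: index of the first marker in the window (an assumed convention).
    """
    groups = defaultdict(list)
    for hap_id, alleles in haplotypes.items():
        key = tuple(alleles[start:start + window])
        if len(key) < window:
            raise ValueError("marker window runs past the end of the haplotype")
        groups[key].append(hap_id)
    return dict(groups)


# Hypothetical haplotypes coded 1/2 at five SNP.
haps = {
    "h1": [1, 2, 1, 1, 2],
    "h2": [1, 2, 1, 2, 2],
    "h3": [2, 2, 1, 1, 2],
    "h4": [1, 2, 1, 1, 1],
}
print(group_by_ibs(haps, start=0))
# prints {(1, 2, 1): ['h1', 'h2', 'h4'], (2, 2, 1): ['h3']}
```

With only three biallelic markers there are at most eight groups per window, which is consistent with the stated aim of obtaining a limited number of clusters.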
Results
The analysis was performed on a 64-bit IBM AIX 5.2.0 server with a POWER4+ processor and 62 Gb RAM. The LA, IBD10 and HAP3 methods needed 16.4, 17.2 and 445 seconds of CPU time per marker, respectively. These numbers should be multiplied by the number of marker positions tested. Note that marker information for the first two generations was discarded in our study in order to reduce the total time required for the analysis. Haplotypes for these animals were reconstructed by working with dense marker maps using a program developed by Druet et al. [1].
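To make the scaling explicit, the per-marker CPU times can be multiplied by the number of positions tested. The per-marker figures below are those reported above; the number of positions (100) is a hypothetical value used only to show the arithmetic, and the function name is an assumption.

```python
# CPU seconds per marker position, as reported for each method.
CPU_SECONDS_PER_MARKER = {"LA": 16.4, "IBD10": 17.2, "HAP3": 445.0}


def total_cpu_hours(method: str, n_positions: int) -> float:
    """Total CPU time in hours for testing `n_positions` marker positions."""
    if n_positions < 0:
        raise ValueError("n_positions must be non-negative")
    return CPU_SECONDS_PER_MARKER[method] * n_positions / 3600.0


# Hypothetical number of tested positions.
for method in CPU_SECONDS_PER_MARKER:
    print(method, round(total_cpu_hours(method, 100), 2))
```

The gap between HAP3 and the other two methods illustrates why restricting LDLA to the intervals identified by linkage analysis reduces the overall computing time.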
The estimated total genetic variance of the trait was 1.32 and the heritability was 0.30. The locations of inferred QTL using the LA, IBD10 and HAP3 methods are shown in Table 1. The IBD10 and HAP3 methods gave several peaks with higher LRT values than linkage analysis. The haplotypes of sires heterozygous at the QTL made it possible to give more confidence to some of these peaks. A QTL explaining 9.4% of the genetic variance was found with high significance on chromosome 1 at position 19.5 cM (Figure 1). On chromosome 2, the main QTL was detected at position 26.0 cM and explained 9.0% of total genetic variance (Figure 2). The QTL detected on chromosome 3 at position 11.9 cM was less important (5% of variance) (Figure 3). The QTL with the highest effect (37% of variance) was detected on chromosome 4 at position 3.1 cM (Figure 4) and another QTL (13.6% of variance) was detected on chromosome 5 at position 93.9 cM (Figure 5). No QTL was detected on chromosome 6.
Figure 1. LA and LDLA curves obtained on chromosome 1. LA curve (black), LDLA curve with model HAP3 (red) and LDLA curve with model IBD10 (blue).
Figure 2. LA and LDLA curves obtained on chromosome 2. LA curve (black), LDLA curve with model HAP3 (red) and LDLA curve with model IBD10 (blue).
Figure 3. LA and LDLA curves obtained on chromosome 3. LA curve (black), LDLA curve with model HAP3 (red) and LDLA curve with model IBD10 (blue).
Figure 4. LA and LDLA curves obtained on chromosome 4. LA curve (black), LDLA curve with model HAP3 (red) and LDLA curve with model IBD10 (blue).
Figure 5. LA and LDLA curves obtained on chromosome 5. LA curve (black), LDLA curve with model HAP3 (red) and LDLA curve with model IBD10 (blue).
These results were based on a model assuming a single QTL per chromosome. To test whether another QTL was present on a chromosome, the LA model was extended with a second QTL effect. This model detected a second QTL with high significance at position 76.6 cM on chromosome 1 and at position 53.2 cM on chromosome 2, explaining 3.3 and 7.8% of the genetic variance, respectively.
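The extended model is not written out in the text. One plausible form, assuming the second QTL enters as an additional random gametic effect with its own design and relationship matrices, is sketched below; the subscripts 1 and 2 are notation introduced here and do not come from the original analysis.

```latex
% Hypothetical two-QTL extension of the LA model (notation assumed):
% v_1, v_2 are random gametic effects at the two tested positions,
% each with its own design matrix and relationship matrix.
\begin{align}
  y &= \mathbf{1}\mu + Zu + Z_{v_1} v_1 + Z_{v_2} v_2 + e, \\
  \mathrm{Var}(v_1) &= G_{v_1}\sigma_{v_1}^2, \qquad
  \mathrm{Var}(v_2) = G_{v_2}\sigma_{v_2}^2 .
\end{align}
```

Under this form, the presence of the second QTL would be tested by comparing the two-QTL model against the single-QTL model at the first position.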
Discussion
Our QTL fine-mapping strategy was mostly based on the use of statistical methods combining linkage (LA) and linkage disequilibrium analysis (LDLA) described by Druet et al. [1]. Linkage analysis provides an LRT peak and a confidence interval for the location of the QTL on each chromosome. The use of a high-density marker map resulted in almost optimal genetic information along the whole chromosome. As a consequence, sharp and high LRT curves were obtained. With this density of markers, QTL transmission is followed more precisely and recombination locations are determined within smaller intervals, so that the mapping resolution attainable from the pedigree is almost fully achieved.
The method used for LDLA analysis was based on LDLA methods proposed by Meuwissen and Goddard [7]. Although the LDLA analysis did not result in a single peak, it strongly improved the information on the QTL location with respect to the LA analysis. Indeed, many regions could be discarded according to the LDLA analysis because QTL alleles of opposite effects were grouped in the same cluster. The LDLA analysis discarded regions where heterozygous sires did not share common haplotypes. As a consequence, the possible location of the QTL is confined to a few small intervals.
The HAP3 and IBD10 models have some complementary properties. HAP3 searches for small informative regions of 3 markers in LD with the QTL. The IBD10 method uses IBD probabilities and a larger marker window. Therefore, it helps to distinguish regions that were identical for three markers by chance from regions where haplotypes were grouped because they have high IBD probabilities. However, IBD10 will be more sensitive to missing information or to genetic map inconsistencies.
This strategy was designed to detect the most important QTL with an additive effect for a large number of traits in a relatively short time. Its advantage is that it reduces the number of regions to be analyzed with LDLA methods, which provided sharper and higher LRT peaks than other available methods. It will be applied to the analysis of 60,000 SNP data of 3300 bulls for 15 traits from May 2008 to July 2008. The aim of this analysis is to select 1,500 SNP in LD with QTL so they can be used for routine marker assisted selection (MAS). In this simulation study, the position of the main simulated QTL on each chromosome was correctly estimated, although its effect (measured as a proportion of genetic variance) was sometimes over- or underestimated. However, on the chromosomes where several QTL located at different positions affected the trait, fine-mapping of the linked QTL using linkage analysis was not very efficient, and more complex methods such as multi-QTL LDLA fine-mapping methods [9,10] are needed. The programs should also be extended in order to detect possible epistatic loci.
Conclusion
The proposed strategy for fine-mapping of QTL using a dense SNP map worked relatively fast with a large number of markers. The linkage analysis approach provides a confidence interval for the QTL. Within these intervals, the QTL position is fine-mapped by applying two variance component approaches combining both linkage analysis and linkage disequilibrium information. This strategy detects the most important QTL with an additive effect for a large number of traits in a short time, but it should be extended for fine-mapping linked and epistatic QTL.
List of abbreviations used
IBD: Identity-by-descent; IBS: Identity-by-state; LA: Linkage analysis; LD: Linkage disequilibrium; LDLA: Linkage disequilibrium and linkage analysis; QTL: Quantitative trait loci; SNP: Single nucleotide polymorphism; VC: Variance components.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
JT: Performed the statistical analysis and drafted the manuscript. FG: Performed the statistical analysis. SF: Performed the statistical analysis. All authors read and approved the final manuscript.
Acknowledgements
This article has been published as part of BMC Proceedings Volume 3 Supplement 1, 2009: Proceedings of the 12th European workshop on QTL mapping and marker assisted selection. The full contents of the supplement are available online at http://www.biomedcentral.com/17536561/3?issue=S1.
References

1. Druet T, Fritz S, Boussaha M, Ben-Jemaa S, Guillaume F, Derbala D, Zelenika D, Lechner D, Charon C, Boichard D, Gut IG, Eggen A, Gautier M: Fine mapping of QTLs affecting female fertility in dairy cattle on BTA03 using a dense SNP map. Genetics 2008, 178:2227-2235.
2. George AW, Visscher PM, Haley CS: Mapping quantitative trait loci in complex pedigrees: a two-step variance component approach. Genetics 2000, 156:2081-2092.
3. Fernando R, Grossman M: Marker assisted selection using best linear unbiased prediction. Genet Sel Evol 1989, 21:467-477.
4. Pong-Wong R, George AW, Woolliams JA, Haley CS: A simple and rapid method for calculating identity-by-descent matrices using multiple markers. Genet Sel Evol 2001, 33:453-471.
5. Misztal I, Tsuruta T, Strabel T, Auvray B, Druet T: BLUPF90 and related programs (BGF90). Proceedings of the Seventh World Congress on Genetics Applied to Livestock Production, Montpellier, France. Communication no. 28-07 2002.
6. Grignola FE, Hoeschele I, Zhang Q, Thaller G: Mapping quantitative trait loci in outcross populations via residual maximum likelihood. II. A simulation study. Genet Sel Evol 1996, 28:491-504.
7. Meuwissen TH, Goddard ME: Prediction of identity by descent probabilities from marker-haplotypes. Genet Sel Evol 2001, 33:605-634.
8. Meuwissen TH, Goddard ME: Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics 2000, 155:421-430.
9. Meuwissen TH, Goddard ME: Mapping multiple QTL using linkage disequilibrium and linkage analysis information and multitrait data. Genet Sel Evol 2004, 36:261-279.
10. Olsen HG, Lien S, Gautier M, Nilsen H, Roseth A: Mapping of a milk production quantitative trait locus to a 420-kb region on bovine chromosome 6. Genetics 2005, 169:275-283.