Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Methodology article

MetaQTL: a package of new computational methods for the meta-analysis of QTL mapping experiments

Jean-Baptiste Veyrieras1*, Bruno Goffinet2 and Alain Charcosset1

Author Affiliations

1 UMR, INRA UPS-XI INAPG CNRS Génétique Végétale, Ferme du Moulon, 91190 Gif-sur-Yvette, France

2 BIA, Chemin de Borde Rouge BP27 31326, Castanet Tolosan Cedex, France

For all author emails, please log on.

BMC Bioinformatics 2007, 8:49  doi:10.1186/1471-2105-8-49


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/8/49


Received:14 November 2006
Accepted:8 February 2007
Published:8 February 2007

© 2007 Veyrieras et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Integration of multiple results from Quantitative Trait Loci (QTL) studies is a key point to understand the genetic determinism of complex traits. Up to now many efforts have been made by public database developers to facilitate the storage, compilation and visualization of multiple QTL mapping experiment results. However, studying the congruency between these results still remains a complex task. Presently, the few computational and statistical frameworks to do so are mainly based on empirical methods (e.g. consensus genetic maps are generally built by iterative projection).

Results

In this article, we present a new computational and statistical package, called MetaQTL, for carrying out whole-genome meta-analysis of QTL mapping experiments. Contrary to existing methods, MetaQTL offers a complete statistical process to establish a consensus model for both the marker and the QTL positions on the whole genome. First, MetaQTL implements a new statistical approach to merge multiple distinct genetic maps into a single consensus map which is optimal in terms of weighted least squares and can be used to investigate recombination rate heterogeneity between studies. Secondly, assuming that QTL can be projected on the consensus map, MetaQTL offers a new clustering approach based on a Gaussian mixture model to decide how many QTL underly the distribution of the observed QTL.

Conclusion

We demonstrate using simulations that the usual model choice criteria from mixture model literature perform relatively well in this context. As expected, simulations also show that this new clustering algorithm leads to a reduction in the length of the confidence interval of QTL location provided that across studies there are enough observed QTL for each underlying true QTL location. The usefulness of our approach is illustrated on published QTL detection results of flowering time in maize. Finally, MetaQTL is freely available at http://bioinformatics.org/mqtl webcite.

Background

In the last two decades, the advent of molecular markers and their use in linkage mapping experiments has tremendously increased the potential of quantitative genetics. Linkage mapping experiments now provide an efficient tool to identify regions of the genome where polymorphism affects the variation of quantitative traits, called Quantitative Trait Loci (QTL). Although a large number of advanced statistical methods have been developed to improve the localization of QTL, the limited number of recombination events available in routinely used pedigree designs for QTL mapping lead essentially to an approximate mapping of the QTL (see for instance [1]). This is mainly due to both a few mating generations and a restricted number of sampled individuals (generally a few hundreds). From results of QTL experiments gathered over a wide range of plant species, [2] have shown that confidence intervals around most likely QTL positions are, on average, approximately 10 cM, which usually includes several hundreds of genes. More recent advents in the area of molecular biology have allowed researchers to carry out positional cloning of QTL (see for instance the review of [3]) but this approach still remains extremely expensive both in terms of time and resources. Also several authors [2,4] have pointed out that QTL detection is statistically biased both in the true number of QTL, which is underestimated since only QTL with large effects are detected, and in the QTL effects which are over estimated as only significant effects are reported (a phenomenon has commonly referred to as the Beavis effect [5]). Even though QTL mapping experiments must be considered with an awareness of these limitations, they have become commonplace and have greatly improved our knowledge about the genetic component of complex traits.

Since the first publication of a QTL localization using molecular data [6], more and more species and traits have been studied and many of these results has been made available via public databases. One of the main purposes of these databases was to help researchers to compare results from different QTL studies, to study the congruency of QTL locations in order to address the following question: "do QTL identified for a given trait in a population correspond to those detected in other populations ?". In theory one would expect that the variation of a quantitative trait within a species is explained by a finite number of genes. Thus QTL congruency investigation will be a relevant approach to improve knowledge on trait genetics and several publications have pointed out its usefulness [7-12]. Nevertheless, the combination of results from linkage studies can be tedious since, even if several studies focus on the same trait within the same species, family structures, sample sizes, marker maps, or QTL detection methods may differ between studies. Some methods have been recently developed to tackle the issues raised by between QTL studies heterogeneity. Integration of genetic maps and QTL locations by iterative projections on a reference map is now widely used to position both markers and QTL on a single and homogeneous consensus map (see for instance [13]). However this process yields a consensus marker map for which both the statistical properties and biological "reality" can't be clearly assessed, even if a robust ordered marker map was used as reference. [14] proposed an original approach using graph theory to integrate various types of maps (genetic, physical or sequence-based) but it mainly dealt with dissection of marker order inconsistencies between maps. From up to now it seems that there is no efficient methodological framework to build reliable consensus marker maps on which markers and candidate genes from different mapping experiments can be both ordered and positioned (except by merging raw mapping data from multiple populations as proposed by [15] and [16]).

In order to study QTL congruency, [17] proposed an original approach based on a meta-analysis strategy. Meta-analysis, which is mainly used in medical, social, and behavioral sciences, aims to pool results across independent studies in order to combine them in a single result or estimate. The relevance of meta-analysis investigations in genetics and evolution has been discussed and pointed out by several authors in the last decade (see for instance [18-21]). More recently [22] developed another meta-analysis based approach to overcome the between-study heterogeneity and to refine both QTL location and the magnitude of the genetic effects. Yet both the method of [17] and [22] are limited to a small number of underlying QTL positions (from one to four for the former and only one for the later) which is a serious limitation for a whole genome study of QTL congruency. Even if the average number of QTL per experiment is around four in plants [2,12], one would expect that more than four genes can be involved in the trait variation on a single chromosome.

To remove these impediments we have developed a new 2-stage meta-analysis procedure in order to integrate multiple independent QTL mapping experiments. Our aim was to create a global framework to evaluate the homogeneity of both genetic marker and QTL mapping results from literature and public databases. The first part of our meta-analysis procedure consists in building a consensus genetic marker map that takes into account the statistical properties of genetic distance estimates using a Weighted Least Squares (WLS) strategy. Secondly, once the consensus marker map has been built, the QTL locations can be projected on to the map. We also propose a new clustering algorithm based on a Gaussian mixture model in order to identify the number of underlying QTL which best explain the observed distribution of QTL positions in the mapping experiments. As it has been emphasized by [17], the crucial point at this step is to find an unbiased criterion to select the correct number of QTL. In the context of Gaussian mixture, a large variety of model choice criteria have been reported in the literature. We explore, by means of simulations, the properties of some of these criteria for our particular mixture model. These new methods have been implemented into a Java package called MetaQTL. Finally, as an example, we applied our new approach to QTL detection results gathered for flowering time in maize.

Results

Meta-analysis of genetic maps

Input genetic map information

Consider a set of n genetic mapping experiments concerning the same linkage group. These different experiments may involve different kinds of population pedigree. We consider that for each experiment i = 1, ..., n only the estimated distances between ordered markers along the linkage group are available. We use ci, Ni, Mi to denote the population cross design, the population size and the number or markers on the ith genetic map, respectively. Let's suppose that two markers mj and mk have been positioned on the ith map, <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M1">View MathML</a>i,jk stands for the estimated recombination rate between markers mj and mk and <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M2">View MathML</a>i,jk = f[<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M1">View MathML</a>i,jk] the corresponding estimated distance, where f is the mapping function which is assumed to be the same in the n mapping experiments (without loss of generality). Applying the classical asymptotic Gaussian distribution of the maximum-likelihood estimation of the parameter, we assume that the <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M1">View MathML</a>i,jk are normally distributed around the true recombination rate ri,jk between markers mj and mk with a variance var(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M1">View MathML</a>i,jk) = <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M3">View MathML</a>. This variance <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M3">View MathML</a> depends on the cross design ci, the value of ri,jk, the sample size Ni and the amount of information supplied by the marker pair mj and mk sampled population (see Additional File 1 for expression of η).

Additional file 1. Theory and method for the computation of the variance of the recombination rate estimate. This PDF file contains a short review of the theory and the methods to compute the variance of the recombination rate estimator for different kinds of pairwise marker configurations and different types of mapping experiments.

Format: PDF Size: 65KB Download file

This file can be viewed with: Adobe Acrobat ReaderOpen Data

Since mapping functions are generally one to one functions, the functional invariance property of the maximum-likelihood estimate can be applied. Thus <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M2">View MathML</a>i,jk is also normally distributed around the true distance denoted di,jk = f[ri,jk]. To obtain the variance of <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M2">View MathML</a>i,jk, denoted <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M4','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M4">View MathML</a>, we use the first term of the Taylor expansion of the inverse of the mapping function leading to the approximation:

<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M5','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M5">View MathML</a>

Now suppose the n experiments are consistent with the following assumptions:

• Assumption 1 : they come from independent population samples. This implies that cov(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M1">View MathML</a>i,jk, <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M1">View MathML</a>i',jk) = 0 and cov(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M2">View MathML</a>i,jk, <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M2">View MathML</a>i',jk) = 0 for any pair of markers mj and mk which have been mapped in population i and i', i i' and (i, i') ∈ [1..n]2 .

• Assumption 2 : there is no interference, i.e in each mapping experiment the recombination events occur independently in each marker interval. This is surely an idealization, but presently, most of the statistical models used to build genetic marker maps are based on this assumption. Thus for a given mapping experiment i, both the ordered marker interval recombination rate and distance estimates are independent, i.e cov(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M1">View MathML</a>i,j(j+1), <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M1">View MathML</a>i,(j+1)(j+2)) = 0 and cov(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M2">View MathML</a>i,j(j+1), <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M2">View MathML</a>i(j+1)(j+2)) = 0 for i ∈ [1,..., n] and j ∈ [1,..., Mi - 2].

• Assumption 3 : the "true" marker order and recombination rate are supposed to be the same in the different populations, i.e ri,jk = ri',jk if markers mj and mk have been mapped in population i and i', i i' and (i, i') ∈ [1..n]2.

• Assumption 4 : all the genetic maps are connected. Mathematically, this means that if we consider maps as vertices and common markers as edges, then the corresponding graph is supposed to be connected.

Meta-analysis model

We define <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M6">View MathML</a> = (<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M2">View MathML</a>i,jk) and Γ = diag(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M4','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M4">View MathML</a>) the vector of ordered marker interval distance estimates and the diagonal terms of the variance covariance matrix of <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M6">View MathML</a>. We assume that a total of M distinct markers have been mapped in the n populations. The aim of the meta-analysis is to combine all the available information on marker order and positions in order to build a consensus linkage group on which the M markers are positioned. To do so we introduce Y = (y1,..., yM) the vector of the "true" positions of these M markers on the consensus linkage group, where the yi's can be either positive or negative depending on an arbitrary zero-reference on the chromosome (hereafter we suppose y1 = 0). If the n mapping experiments are consistent with the previous assumptions and assuming that the distances on the linkage group are additive we propose to estimate Y by solving the following linear system:

<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M2">View MathML</a>i,jk = yk - yj + εi,jk

where (j, k) ∈ [1,...,M]2, i ∈ [1,..., n], <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M2">View MathML</a>i,jk is the distance estimate of the interval between marker mj and mk consecutive on the ith experiment, yk - yj is the true distance between these markers, εi,jk ~ <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M7','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M7">View MathML</a>(0, <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M4','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M4">View MathML</a>) is the error term. If assumption 4 holds we are ensured that this system has at least one solution. Applying a classical weighted least squares (WLS) strategy, the optimal solution is the one which minimizes the target function,

<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M8">View MathML</a>

Let's introduce the design matrix A such that χ = T(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M6">View MathML</a> - AY-1(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M6">View MathML</a> - AY). Then the value of Y which minimizes χ is given by:

<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M9">View MathML</a> = (TAΓ-1 A)-1(TAΓ-1 <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M6">View MathML</a>)

which is also a maximum-likelihood estimation of Y with variance-covariance matrix given by (TAΓ-1 A)-1. Thus <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M9">View MathML</a> gives both the marker positions and the marker order along the consensus linkage group. The goodness-of-fit of the model can be evaluated by the means of a chi-square test as χ ~ <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M10','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M10">View MathML</a> where q is the length of the vector <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M6">View MathML</a>, i.e the number of marker intervals over the n experiments. As an illustration, let's consider the following idealized scenario : suppose that the n gathered genetic maps share the same markers, i.e Mi = M for i = 1,..., n. In this simple case the computation of <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M9">View MathML</a> is straightforward:

<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M11','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M11">View MathML</a>

and χ is the sum of M - 1 terms each distributed as a chi-square with n - 1 degree of freedoms. This is equivalent to for each marker intervals testing if the distances are homogeneous between populations using a classical test of equal means. This trivial example illustrates how our WLS approach can be used to test for the homogeneity of the recombination rate between several mapping experiments. This can be viewed as an alternative to the M-test devised by [23] when raw data are not available.

Meta-analysis of QTL

Input QTL map information

Suppose that for a given trait a QTL detection has been carried out in the n mapping experiments. The minimal information supplied by the ith QTL experiment consists of a set of estimated positions of the QTL, denoted <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>ij, and the corresponding proportion of variance explained by each QTL, the r-squares values λij. Here j ∈ [1,..., qi] and qi is the number of QTL detected in the ith mapping experiment on the current linkage group (generally qi = 1 or possibly 2). The confidence intervals (CI) of the <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>ij's, denoted vij, can also be reported. The construction of the CI may have been performed by different approaches:

• support interval [24,25].

• likelihood method [26].

• bootstrapping [27,28].

When the CI is not available it is possible to obtain an approximation of the CI by applying the empirical formula proposed by [29]. By means of intensive simulations they showed that for either a backcross or a F2 population the expected CI, at 95% level, can be expressed as CI(95) ≈ 530/(Nλ) where N is the population size and λ the proportion of variance explained by the QTL. More recently [30] have derived simple analytical equations which are in good agreement with the formula of [29].

Whatever the method used to estimate the uncertainty on the QTL locations, we assume that the <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>ij's are normally distributed around the true position xij of the jth QTL: <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>ij ~ <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M7','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M7">View MathML</a>(xij, <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M13','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M13">View MathML</a>) where <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M13','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M13">View MathML</a> is the variance of the estimated position which can be deduced from the confidence interval vij. For a CI of β% (β depends on the method used to compute the CI), the standard deviation σij can be estimated as σij = vij/(2uβ) where uβ is the double-sided β-percentile of a centered normalized gaussian. This Gaussian approximation based on the classical asymptotic theory has been suggested by [17], even though this is not perfectly correct for QTL with small effects [31].

Furthermore the n QTL mapping experiments are assumed to be consistent with the following assumptions:

• Assumption 1 : they are independent. This can be considered as correct when the individuals measured in the different populations have been generated independently. Independence between experiments i and i' means independence between <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>ij and <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>i'j.

• Assumption 2 : for a given trait there is a finite number of underlying QTL which cosegregate in the mapping experiments: this means that the populations share the same trait determinism with potentially different allelic configurations at the QTL. In other word there is a finite number of true QTL positions on the linkage groups, i.e {xij|(i, j) ∈ [1,..., n] × [1,..., qi]} can potentially contain redundancy.

In addition to the two previous assumptions we also assume that the detected QTL locations are independent within experiments. This is not really true when the QTL detection does not properly take into account linked QTL. But with the advent of composite interval mapping strategy [32,33] multiple-QTL model can now be fitted by adding properly chosen cofactors which limit the impact of linkage between QTL on the position estimates. Therefore we assume that <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>ij and <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>ij' are independent for all j j'.

Pre-processing

The first step is to apply our WLS strategy to the n mapping experiments in order to build a consensus linkage group. Then the QTL locations are projected on the consensus linkage group using a simple scaling rule between the original QTL flanking marker interval and the corresponding one on the consensus chromosome. For a given QTL location the new confidence interval (if available) on the consensus linkage group is computed by taking into account the average scaling between the original and the consensus chromosome. This is done by computing the sum over the common marker intervals of the ratio of the interval lengths weighted by the probability that the QTL position lies in this interval. There are two possible strategies to approximate this probability. The first one relies on a rough approximation using a Gaussian distribution around the most likely position <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>ij of the jth QTL, namely Pr(QTL j in m) = <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M14','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M14">View MathML</a> where φ[u]is the density function of a centered normalized Gaussian distribution, m is the index of the marker interval, um and um+1 are the absolute positions of the flanking markers on the original map of total length L. If the LOD score profile is available, a more accurate strategy can be applied by substituting φ for the density function which best fits the profile.

The meta-analysis model

The purpose of the QTL meta-analysis is to evaluate, for a given trait, the degree of congruency of the QTL detected in the n mapping experiments. By assuming that there is a finite number of true QTL locations, [17] proposed a clustering based approach to both classify the observed QTL and estimate the positions of the underlying QTL. Their method proceeds by testing all the possible QTL combinations and then choosing the one which maximizes a penalized log-likelihood. Although interesting, this method suffers from a categorical repartition of the QTL in the clusters, which is a limit case of Gaussian mixture models. We propose to adopt a similar clustering strategy but with a more standard Gaussian mixture model which allows QTL to be probabilistically distributed into clusters.

In order to lighten the notation we denote by q the total number of observed QTL locations and we ignore the mapping experiment subscripts so that <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M15">View MathML</a> = (<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>1,..., <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>q) and Σ = (σ1, ...,σq). Then, let's suppose there are K ≥ 1 true QTL located at <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M16','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M16">View MathML</a> which segregate in at least one of the n QTL mapping experiments. Since the QTL position estimates <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M15">View MathML</a> are normally distributed around their true positions, the problem of finding the K underlying true positions can be viewed as a Gaussian mixture problem where the variances of each observation are known. Thus the log-likelihood of the observations can be written as follows:

<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M17','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M17">View MathML</a>

where Θ[K] = (X[K], Π[K]) denotes the parameters of the model, <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M18','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M18">View MathML</a> are the mixing proportions, which sum to one, and φ[x] is the density function of a centered normalized Gaussian distribution. We assume without loss of generality that <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M19','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M19">View MathML</a> and that <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M20','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M20">View MathML</a> ≠ 0, for j ∈ [1,..., K]. In other word the distribution of the observed QTL locations is shaped by a mixture density where the components <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M21','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M21">View MathML</a> are the positions of the true QTL on the linkage group and the mixing proportions πj represent the proportion of QTL related to the jth true QTL which have been detected in the n mapping experiments.

Maximizing 1 can be achieved via a standard EM algorithm [34] by using the following parameter updates (M-step):

<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M22">View MathML</a>

where <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M23','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M23">View MathML</a> is the conditional probability that <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>i belongs to the jth meta-QTL. This conditional probability <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M23','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M23">View MathML</a> is obtained by applying a simple Bayes' rule evaluated at the current parameter estimates (E-step):

<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M24','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M24">View MathML</a>

The EM-algorithm is run until reaching convergence: this yields the maximum-likelihood estimate denoted <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M25">View MathML</a>[K] = (<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M26','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M26">View MathML</a>[K], <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M27','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M27">View MathML</a>[K]). Finally, once <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M25">View MathML</a>[K] has been obtained the variance-covariance matrix of the parameter estimates, conditionally to the current model, can be computed by applying the Supplemental EM (SEM) strategy proposed by [35].

The problem is that we do not know K, i.e the number of true QTL positions. Since the mixture model of K components is nested into the model with K + 1 components, the likelihood ratio test (LRT) should be suitable. However, as discussed by many authors (see for instance [36,37]) the LRT statistic does not follow the usual χ2 distribution due to testing a null hypothesis on the boundary of the parameter space (i.e the regularity conditions on the loglikelihood do not hold). Another strategy is to use the Kullback-Leibler information in order to derive the information criterion which is widely used to select a statistical model. In particular, the Kullback-Leibler information can be viewed as a measure of goodness-of-fit of a statistical model. Here for a given value of K, minimizing the Kullback-Leibler information is equivalent to maximizing the negentropy KL,

<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M28','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M28">View MathML</a>

where g(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M15">View MathML</a>, Σ) is the the true underlying density function. Thus, from the point of view of the negentropy maximization principle, the goodness of the model can be evaluated by the expected log-likelihood. Note that the negentropy maximization principle naturally leads to the maximization of the log-likelihood. However, the maximized log-likelihood is a naive estimate of the expected loglikelihood: since the same data set <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M15">View MathML</a> is used for both the estimation of the parameter and the estimation of the expected log-likelihood, L(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M15">View MathML</a>, Σ; Θ[K]) is a biased estimator of the expected loglikelihood. Its bias is defined by,

<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M29','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M29">View MathML</a>

and the use of L(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M15">View MathML</a>, Σ; <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M25">View MathML</a>[K]) - B is justified as an estimate of KL. There are different strategies to estimate this bias and several information based criteria have been reported in the mixture model literature in order to tackle the issue raised by choosing the number of components (see Additional File 2). In the next Simulation section, we propose to evaluate the ability of some of these information based criteria to determine the optimal number of QTL.

Additional File 2. Model choice by information based criterion. This PDF file deals with a short review of the theory underlying some well-known information based criteria to select the number of components in a mixture model.

Format: PDF Size: 72KB Download file

This file can be viewed with: Adobe Acrobat ReaderOpen Data

Simulation study

For the sake of concision, in this section we only present simulations for the QTL meta-analysis (simulations for the meta-analysis of genetic maps are described in Additional File 3). We assume that the complexity which shapes the distribution of the observed QTL along the chromosome can be represented by our mixture model. In order to explore mixture configurations which are realistic we have assumed that the QTL effects have a L-shaped distribution (i.e most of the detected QTL in mapping experiments have a small effect and only a few show a strong effect, or in other words, most of the detected QTL have large confidence intervals). Consequently this implies that Σ-1, the inverse of the QTL standard deviations, has also a L-shaped distribution (i.e the smaller the effect of the QTL the larger the confidence interval of the estimated QTL position). Then for a given value of the number of true QTL, K, we randomly generated configurations as follows:

Additional File 3. Simulation study for the meta-analysis of genetic maps. This PDF file describes two simulation scenarios which have been used to evaluate our weighted least squares strategy to build a consensus marker map.

Format: PDF Size: 68KB Download file

This file can be viewed with: Adobe Acrobat ReaderOpen Data

1. Draw Σ from a inverse gamma distribution (this simply mimics a L-shaped distribution).

2. Generate the mixing proportions by choosing them over the discrete uniform [0.1, 0.9] distribution subject to constraint <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M30','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M30">View MathML</a>.

3. Draw from a multinomial, with frequencies equal to the mixing proportions, the origins of the q observed QTL Z = (z1,...,zq) where zij = 1 if the ith observed QTL belongs to the jth true QTL, 0 otherwise.

4. Generate the true QTL positions, X = (x1,..., xk), subject to constraint xk + τmin <xk+1 <xk + τmax where τmin and τmax are defined so that the distance between xk and xk+1 lies between δmin and δmax. The distance δ is defined as the mahalanobis distance between xk and xk+1: <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M31','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M31">View MathML</a> where <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M32','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M32">View MathML</a> is the average standard deviation for the kth true QTL. This measures the separation between consecutive true QTL relatively to the precision of the experiments: δ ≤ 2 corresponds to tightly or moderately separated QTL, while δ ≥ 3 corresponds to well separated QTL.

We stress that this process is not an attempt to describe reality, nevertheless it makes it possible to cover a large range of possible repartitions of the QTL. Finally, for each of the 4 distance constraints considered (δmin = 1, 2, 3,4 and δmax = δmin + 1), 50 configurations were generated. For a given value of K, the following scenario was repeated 100 times:

1. Draw a sample <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M15">View MathML</a> of size q.

2. Run the EM-algorithm to obtain <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M25">View MathML</a>[K] for K = 1,..., q.

3. Choose the best model according to each criterion.

Since the goal of QTL meta-analysis is to obtain a better predictive inference of the true QTL locations we have compared the two alternative strategies:

• Strategy 1 : choose the model with as many true QTL as the number of observed QTL. It is the naive model, <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M33','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M33">View MathML</a>i(1) = <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>i

• Strategy 2 : choose the best model K according to the model choice criterion, <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M34','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M34">View MathML</a>

For each strategy s = 1, 2, the measure of performance used was the mean squared error of prediction defined as follows:

<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M35','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M35">View MathML</a>

Absolute values of these MSEP are not of interest here because our goal is comparison of strategies; hence, we consider the ratios MSEP(2)/MSEP(1) for 5 different information based criteria:

• AIC = -2L(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M15">View MathML</a>, Σ; <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M25">View MathML</a>[K]) + 2ν

• AICc = -2L(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M15">View MathML</a>, Σ; <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M25">View MathML</a>[K]) + 2ν + <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M36','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M36">View MathML</a>

• AIC3 = -2L(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M15">View MathML</a>, Σ; <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M25">View MathML</a>[K]) + 3ν

• BIC = -2L(<a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M15">View MathML</a>, Σ; Θ) + ν log(q)

• EIC ≈ AIC - K + 1, which was obtained by means of simulations (data not shown).

where ν = 2K - 1 is the number of free parameters of the model and q the number of observed QTL along the chromosome (see Additional file 2 for theoretical details on each above criterion).

In Figure 1 we summarized the result of simulations for several values of K and q by averaging over the distance constraint configurations (δmin = 1, 2, 3 and 4). At first sight the 5 criteria seem to have the same behavior whatever the configuration, except for AIC3 which crucially underperforms for small values of q (this can be explained by the higher penality of AIC3 comparing to the other criteria for small values of q). For reasonable sample size relatively to the true number of components the meta-analysis appears to be more efficient than strategy 1. Since the AIC criterion has relatively good performance in these simulations we assume that there is no need for a specific theory to deal with this kind of mixture models and that this criterion can be used to carry out model selection in this context. So, in Figure 2 we focus on the AIC criterion for the different distance configurations δmin = 1, 2, 3 and 4. This clearly shows that, for configurations with reasonable separation between the true positions of the QTL, the meta-analysis performs relatively well. It is worth noting that the better the probability to choose the true model, the better the quality of the QTL position estimates. In order to evaluate the ability of the meta-analysis to improve the precision on the "true" QTL locations we computed the quantities |xi - <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>i (s)| and calculated the quantiles at 95 and 90% of its empirical distribution over all the QTL for the two strategies. The smaller this confidence interval, the better the estimated position <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M12">View MathML</a>i(s). We reported in Supplementary Table 1 (see Additional File 4) the average ratios of these quantities between the two strategies. Hence, if there are actually one, two, three or four different QTL locations with a reasonable separation (δmin ≥ 2), we can see that the meta-analysis gives better estimates of the QTL locations and makes it possible to reduce the length of the 95% CI (in most of the situtations this length is halved). According to [29] to halve a CI in a QTL experiment, one needs to use at least two times the initial number of individuals.

thumbnailFigure 1. Comparison of model choice criteria: a simulation study. Simulation results for different values of the true number of QTL, K, and the number of observed QTL, q. The vertical bars indicate the probability that the best model selected by the criterion is the true model. The open circles, respectively the dotted lines, represent the mean, respectively the 0.1% and 0.9% quantiles, of the ratios MSEP(2)/MSEP(1) for each criterion.

thumbnailFigure 2. Performance of AIC: a simulation study. Behavior of the AIC criterion for the 4 distance constraints, δmin = 1, 2, 3, 4, and for different values of the true number of QTL, K, and the number of observed QTL, q. The vertical bars indicate the probability than the AIC criterion has selected the true model. The open circles, respectively the dotted lines, represent the mean, respectively the 0.1% and 0.9% quantiles, of the ratios MSEP(2)/MSEP(1).

Table 1. Model choice criteria for the meta-analysis of flowering time QTL on maize chromosome 8

Additional File 4. Supplementary Tables. This PDF file contains the two supplementary tables.

Format: PDF Size: 40KB Download file

This file can be viewed with: Adobe Acrobat ReaderOpen Data

Implementation

The previous methods have been implemented into a Java package called MetaQTL. This package includes also additional programs to format, organize or visualize the data. All the programs in MetaQTL are command line programs (see Supplementary Table 2 in Additional File 4 for a complete list of the programs available in the package). Each program performs a specific task and the programs can be combined by the user as a group to run a complete analysis. Thanks to its flexible and modular implementation, MetaQTL could also be integrated in more elaborated softwares if needed. First, before running meta-analysis one needs to store the different QTL studies into a database. To do this MetaQTL uses a simple multiple plain text files database. Each file corresponds to a table and the database is organized as follows:

Table 2. Parameter estimates of the best meta-analysis model for flowering time on maize chromosome 8

• Experiment file: stores descriptions on mapping experiments (name, population type and size, reference, ...).

• Genetic map directory : contains one file per input marker map. Each file contains the corresponding genetic marker map.

• QTL map directory : contains one file per QTL mapping experiments. For each QTL mapping experiment the file provides the properties of each detected QTL (trait, position, confidence interval, r-square, ...).

• Trait ontology file: describes how the traits are related together using a simple hierarchical relationship scheme. This information can then be used to group the QTL according to the ontology in subsequent analyses.

Once the database created, MetaQTL first checks the input data files and then summarizes their content into a set of XML files. All the programs of MetaQTL use these XML files as inputs. Utilities are provided to convert them in various plain text file formats if required (for more details on using MetaQTL see the user manual in Additional File 5).

Additional File 5. Documentation of MetaQTL. The documentation of the Java package MetaQTL in PDF format.

Format: PDF Size: 3.2MB Download file

This file can be viewed with: Adobe Acrobat ReaderOpen Data

Application

Recently, [12] made a bibliographical review of QTL studies relative to 4 traits related to flowering time in maize: days to pollen shed (DPS), silking date (SD), plant height (HT) and leaf number (LN). From the 22 QTL studies they reported, we excluded 6 experiments for which QTL detection was based on ANOVA with a low density of markers and 2 other for which it was not possible to get exact information on either the genetic linkage map or the QTL locations. In addition to these 15 mapping experiments we considered 3 other recent experiments (details of these 18 QTL studies are given in Additional File 6). We focus here on chromosome 8 and we present results by using for each step of the meta-analysis the corresponding program name of MetaQTL.

Additional File 6. Summary of the QTL mapping experiments used in the application. This PDF file gives details on the mapping experiments used in our application.

Format: PDF Size: 64KB Download file

This file can be viewed with: Adobe Acrobat ReaderOpen Data

Result of InfoMap

Among the 153 distinct markers which have been positioned over the 18 mapping experiments on the chromosome 8, only 53 markers are observed in at least two different mapping experiments. We restricted the meta-analysis to these 53 markers. Only one order inconsistency was detected between [38] and [39] concerning markers umc89a and umc12a. As in [38]umc12a is very close to umc89a (less than 2 cM) we have decided to ignore this marker in this mapping experiment. Over the 18 mapping experiments the mean interval distance was about 18.9 cM with an average of 8.7 markers per mapping experiment and it existed at least one common marker path which connected all the mapping experiments together (insuring that the WLS can be applied).

Result of ConsMap

The consensus linkage group of chromosome 8 is depicted in Figure 3. The goodness-of-fit of the consensus chromosome is relatively bad: λ = 365.31 with λ ~ <a onClick="popup('http://www.biomedcentral.com/1471-2105/8/49/mathml/M37','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/8/49/mathml/M37">View MathML</a>. It could be due to some heterogeneities in recombination rate among mapping experiments, located in the filled marker intervals of Figure 3. Note that variability of recombination rate in maize was first reported by [40] and, more recently, [41] demonstrated that exotic inbred lines exhibit higher recombination rate that U.S. inbreds origin along chromosome 1 (see also [42]). On the other hand, since no information about the marker configurations in each individual mapping experiment was available, the variances of the distance estimates have been computed by assuming no missing data and no ambiguities (dominance) in the original data sets. This is surely too optimistic and some data sets may have included missing data and/or dominant markers. Therefore the precision on the distance estimate may have been overestimated for some marker intervals.

thumbnailFigure 3. Overview of the maize chromosomes 8 together with the consensus chromosome. Overview of chromosome 8 for the 18 mapping experiments involved in the meta-analysis of flowering time in maize. The first chromosome at the left represents the consensus chromosome obtained by applying the WLS approach as described in the first section of the article (implemented into ConsMap). The filled marker intervals indicate that the standardized residual between the interval distance estimates of the original chromosome and the consensus one exceeded the double-sided 95% percentile of a normalized centered gaussian. This figure has been created by the program MMapView.

Result of QTLProj

From the 18 QTL studies we projected 34 QTL on the consensus chromosome 8. Among these 34 QTL, 16 (47%) are related to SD, 10 (29%) to DPS and 8 (24%) to HT. The distribution of the r-square values clearly shows a L-shape: 75% of the QTL have r-square values lower than 12%. For 17 QTL a CI was reported (build from a 1-LOD support) from which we computed the standard deviations assuming that a 1-LOD support corresponds in fact to a 90% CI. For the other QTL we derived the standard deviations from the formula proposed by [29]. Then models from K = 1 to K = 10 QTL were considered and their parameters estimated by applying our EM-algorithm.

Result of QTLClust

In Table 1 we give the values for the criteria AIC, AICc, AIC3 and BIC for the different values of K explored. This clearly shows that the model with 5 QTL is the best one. Then, for the model with 5 QTL, the parameter estimates are listed in Table 2 and depicted in Figure 4. First, 3 QTL (1,4 and 5) have been detected in only 22% of the mapping experiments. At least two observed QTL are assigned to each of these 3 QTL without ambiguity.

thumbnailFigure 4. Visualization of the QTL meta-analysis result on maize chromosome 8. Result of the meta-analysis of the 34 QTL projected on the consensus chromosome 8. The CI of the meta-QTL positions are indicated on the chromosome by the filled colored areas. The observed QTL positions are depicted by their most probable position (triangle) and CI (segment). Membership probabilities of each initial QTL with respect to meta-QTL is visualized by the proportions of corresponding colored segments. This figure has been created by using MQTLView.

Secondly, two closely linked QTL (2 and 3) contribute to 75% of the reported QTL. This is strongly consistent with the knowledge of this region where a major QTL, vgt1, is tightly linked to another QTL, vgt2 [43,44]. It is worth noting that the confidence interval of the QTL corresponding to vgt1 (around 3.8 cM) encompasses a marker interval of approximately 2 cM (at the left of the marker umc89a) in which this QTL has been finely mapped by [44] using NIL lines (result not included in our analysis). This congruency lends further credence to the meta-analysis approach.

Discussion and conclusion

Nowadays more and more studies concerning QTL detection are available via public databases and the number of articles dealing with the comparison and/or integration of these results increases [12,45-47]. We believe that our meta-analysis procedure can contribute to facilitate the elaboration of such syntheses by providing a simple statistical framework to establish consensus models for both linkage maps and QTL locations.

First, the WLS strategy we proposed is a step forward to integrate several genetic marker maps. Contrary to iterative projection procedures, this approach provides a well-established statistical machinery (WLS) to assess the goodness-of-fit of the consensus model. It can also be used to test the homogeneity of the distance estimates among different mapping experiments. This can be usefull to investigate the possible variation of recombination rate among genotypes (as reported by [41]). As pointed out in the application, this method can suffer from the lack of knowledge about the effective precision on the marker interval distances in each individual mapping experiment due to possible missing data and/or the type of scoring of individual markers (codominance vs dominance). This could be improved by asking researchers to supply the variance estimates of the marker interval distances when they submit their results to a public database. These variance estimates could be used to improve the weight factors in the WLS model. Also, as sometimes robust framework maps are available in the literature or via public databases, the program ConsMap in MetaQTL offers the possibility to fix a genetic map as a reference (i.e for which the distances between ordered markers are assumed to be the "actual" distances). In this case, only the positions of the markers which are not reported on the reference have to be estimated.

Secondly, for the QTL meta-analysis itself, the Gaussian mixture model used to fit the distribution of the observed QTL locations on the chromosome provides a well-studied statistical inference technique. In this model-based clustering, each "true" QTL is mathematically represented by the Gaussian distribution of its detected positions, which leads to a probabilistic classification of the observed QTL. Contrary to [17] who developed a specific model choice criterion, our simulation results show that AIC gives relatively good performances in our QTL meta-analysis framework. This difference, with regard to the conclusions of [17], may be explained by their discrete formulation of the problem (recall that, instead of using a usual Gaussian mixture likelihood to evaluate the probability of the data, they assumed that the observations could be categorically assigned to the mixture components). Parameter estimates obtained by this approach were not really the maximum-likelihood estimates of the underlying mixture model. This may have added a bias in the evaluation of the AIC criterion, which could explain the bad performances of AIC in their simulations.

Thus, our mixture-modelling approach makes it possible to go beyond the limits encountered by [17]: the Akaike like criterion they proposed was limited to models from 1 to 4 QTL. As a consequence, [47] who used the method of [17], was obliged to break chromosomes on distinct segments to carry out the meta-analysis. This subjective division of the chromosome can now be avoided thanks to our method. Simulations have shown that the ratio between the number of observed QTL and the number of "true" QTL is one of the main limiting factor. The number of "true" QTL which can be assessed by the meta-analysis must be reasonable compared to the number of observed QTL (at least between 5 or 10 observed QTL per actual location). Note that this also depends on the distance between true QTL. But since there are more and more QTL locations reported for a given trait and since the real number of distinct QTL locations which can be detected with usual experimental designs is limited (only QTL with relatively large effects can be found), we assume that in many cases the ratio between observed and "true" QTL locations will steadily increase and should generally be reasonable. It is worth noting that, provided that the number of observed QTL is appropriate, the meta-analysis is able to separate "true" QTL locations even if they are closely linked (as illustrated in the application with vgt1 and vgt2, and the consistency of the vgt1 estimated position with fine mapping result of [44], result not included in the meta-analysis).

The ultimate step toward a more accurate identification of QTL relies on finding the underlying genes. Up to now, the majority of QTL isolated in plants have been cloned via positional cloning (see for instance [48]). However positional cloning of QTL is quite expensive both in terms of time and resources due to the necessity to screen recombinant individuals within large population (typically several hundreds) and to characterize these individuals with a very dense set of molecular markers. As an alternative and thanks to the advent of structural and functional genomics, QTL can also be resolved through association mapping of candidate genes. Candidate genes identification is based on a assumption that the polymorphism of the gene is associated with the variation of the trait of interest. Both function and mapping information have to be crossed to establish this assumption. The function of the gene may have been determined in the species of interest, based for instance on mutant analysis. More often, function is hypothesized based on sequence homology with genes the function of which has been established in model species, including possible positional cloning of QTL. Gene mapping information may have been obtained in the species of interest, but may have been also inferred from synteny based projections, as illustrated by [12] for rice to maize. Relevancy of the colocalization between QTL and candidate genes crucially depends on the confidence interval of the QTL positions. For this purpose the reduction of the confidence interval of the QTL is an important goal [2]. The ability of our method to reduce the QTL confidence interval by taking advantage of pooling QTL results could contribute in an increased resolution in selecting candidate genes. It is worth noting that candidate genes are generally mapped on a framework map used as reference for the species of interest (e.g. in maize [49]), while the QTL detections are carried out using specific populations (generally obtained by crossing parents contrasted for the trait(s) of interest).

Therefore, the selection of candidate genes which colocalize with QTL depends also on the process used to merge these different maps. Up to now, no statistical method had been proposed to combine candidate genes and QTL mapped in independent experiments, so we think that our WLS strategy should increase the precision of the integration of candidate gene mapping information.

Finally once candidate genes have been selected and their different haplotypes defined, association studies can be carried out. The identification of a statistically significant association between haplotype variation at a candidate gene and the target trait gives further credence on the role of this gene in the trait variation. Since the last 5 years more and more association studies have been reported in plants [50]. It would be interesting to integrate these new results into a global meta-analysis framework. Further developments are needed to combine onto a synthetic model the different scale of mapping: from linkage mapping (QTL) to fine mapping (association studies).

Availability and requirements

• Project name: MetaQTL

• Project home page: http://bioinformatics.org/mqtl webcite

• Operating system: Platform independent

• Programming language: Java

• Other requirements: Java 5.0

• License: GNU GPL

Additional File 7. MetaQTL Package : jar file and tutorial. This Zip archive contains both the MetaQTL JAR file and the files of the tutorial.

Format: ZIP Size: 4.9MB Download file or display content in a new windowOpen Data

Authors' contributions

B.G and A.C conceived the initial idea. J-B.V. developed and designed the project: he coded the package MetaQTL, developed the simulation tests, applied MetaQTL on the flowering time maize data set. All authors contributed to the interpretation of the results. Then J-B.V. and A.C. drafted the manuscript. All authors read and approved the final version of the manuscript.

Acknowledgements

This work was supported by INRA grant and the authors would like to thank F. Chardon, from INRA, for his help on the gathering process of QTL results relative to flowering time in maize. We would like also to thank C. Lepoittevin for her help in testing and debugging MetaQTL. We also thank Graham Coop for his help to improve the style of the manuscript.

References

  1. Kearsey M, Pooni HS: The genetical analysis of quantitative traits. London: Chapman and Hall; 1996. OpenURL

  2. Kearsey M, Farquhar A: QTL analysis in plants; where are we now?

    Heredity 1998, 80:137-142. PubMed Abstract | Publisher Full Text OpenURL

  3. Paran I, Zamir D: Quantitative traits in plants: beyond the QTL.

    Trends Genet 2003, 19(6):303-306. PubMed Abstract | Publisher Full Text OpenURL

  4. Xu S: Theoretical basis of the Beavis effect.

    Genetics 2003, 165(4):2259-68. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  5. Beavis W: The power and deceit of QTL experiments: lessons from comparative QTL studies. In Proceedings of the Forty-Ninth Annual Corn and Sorghum Industry Research Conference. Washington, DC: American Seed Trade Association; 1994:250-266. OpenURL

  6. Paterson AH, Lander ES, Hewitt JD, Peterson S, Lincoln SE, Tanksley SD: Resolution of quantitative traits into Mendelian factors by using a complete linkage map of restriction fragment length polymorphisms.

    Nature 1988, 335:521-529. Publisher Full Text OpenURL

  7. Lin YR, Schertz KF, Paterson AH: Comparative analysis of QTLS affecting plant height and maturity across the poaceae, in reference to an interspecific sorghum population.

    Genetics 1995, 141:391-411. PubMed Abstract | PubMed Central Full Text OpenURL

  8. Paterson AH, Lin YR, Schertz KF, Doebley JF, Pinson SRM, Liu SC, Stansel JW, Irvine JE: Convergent domestication of cereal crops by independent mutations at corresponding genetic loci.

    Science 1995, 269:1714-1718. Publisher Full Text OpenURL

  9. Keightley P, Knott S: Testing the correspondence between map positions of quantitative trait loci.

    Genet Res Camb 1999, 74:323-328. OpenURL

  10. Mihaljevic R, Utz HF, Melchinger AE: Congruency of quantitative trait loci detected for agronomic traits in testcrosses of five populations of european maize.

    Crop Sci 2004, 44:114-124. OpenURL

  11. Khatkar M, Thomson P, Tammen I, Raadsma H: Quantitative trait loci mapping in dairy cattle: review and meta-analysis.

    Genet Sel Evol 2004, 36:163-190. PubMed Abstract | Publisher Full Text OpenURL

  12. Chardon F, Virlon B, Moreau L, Falque M, Joets J, Decousset L, Murigneux A, Charcosset A: Genetic architecture of flowering time in maize as inferred from quantitative trait loci meta-analysis and synteny conservation with the rice genome.

    Genetics 2004, 168:2169-2185. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  13. Arcade A, Labourdette A, Falque M, Mangin B, Chardon F, Charcosset A, Joets J: BioMercator: integrating genetic maps and QTL towards discovery of candidate genes.

    Bioinformatics 2004, 20(14):2324-2326. PubMed Abstract | Publisher Full Text OpenURL

  14. Yap I, Schneider D, Kleinberg J, Matthews D, Cartinhour S, McCouch S: A graph-theoretic approach to comparing and integrating genetic, physical and sequence-based maps.

    Genetics 2003, 165:2235-2247. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  15. Stam P: Construction of integrated genetic linkage maps by means of a new coputer package: JoinMap.

    Plant J 1993, 3:739-744. OpenURL

  16. Schiex T: Carthagene: constructing and joining maximum likelihood maps.

    ISMB 1997, 5:258-267. PubMed Abstract OpenURL

  17. Goffinet B, Gerber S: Quantitative trait loci: a meta-analysis.

    Genetics 2000, 155:463-473. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  18. Allison DB, Heo M: Meta-analysis of linkage data under worst-case conditions: a demonstration using the human OB region.

    Genetics 1998, 148:859-865. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  19. Van Zandt P, Mopper S: A meta-analysis of adaptive deme formation in phytophagous insect populations.

    Am Nat 1998, 152:595-604. Publisher Full Text OpenURL

  20. Vollestad LA, Hindar K, Moller AP: A meta-analysis of fluctuating asymmetry in relation to heterozygosity.

    Heredity 1999, 83:206-218. PubMed Abstract | Publisher Full Text OpenURL

  21. Lohmueller KE, Pearce CL, Pike M, Lander ES, Hirschhorn JN: Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease.

    Nat Genet 2003, 33:177-182. PubMed Abstract | Publisher Full Text OpenURL

  22. Etzel C, Guerra R: Meta-analysis of genetic-linkage of quantitive trait loci.

    Am J Hum Genet 2003, 71:56-65. Publisher Full Text OpenURL

  23. Morton NE: The detection and estimation of linkage between the genes for elliptocytosis and the Rh blood type.

    Am J Hum Genet 1956, 8(2):80-96. PubMed Abstract | PubMed Central Full Text OpenURL

  24. Conneally P, Edwards J, Kidd K, Lalouel J, Morton N: Reports of the committee methods of linkage analysis and reporting.

    Cytogent Cell Genet 1985, 40:356-359. OpenURL

  25. Lander ES, Botstein D: Mapping Mendelien factors underlying quantitative traits using RFLP linkage maps.

    Genetics 1989, 121:185-199. PubMed Abstract | PubMed Central Full Text OpenURL

  26. Dupuis J, Siegmund D: Statistical methods for mapping quantitative trait loci from a dense set of markers.

    Genetics 1999, 151:373-386. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  27. Darvasi A, Weinreb A, Minke V, Weller J, Soller M: Detecting marker-QTL linkage and estimating QTL gene effect and map location using a saturated map.

    Genetics 1993, 134:943-951. PubMed Abstract | PubMed Central Full Text OpenURL

  28. Kao C, Zeng Z, Teasdale R: Multiple Interval Mapping for Quantitative Trait Loci.

    Genetics 1999, 152:1203-1216. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  29. Darvasi A, Soller M: A simple method to calculate resolving power and confidence interval of QTL map location.

    Behav Genet 1997, 27(2):125-132. PubMed Abstract | Publisher Full Text OpenURL

  30. Visscher PM, Goddard ME: Prediction of the confidence interval of quantitative trait loci.

    Behavior Genetics 2004, 34(4):477-482. PubMed Abstract | Publisher Full Text OpenURL

  31. Mangin B, Goffinet B, Rebaï A: Constructing confidence intervals for QTL location.

    Genetics 1994, 138:1301-1308. PubMed Abstract | PubMed Central Full Text OpenURL

  32. Jansen RC: Interval mapping of multiple quantitative trait loci.

    Genetics 1993, 135:205-211. PubMed Abstract | PubMed Central Full Text OpenURL

  33. Zeng ZB: Precision mapping of quantitative trait loci.

    Genetics 1994, 136(4):1457-1468. PubMed Abstract | PubMed Central Full Text OpenURL

  34. Dempster A, Laird N, Rubin D: Maximum Likelihood from incomplete data via the EM algorithm (with discussion).

    J Roy Stat Soc 1977, B 39:1-38. OpenURL

  35. Meng X, Rubin DB: Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm.

    J Am Stat Assoc 1991, 86:899-909. Publisher Full Text OpenURL

  36. Aitkin M, Rubin D: Estimation and Hypothesis Testing in Finite Mixture Models.

    Journal of the Royal Statistical Society 1985, 47:67-75. OpenURL

  37. Titterington D, Smith A, Markov U: Statistical Analysis of Finite Mixture Distributions. New York: John Wiley and Sons; 1985. OpenURL

  38. Poupard B, Moreau L, Charcosset A: Analyse de 1'épistatsie entre QTL pour 3 caractères agronomiques chez le maïs.

    Tech rep, INRA 2001. OpenURL

  39. Mechin V, Argillier O, Hebert Y, Guingo E, Moreau L, Charcosset A, Barriere Y: Genetic analysis and QTL mapping of cell wall degistibility and lignification in silage maize.

    Crop Sci 2001, 41:690-697. OpenURL

  40. Stadler LJ: The Variability of Crossing Over in Maize.

    Genetics 1925, 11:1-37. OpenURL

  41. Williams CG, Goodman MM, Stuber CW: Comparative recombination distances among Zea mays L. inbreds, wide crosses and interspecific hybrids.

    Genetics 1995, 141(4):1573-1581. PubMed Abstract | PubMed Central Full Text OpenURL

  42. Ji Y, Stelly DM, De Donato M, Goodman MM, Williams CG: A candidate recombination modifier gene for Zea mays L.

    Genetics 1999, 151(2):821-830. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  43. Vladutu C, McLaughlin J, Phillips RL: Fine mapping and characterization of linked quantitative trait loci involved in the transition of the maize apical meristem from vegetative to generative structures.

    Genetics 1999, 153(2):993-1007. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  44. Salvi S, Tuberosa R, Chiapparino E, Maccaferri M, Veillet S, van Beuningen L, Isaac P, Edwards K, Phillips RL: Toward positional cloning of Vgt1, a QTL controlling the transition from the vegetative to the reproductive phase in maize.

    Plant Mol Biol 2002, 48(5–6):601-13. PubMed Abstract | Publisher Full Text OpenURL

  45. Khavkin E, Coe EH: Mapped genomic locations for developmental functions and QTLs reflect concerned groups in maize (Zea mays L.).

    Theor Appl Genet 1997, 95:343-352. Publisher Full Text OpenURL

  46. Khavkin E, Coe EH: The major quantitative trait loci for plant stature, development and yield are general manifestations of developmental gene clusters.

    Maize Newslett 1998, 72:60-66. OpenURL

  47. Chardon F, Hourcade D, Combes V, Charcosset A: Mapping of a spontaneous mutation for early flowering time in maize highlights contrasting allelic series at two-linked QTL on chromosome 8.

    Theor Appl Genet 2005, 112:1-11. PubMed Abstract | Publisher Full Text OpenURL

  48. Salvi S, Tuberosa R: To clone or not to clone plant QTLs: present and future challenges.

    Trends Plant Sci 2005, 10(6):297-304. PubMed Abstract | Publisher Full Text OpenURL

  49. Falque M, Decousset L, Dervins D, Jacob AM, Joets J, Martinant JP, Raffoux X, Ribiere N, Ridel C, Samson D, Charcosset A, Murigneux A: Linkage mapping of 1454 new maize candidate gene Loci.

    Genetics 2005, 170(4):1957-1966. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  50. Gupta PK, Rustgi S, Kulwal PL: Linkage disequilibrium and association studies in higher plants: present status and future prospects.

    Plant Mol Biol 2005, 57(4):461-485. PubMed Abstract | Publisher Full Text OpenURL