Open Access Correspondence

Polyploidization increases meiotic recombination frequency in Arabidopsis: a close look at statistical modeling and data analysis

Lin Wang1 and Zewei Luo1,2*

Author Affiliations

1 Laboratory of Population & Quantitative Genetics, Institute of Biostatistics, Fudan University, Shanghai 200433, China

2 School of Biosciences, University of Birmingham, Birmingham B15 2TT, UK

BMC Biology 2012, 10:30  doi:10.1186/1741-7007-10-30


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1741-7007/10/30


Received: 25 November 2011
Accepted: 18 April 2012
Published: 18 April 2012

© 2012 Wang and Luo; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This paper is a response to Pecinka A, Fang W, Rehmsmeier M, Levy AA, Mittelsten Scheid O: Polyploidization increases meiotic recombination frequency in Arabidopsis. BMC Biology 2011, 9:24.

See research article at http://www.biomedcentral.com/1741-7007/9/24

Background

Many species, particularly flowering plants, have experienced a state of polyploidy at some point in their evolutionary history. Meiotic recombination creates novel configurations of the genetic variants maintained in the genome of a species, on which natural and artificial selection can act. Understanding how a diploid species differs in frequency of meiotic recombination from its polyploid ancestor is therefore important for both evolutionary biology and plant and animal breeding. It is well established that the evolution of polyploid genomes is an extremely dynamic process compared to that of diploids, characterized by extensive genetic and epigenetic changes in the nuclear genome following polyploidization [1-3]. Little is known about the mechanism underpinning these genetic changes. To address this fundamental question, Pecinka and colleagues, in a paper recently published in BMC Biology, described a direct comparison of the frequency of meiotic recombination in the diploid and tetraploid genomes of Arabidopsis [4]. One of the most striking methodological challenges for such a study is to properly evaluate the recombination parameter in populations at different levels of ploidy, particularly in autopolyploids. Indeed, linkage analysis in autotetraploids is a long-standing problem that can be traced back to the pioneering works of prominent mathematical geneticists [5-7].

There are at least two major challenges in tetrasomic linkage analysis. Firstly, double reduction, the most distinctive characteristic of polysomic inheritance, allows sister chromatids to enter the same gamete during meiosis and thus causes systematic distortion in allele segregation [8]. Secondly, multiplex allele segregation makes it almost impossible to infer the underlying genotype directly from PCR-based phenotype data, even for co-dominant markers such as single nucleotide polymorphisms or simple sequence repeats [9]. To avoid these difficulties, the authors first developed a seed-based assay by creating transformants with green and red fluorescent markers expressed under a seed-specific promoter in Arabidopsis thaliana [10,11]. In parallel, they created diploids, allotetraploids and autotetraploids carrying these fluorescent markers. After a series of backcrosses and selection, they obtained diploid and tetraploid lines bearing only a single copy of the marker alleles linked on Arabidopsis chromosome III. These lines were used to create the segregating populations from which marker phenotype data were collected and used to estimate the recombination frequency between the markers [4].

The method these authors implemented for modeling and analyzing the marker data from diploid and tetraploid (allo- and autotetraploid) populations needs to be formulated on the basis of the disomic and tetrasomic inheritance models. This paper presents statistically appropriate and mathematically rigorous methods for modeling and re-analyzing the datasets.

Notation, model and analysis

We consider segregation of alleles at the two fluorescent marker loci on Arabidopsis chromosome III in an F2 family derived from crossing two parental lines. Following the notation of Pecinka et al. [4], parental genotypes at the markers can be denoted by GR/GR and BC/BC for diploid, GR/BC/DE/DE and BC/BC/DE/DE for allotetraploid, and GR/BC/BC/BC and BC/BC/BC/BC for autotetraploid. The parental lines were crossed to generate offspring populations of diploids, allotetraploids and autotetraploids accordingly. Regardless of ploidy, the offspring from mating these parents can be grouped into four phenotypes: yellow (carrying both red and green marker alleles), green (green allele only), red (red allele only) and grey (neither marker allele). The numbers of individuals in the four phenotype classes are denoted by n1, n2, n3 and n4 respectively. Let r be the recombination frequency between the two markers and α be the coefficient of double reduction at the green marker, which is nearer to the centromere than the red marker locus. The probability of observing each of the four phenotypes in the diploids and allotetraploids depends on only one parameter, r (fi(r), i = 1,...,4), whereas the phenotypic distribution in the autotetraploids requires two parameters, r and α (fi(α,r), i = 1,...,4).

The log-likelihood of the model parameter(s) given the observations n1, n2, n3 and n4 can be written, up to an additive constant, as:

$$\ln L(r) = \sum_{i=1}^{4} n_i \ln f_i(r)$$

for the diploid and allotetraploid populations or:

$$\ln L(\alpha, r) = \sum_{i=1}^{4} n_i \ln f_i(\alpha, r)$$

for the autotetraploid population where fi(r) (i = 1,...,4) can be worked out following the principle of two-locus disomic linkage analysis and listed in Table 1. It needs to be pointed out that the phenotypic distribution is common between the diploid and allotetraploid populations. This is because homoeologous pairing was completely excluded in meiosis of the synthesized allotetraploids of Arabidopsis [10,11], thus the allotetraploids show strict disomic inheritance. However, calculation of phenotypic distribution in the autotetraploid segregation population must follow the principle of tetrasomic linkage analysis as is detailed in [11]. In the present context, the phenotypic distribution can be worked out as:

Table 1. Distribution of seed phenotype and the underlying genotype at the two fluorescence markers in F2 diploid and autotetraploid populations and estimates of the model parameters

[Equation M3: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M3]

[Equation M4: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M4]

[Equation M5: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M5]

[Equation M6: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M6]

where gi(α,r) (i = 1, 2, 3, 4) are as given below:

[Equation M7: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M7]

[Equation M8: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M8]

The maximum likelihood estimate (MLE) of the recombination frequency in diploids and allotetraploids can be calculated from solving the normal equations:

[Equation M9: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M9]

which is a quadratic equation in r. In the diploid population, where n1 = 2805, n2 = 322, n3 = 333 and n4 = 791, the quadratic equation has only one real root falling in [0.0, 0.5], which is the MLE of r [equation M10: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M10]. Based on the likelihood function, one can calculate the standard deviation of the estimate from Fisher's information measure for the MLE. In the present context, it equals:

[Equation M11: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M11]
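The disomic calculation can be sketched in Python. Table 1 is not reproduced in this text, so the sketch assumes the standard coupling-phase F2 probabilities f1 = (2 + θ)/4, f2 = f3 = (1 - θ)/4 and f4 = θ/4 with θ = (1 - r)^2, which agree with the recombinant-class probability (2r - r^2)/2 used in this paper. All function names are illustrative. The standard deviation is taken from the observed information (the negative second derivative of the log-likelihood at the MLE), which may differ in detail from the authors' Mathematica implementation.

```python
import math

def phenotype_probs(r):
    """Assumed F2 coupling-phase probabilities for (yellow, green, red, grey)."""
    theta = (1.0 - r) ** 2
    return [(2.0 + theta) / 4.0, (1.0 - theta) / 4.0,
            (1.0 - theta) / 4.0, theta / 4.0]

def log_likelihood(r, counts):
    """Multinomial log-likelihood of r, up to an additive constant (needs 0 < r < 1)."""
    return sum(n * math.log(f) for n, f in zip(counts, phenotype_probs(r)))

def mle_recombination(counts):
    """Closed-form MLE of r under the assumed model."""
    n1, n2, n3, n4 = counts
    n = n1 + n2 + n3 + n4
    # Setting d(lnL)/d(theta) = 0 gives a quadratic in theta = (1 - r)^2:
    #   n*theta^2 - (n1 - 2*n2 - 2*n3 - n4)*theta - 2*n4 = 0
    b = n1 - 2 * (n2 + n3) - n4
    theta = (b + math.sqrt(b * b + 8 * n * n4)) / (2 * n)  # positive root
    return 1.0 - math.sqrt(theta)

def std_error(r_hat, counts):
    """Standard deviation from the observed information, valid at the MLE."""
    n1, n2, n3, n4 = counts
    theta = (1.0 - r_hat) ** 2
    info_theta = (n1 / (2.0 + theta) ** 2
                  + (n2 + n3) / (1.0 - theta) ** 2
                  + n4 / theta ** 2)
    # Change of variable theta -> r; the score vanishes at the MLE,
    # so only the squared derivative d(theta)/dr = -2(1 - r) remains.
    info_r = info_theta * (2.0 * (1.0 - r_hat)) ** 2
    return 1.0 / math.sqrt(info_r)

datasets = {
    "diploid": (2805, 322, 333, 791),
    "allotetraploid": (1484, 275, 298, 320),
}
for name, counts in datasets.items():
    r_hat = mle_recombination(counts)
    print(name, round(r_hat, 4), round(std_error(r_hat, counts), 4))
```

Under these assumptions the allotetraploid counts give an estimate of approximately 0.277 with a standard deviation of about 0.011, in line with the values reported in this paper.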

In their original report [4], Pecinka et al. estimated the recombination frequency by equating the probability of the recombinant individuals, in other words those displaying green or red seeds, to the observed proportion of these individuals. In the present notation, this gives 2r - r^2 = 2(n2 + n3)/n, where n = n1 + n2 + n3 + n4. They provided an estimate of r [equation M12: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M12] with a standard deviation of 0.009. It is not clear how the standard deviation was calculated. The method for calculating the recombination frequency may not be statistically appropriate in two respects. Firstly, the calculation did not use the full information in the data. For example, the individuals with yellow seeds were not taken into consideration when counting recombination events. In fact, a proportion [r(1-r) + r^2/2]/[3(1-r)^2/4 + r(1-r) + r^2/2] of this group of individuals carry recombinant gametes. Secondly, the MLE of r obtained from the present analysis is four times as likely as the estimate provided by the original report, that is:

[Equation M13: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M13]
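Both points can be explored numerically. The sketch below again assumes the coupling-phase F2 probabilities f1 = (2 + θ)/4, f2 = f3 = (1 - θ)/4 and f4 = θ/4 with θ = (1 - r)^2. It solves 2r - r^2 = 2(n2 + n3)/n for r, evaluates the proportion of yellow seeds carrying a recombinant gamete, and computes the likelihood ratio between two candidate values of r. The function names are hypothetical, and the moment-based value computed here need not coincide with the estimate printed in the original report.

```python
import math

def moment_estimate(counts):
    """Solve 2r - r^2 = 2(n2 + n3)/n for r, taking the root with r <= 1."""
    n1, n2, n3, n4 = counts
    n = n1 + n2 + n3 + n4
    p = 2.0 * (n2 + n3) / n          # requires p <= 1
    # 2r - r^2 = p is equivalent to (1 - r)^2 = 1 - p
    return 1.0 - math.sqrt(1.0 - p)

def yellow_recombinant_fraction(r):
    """Share of yellow-seed individuals carrying at least one recombinant gamete."""
    recombinant = r * (1.0 - r) + r * r / 2.0
    non_recombinant = 3.0 * (1.0 - r) ** 2 / 4.0
    return recombinant / (non_recombinant + recombinant)

def log_likelihood(r, counts):
    """Multinomial log-likelihood under the assumed F2 model (needs 0 < r < 1)."""
    theta = (1.0 - r) ** 2
    probs = [(2.0 + theta) / 4.0, (1.0 - theta) / 4.0,
             (1.0 - theta) / 4.0, theta / 4.0]
    return sum(n * math.log(f) for n, f in zip(counts, probs))

def likelihood_ratio(r_a, r_b, counts):
    """L(r_a) / L(r_b); values above 1 mean the data favour r_a."""
    return math.exp(log_likelihood(r_a, counts) - log_likelihood(r_b, counts))

diploid = (2805, 322, 333, 791)
r_mom = moment_estimate(diploid)
print("moment estimate:", round(r_mom, 4))
print("yellow seeds with a recombinant gamete:",
      round(yellow_recombinant_fraction(r_mom), 3))
```

Calling likelihood_ratio with the MLE as r_a and any alternative estimate as r_b shows how much support the alternative loses relative to the maximum.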

In the allotetraploid population where n1= 1484, n2= 275, n3= 298 and n4= 320, the MLE and corresponding standard deviation are calculated as 0.2770 ± 0.0110. The estimate is in contrast to 0.241 with standard deviation = 0.018 in the original report.

To analyze the likelihood model for the autotetraploid marker data, we first note that the parameter α involves information on allele segregation at the green marker only. By setting r = 0 in the phenotypic probabilities given above, we can work out:

[Equation M14: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M14]

[Equation M15: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M15]

which represent the probability of observing an individual carrying or not carrying the green fluorescent marker respectively. Let ng = n1+n2 be the number of individuals carrying the green marker allele and n0 = n3+n4 be the number of individuals not carrying the allele. The log-likelihood function has the form:

[Equation M16: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M16]

Solving the following equation:

[Equation M17: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M17]

results in the MLE of α [equation M18: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M18] with a standard deviation of 0.0121 (Table 1).

Incorporating the MLE of α into the likelihood function, we found that the equation:

[Equation M19: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M19]

has only one real root [equation M20: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M20], which is the MLE of the recombination frequency under the tetrasomic model. The estimate has a standard deviation of 0.0051, which was calculated from the second derivative of the log-likelihood function at the MLE. Based on the estimates of α and r, we can predict the coefficient of double reduction at the red marker, β, from:

[Equation M21: see http://www.biomedcentral.com/1741-7007/10/30/mathml/M21]

which is below the upper bound of ¼ for the coefficient of double reduction [12]. The estimates of α and β, the coefficients of double reduction at the green and red markers respectively, strongly suggest multivalent pairing among the homologous chromosomes and, in turn, double reduction occurring in the genome region flanked by the fluorescent markers (Table 1).

Discussion

This article presents likelihood-based methods for estimating the recombination frequency and other relevant parameters from segregating populations of diploids, allotetraploids and autotetraploids. We demonstrate the methods by re-analyzing the datasets published by Pecinka et al. [4]. The re-analysis reveals quantitative and qualitative differences from the original analysis. Firstly, unlike that of Pecinka et al., the present analysis enables estimation of double reduction at the marker loci under study and reveals significant double reduction at the markers, emphasizing the necessity of taking the tetrasomic nature of inheritance into account in the data analysis. Secondly, the present study differs from the original analysis in the inferred ranking of allotetraploid and autotetraploid Arabidopsis by frequency of meiotic recombination. Pecinka et al. concluded that meiotic recombination was more frequent in the allotetraploids than in the autotetraploids, whereas our results suggest otherwise. Several considerations support the present result. Allotetraploids show strictly disomic inheritance, particularly in the present instance where homoeologous chromosome pairing was excluded. In contrast, homologous chromosomes in autotetraploids have a substantially higher chance to pair, which is essential for recombination to occur among the chromosomes [13]. In addition, it has been demonstrated theoretically that the upper bound of recombination frequency is much higher in autotetraploids (¾) than in diploids and allotetraploids (½) [14]. Thirdly, the estimates obtained from the present analysis are much better supported by the observed experimental data than those from the original analysis, in terms of their likelihood values.

Computer program

The formulation and numerical analysis presented above were programmed in Mathematica [15], and the programs will be made available from the corresponding author after publication of this paper.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

ZWL conceived the research and formulated the statistical analysis. LW and ZWL analyzed the data and wrote the paper. Both authors read and approved the final manuscript.

Acknowledgements

This study was supported by research grants from the Leverhulme Trust (UK) and The National Basic Research Program of China (2012CB316505). LW is supported by the postdoctoral funding of China and ZWL was also funded by the National Natural Science Foundation of China (31071084/C060103).

References

  1. Soltis DE, Soltis PS: The dynamic nature of polyploid genomes.

    Proc Natl Acad Sci USA 1995, 92:8089-8091.

  2. Song K, Lu P, Tang K, Osborn TC: Rapid genome change in synthetic polyploids of Brassica and its implications for polyploid evolution.

    Proc Natl Acad Sci USA 1995, 92:7719-7723.

  3. Osborn T, Pires J, Birchler J, Auger D, Chen Z: Understanding mechanisms of novel gene expression in polyploids.

    Trends Genet 2003, 19:141-147.

  4. Pecinka A, Fang W, Rehmsmeier M, Levy AA, Mittelsten Scheid O: Polyploidization increases meiotic recombination frequency in Arabidopsis.

    BMC Biology 2011, 9:24.

  5. Haldane JBS: Theoretical genetics of autotetraploids.

    J Genet 1930, 22:359-372.

  6. Mather K: Segregation and linkage in autotetraploids.

    J Genet 1936, 30:287-314.

  7. Fisher RA: The theory of linkage in polysomic inheritance.

    Philos Trans R Soc Lond B Biol Sci 1947, 233:55-87.

  8. Bailey NTJ: Introduction to the Mathematical Theory of Genetic Linkage. London: Oxford University Press; 1961.

  9. Luo ZW, Hackett CA, Bradshaw JE, McNicol JW, Milbourne D: Predicting parental genotypes and gene segregation for tetrasomic inheritance.

    Theor Appl Genet 2000, 100:1067-1073.

  10. Melamed-Bessudo C, Yehuda E, Stuitje AR, Levy AA: A new seed-based assay for meiotic recombination in Arabidopsis thaliana.

    Plant J 2005, 43:458-466.

  11. Luo ZW, Zhang R, Kearsey MJ: Theoretical basis for genetic linkage analysis in autotetraploid species.

    Proc Natl Acad Sci USA 2004, 101:7040-7045.

  12. Comai L: The advantages and disadvantages of being polyploid.

    Nat Rev Genet 2005, 6:836-846.

  13. Luo ZW, Zhang Z, Leach L, Zhang RM, Bradshaw JE, Kearsey MJ: Constructing genetic linkage maps under a tetrasomic model.

    Genetics 2006, 172:2635-2645.

  14. Wolfram S: Mathematica: A System for doing Mathematics by Computer. 5th edition. Addison-Wesley Publishing Company, Inc. USA; 2000.