
A selective genotyping approach identifies QTL in a simulated population

Abstract

Background

Identifying QTLs for important phenotypic traits with medium-density genome-wide SNP panels is one of the most challenging areas of animal genetics, because it can spare researchers the time-consuming direct sequencing of putative candidate genes when searching for the mutations that affect a trait. Appropriate statistical analyses allow the identification of genomic regions associated with the investigated trait in the genotyped population.

Methods

The selective genotyping technique was applied to 1000 genotyped animals with known phenotypes. Sliding windows of five consecutive SNPs were created for each chromosome; we assumed that the QTLs were encoded by the windows showing the largest difference in allele frequencies between the most divergent production groups (the two tails of the distribution).

Results

Ten windows affected at least one trait. For five of these windows, the largest significant effect was attributable to a single SNP, which could therefore be regarded as the QTL itself.

Conclusions

In this study we proposed a simple method to identify genomic regions associated with the phenotype under study. The identification of the DNA region is the first step in searching, through direct sequencing of the genomic regions that encode the QTL, for the mutation that is actually responsible for the trait variability.

Background

The recent availability of genome-wide SNP panels, which make it possible to evaluate differences in SNP allele frequencies between populations, has led to the identification of genomic regions subject to positive selection in humans and cattle [1–5]. For the identification of selective sweeps for milk traits, efficient applications of the selective genotyping strategy for QTL mapping have been reported in dairy cattle [6], swine [7] and sheep [8]. In this approach, the most divergent individuals for a trait (the two tails of the distribution) are chosen and genotyped. Boligon et al. [9] compared selective genotyping strategies for the prediction of breeding values in a population undergoing selection, and concluded that animals with extreme yield deviation values in a reference population are the most informative when training genomic selection models. Using the selective genotyping approach, Moioli et al. [8] identified two novel non-synonymous mutations associated with milk yield in sheep and confirmed their effect in independent populations.

In the present study, we hypothesized that selective sweeps detected in a simulated population could be used to map QTLs for the trait under selection in the whole population.

Materials and methods

Dataset

Three milk production traits were simulated in a population of 3,000 females, included in a data set of 4,100 individuals from generations G0 to G4 with known pedigree. Genotypes of the females and their parents at 10,000 SNPs, evenly distributed over 5 chromosomes, were available. A detailed description of the population is given by Usai et al. [10].

Statistical analysis

The selective genotyping technique was simulated on the females of generation 3 (1,000 females), on the assumption that they had benefited most from selection. Their production is summarized in Table 1. For each trait, allele frequencies at each SNP of each chromosome were calculated separately for the group of animals whose production was more than one standard deviation below the mean and for the group whose production was more than one standard deviation above the mean. The number of animals in each group is also reported in Table 1. The QTLs hypothesized in this way might be affected by the individuals included in the production tails, and in particular by the additive relationships among them, which might not represent the average relationship of the whole population. Habier et al. [11], in the context of predicting genomic breeding values (GEBV), showed that the additive-genetic relationships between the training individuals and a selection candidate, as captured by SNPs, affect the GEBV accuracy of that candidate. Therefore, in the present study, the coefficients of relationship among the individuals of each tail, as well as in the whole population, were calculated according to Wright [12] using Proc Inbreed in SAS [13].
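The tail selection and per-tail allele frequencies described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes genotypes coded as 0/1/2 copies of a counted allele, no missing data, and thresholds at the mean ± 1 population standard deviation (the text does not say whether the population or sample standard deviation was used). The function names are hypothetical.

```python
from statistics import mean, pstdev


def select_tails(phenotypes, k=1.0):
    """Return (low, high) index lists: animals more than k SD below / above the mean."""
    mu = mean(phenotypes)
    sd = pstdev(phenotypes)  # assumption: population standard deviation
    low = [i for i, y in enumerate(phenotypes) if y < mu - k * sd]
    high = [i for i, y in enumerate(phenotypes) if y > mu + k * sd]
    return low, high


def allele_frequencies(genotypes, group):
    """Frequency of the counted allele at each SNP within one group of animals.

    genotypes[i][j] is the number of copies (0, 1 or 2) of the counted allele
    carried by animal i at SNP j.
    """
    n_snps = len(genotypes[0])
    return [sum(genotypes[i][j] for i in group) / (2 * len(group))
            for j in range(n_snps)]
```

Running `select_tails` once per trait and then `allele_frequencies` on each tail yields the two frequency vectors that are compared SNP by SNP.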

Table 1 Statistical parameters relevant to the analyzed traits in the female population of generation 3

The QTL effect was subsequently estimated using sliding windows of five consecutive SNPs, constructed for each of the five chromosomes. The number of markers per window was chosen because the SNP density of the simulated population in the present study was similar to the average SNP density of the cattle panel used by Stella et al. [2], who reported that sliding windows of 5, 9 and 19 SNPs give similar results when searching for selective sweeps in cattle.

For each window, the sum over its SNPs of the absolute differences in allele frequency between the two production groups was calculated, and the sliding windows were then ranked by this statistic within each chromosome. We arbitrarily assumed that the potential QTL for the considered trait was located in the top-ranking window. Because selective genotyping was performed separately for the three traits, the potential QTLs could lie in different windows; for this reason, more than one window per chromosome was considered in the subsequent analyses.
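The window statistic and ranking described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: windows are assumed to slide by one SNP (the step is not stated in the text), the inputs are the per-SNP allele frequencies of the low and high tails for one chromosome, and the function names are hypothetical.

```python
def window_scores(freq_low, freq_high, width=5):
    """Sum of |p_high - p_low| over each window of `width` consecutive SNPs.

    Assumes a step of one SNP: score k covers SNPs k .. k + width - 1.
    """
    diffs = [abs(h - l) for l, h in zip(freq_low, freq_high)]
    return [sum(diffs[k:k + width]) for k in range(len(diffs) - width + 1)]


def rank_windows(scores):
    """Window start indices ordered from the highest to the lowest score."""
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
```

The first index returned by `rank_windows` is the top-ranking window, i.e. the one assumed to harbour the QTL for that trait on that chromosome.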

The top-ranking sliding windows, assumed to encode the hypothesized QTLs, are reported in Table 2 together with the traits they potentially affect.

Table 2 Top ranking sliding windows based on the highest difference in allelic frequencies between the two productive groups, separately for each trait

Estimation of the QTL effect for the whole window of 5 SNPs

The QTL effect was calculated on the whole recorded population as follows.

For each sliding window, the most probable haplotype alleles were inferred with the EM algorithm [14] using Proc Haplotype in SAS [13], and were assigned to each phenotyped individual (n = 3,000).
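The EM step can be illustrated with a small pure-Python sketch of the Excoffier and Slatkin approach: every phase resolution compatible with an individual's unphased genotype is enumerated, haplotype frequencies are re-estimated until they stabilise, and the most probable pair is then assigned to each individual. It assumes Hardy-Weinberg proportions, no missing genotypes and short windows (at most 16 phase resolutions for 5 SNPs). Starting values and the convergence rule are our own choices and do not reproduce Proc Haplotype; all names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def phase_pairs(genotype):
    """List every unordered haplotype pair compatible with an unphased genotype.

    genotype: tuple with the number of copies (0, 1 or 2) of allele 1 at each SNP.
    Haplotypes are returned as tuples of 0/1 alleles.
    """
    het = [j for j, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        h = tuple(base)
        return [(h, h)]
    pairs = []
    # Put allele 1 of the first heterozygous SNP on haplotype a, so each pair appears once.
    for bits in product((0, 1), repeat=len(het) - 1):
        a, b = base[:], base[:]
        for j, bit in zip(het, (1,) + bits):
            a[j], b[j] = bit, 1 - bit
        pairs.append((tuple(a), tuple(b)))
    return pairs


def pair_prob(freq, pair):
    """Hardy-Weinberg probability of an unordered haplotype pair."""
    a, b = pair
    return freq[a] * freq[b] * (1 if a == b else 2)


def em_haplotypes(genotypes, n_iter=200, tol=1e-10):
    """EM estimate of haplotype frequencies plus each individual's most probable pair."""
    candidates = [phase_pairs(tuple(g)) for g in genotypes]
    haps = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: posterior probability of each phase resolution
            w = [pair_prob(freq, p) for p in pairs]
            total = sum(w)
            for (a, b), wi in zip(pairs, w):
                counts[a] += wi / total
                counts[b] += wi / total
        # M-step: each individual carries two haplotypes
        new = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    best = [max(pairs, key=lambda p: pair_prob(freq, p)) for pairs in candidates]
    return freq, best
```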

For each haplotype allele with a frequency ≥ 0.07 in the recorded population, the allele substitution effect on each trait was estimated by fitting the haplotype allele as a covariate, as in Sherman et al. [15], with the following model:

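y = b(haplotype allele) + e

where y is trait 1, trait 2 or trait 3 (each analyzed separately), (haplotype allele) is the number of copies of the haplotype allele carried by the individual, b is the allele substitution effect and e is the residual.

Haplotype alleles were coded as follows: two copies of the allele = 2; one copy = 1; no copy = 0.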
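A minimal sketch of this covariate model for one trait and one haplotype allele follows. Two assumptions go beyond the text: an intercept μ is fitted (the model as written shows only b), and the two-sided p-value uses a normal approximation to the t distribution, which is reasonable for n = 3,000 but will not match SAS output exactly. The function names are hypothetical.

```python
from math import erfc, sqrt


def count_copies(pairs, allele):
    """Code each individual as 2, 1 or 0 copies of the given haplotype allele."""
    return [int(a == allele) + int(b == allele) for a, b in pairs]


def allele_substitution_effect(copies, y):
    """Least-squares fit of y = mu + b * copies + e for one trait and one allele.

    Returns (b, standard error of b, two-sided p-value). The p-value uses a
    normal approximation to the t distribution, adequate only for large n.
    """
    n = len(y)
    mx, my = sum(copies) / n, sum(y) / n
    sxx = sum((x - mx) ** 2 for x in copies)  # zero if all animals carry the same count
    sxy = sum((x - mx) * (v - my) for x, v in zip(copies, y))
    b = sxy / sxx
    mu = my - b * mx
    rss = sum((v - mu - b * x) ** 2 for x, v in zip(copies, y))
    se = sqrt(rss / (n - 2) / sxx)
    p = erfc(abs(b / se) / sqrt(2))
    return b, se, p
```

The same function applies unchanged to a single SNP coded as 0/1/2 copies of one of its alleles.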

To account for multiple testing, P-values were adjusted with the false discovery rate (FDR) procedure using Proc Multtest in SAS [13].
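For readers without SAS, the FDR adjustment can be sketched with the Benjamini-Hochberg linear step-up procedure, which we understand to be what the FDR option of Proc Multtest computes; treat that correspondence as an assumption. The function name is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The list passed in would hold the raw p-values of all haplotype alleles tested for one trait.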

Estimation of the SNP effect from the haplotype effect

Under the hypothesis that one SNP in each haplotype had a major effect on the recorded trait, inspection of the haplotype alleles with a highly significant effect (P < 0.00001) on a trait made it possible to select one SNP whose two alleles showed opposite effects on that trait. For each of those SNPs, the allele substitution effect on each trait was estimated with the same covariate model used for the haplotype alleles.
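The SNP was selected by direct observation. One way to make that inspection reproducible is sketched below, under the assumption (ours, not the authors') that the relevant SNP is one where every significant positive-effect haplotype carries one allele and every significant negative-effect haplotype carries the other. Haplotypes are tuples of 0/1 alleles; the function name is hypothetical.

```python
def discriminating_snps(positive_haps, negative_haps):
    """Window positions whose allele perfectly separates positive from negative haplotypes."""
    hits = []
    for j in range(len(positive_haps[0])):
        pos_alleles = {h[j] for h in positive_haps}
        neg_alleles = {h[j] for h in negative_haps}
        # Keep SNP j only if each group is fixed for a different allele.
        if len(pos_alleles) == 1 and len(neg_alleles) == 1 and pos_alleles != neg_alleles:
            hits.append(j)
    return hits
```

When more than one position qualifies, the candidates can be compared with the single-SNP covariate model.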

Results

Because the selective genotyping strategy was applied separately for the three traits, the top-ranking windows varied depending on the trait considered (Table 2).

The average additive relationship values within each of the selected tails, for each trait, were very similar to one another (Table 3), ranging from 4.26% to 4.37%, but they were higher than the corresponding value calculated for the whole population (3.01%). The FDR-corrected probabilities of the allele substitution effects of all tested haplotypes are reported in Table 4.
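The average relationships compared here can be illustrated with a short sketch: the numerator relationship matrix is built from the pedigree by the tabular method, and Wright's coefficient of relationship r_xy = a_xy / sqrt(a_xx · a_yy) is averaged over all pairs in a group. It assumes animals are indexed so that parents precede offspring and returns a fraction rather than a percentage; the exact quantity printed by Proc Inbreed depends on its options and is not reproduced here. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def additive_relationship(sires, dams):
    """Numerator relationship matrix by the tabular method.

    Animals are indexed 0..n-1 with parents listed before offspring;
    None marks an unknown parent.
    """
    n = len(sires)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s, d = sires[i], dams[i]
        # Diagonal: 1 + inbreeding coefficient (half the parents' relationship).
        a[i][i] = 1.0 + (0.5 * a[s][d] if s is not None and d is not None else 0.0)
        for j in range(i):
            from_sire = a[j][s] if s is not None else 0.0
            from_dam = a[j][d] if d is not None else 0.0
            a[i][j] = a[j][i] = 0.5 * (from_sire + from_dam)
    return a


def mean_relationship(a, group):
    """Average Wright coefficient of relationship over all pairs in a group."""
    pairs = list(combinations(group, 2))
    return sum(a[x][y] / sqrt(a[x][x] * a[y][y]) for x, y in pairs) / len(pairs)
```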

Table 3 Average relationships in the selected groups of animals and in the whole population
Table 4 Haplotype effects.

Direct observation of the haplotype alleles with a significant effect on one trait made it possible to identify which SNP within the haplotype allele might be directly responsible for the trait variability. Table 5 reports only the SNPs with a highly significant (P < 0.0001) allele substitution effect. These SNPs, located on chromosomes 1, 3 and 4, might themselves be considered the QTLs influencing the relevant trait.

Table 5 Effect of allele 1 of the SNP with major effect on each trait.

Discussion

In this study, two assumptions were made arbitrarily. The first was that the selective genotyping strategy would be successful for QTL mapping. Although the literature provides evidence of the suitability of this strategy [9], the choice of which animals should be considered highly divergent for each trait was made by the authors. Therefore, both the number and the position of the QTLs might have differed had more or less restrictive thresholds been chosen. The additive relationship values within each of the selected tails, for each trait, were very similar to one another, ranging from 4.26% to 4.37%, but they were higher than the value calculated for the whole population (3.01%). To put the size of this difference in perspective, Vahlsten et al. [16] reported that an increase in relationship of 0.96 percentage points per generation should be considered slow; this value refers to Friesian bulls born over 40 years and belonging to a population of more than 400,000 animals. It can therefore be inferred that the relationship differences observed in the present study merely reproduce the generational trend.
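As a worked comparison using only the figures quoted above, the excess relationship in the tails relative to the whole population, expressed against the 0.96 percentage-point per-generation increase reported by Vahlsten et al. [16], is:

```latex
\begin{aligned}
\Delta_{\min} &= 4.26 - 3.01 = 1.25 \text{ percentage points}, & \Delta_{\min}/0.96 &\approx 1.30,\\
\Delta_{\max} &= 4.37 - 3.01 = 1.36 \text{ percentage points}, & \Delta_{\max}/0.96 &\approx 1.42.
\end{aligned}
```

That is, the tails exceed the population average by roughly 1.3 to 1.4 times the per-generation increment that those authors regard as slow.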

The second assumption was that the QTL was encoded by a haplotype of 5 consecutive SNPs. Weller and Ron [17] underlined the importance of the extent of LD when applying genome scans to breeding programs. These authors noted that, in dairy cattle, population-wide LD extends over less than 1 cM, i.e. a much shorter distance than genetic linkage within families, which extends over tens of centimorgans. It is therefore possible that the hypothesis that the QTL was encoded by the haplotype with the highest effect on each trait was not the most appropriate for this study, given that the analyzed population was a simulated sample. However, because the sliding windows encompass consecutive markers, selecting the top-ranking window for each trait seemed appropriate, as it allowed the identification of single SNPs (Table 5) with a highly significant effect on one trait, with probabilities below 1.0E-16 for some of them.

Conclusions

In this study we proposed a simple method to identify genomic regions associated with the phenotype under study, regions that could therefore be regarded as potential QTLs. The identification of the DNA region is the first step towards identifying the mutation that is actually responsible for the variability of the trait, through direct sequencing of the genomic regions that encode the QTL. The precision of QTL estimation may vary with the deviation thresholds used in the reference population to define which animals are extremely divergent.

References

  1. Qanbari S, Pimentel ECG, Tetens J, Thaller G, Lichtner P, Sharifi AR, Simianer H: A genome-wide scan for signatures of recent selection in Holstein cattle. Anim Genet. 2010, 41 (4): 377-389.

  2. Stella A, Ajmone-Marsan P, Lazzari B, Boettcher P: Identification of selection signatures in cattle breeds selected for dairy production. Genetics. 2010, 185: 1451-1461. 10.1534/genetics.110.116111.

  3. Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES: Positive natural selection in the human lineage. Science. 2006, 312 (5780): 1614-1620.

  4. Akey JM, Zhang G, Zhang K, Jin L, Shriver MD: Interrogating a high density SNP map for signatures of natural selection. Genome Res. 2002, 12 (12): 1805-1814. 10.1101/gr.631202.

  5. Bongiorni S, Mancini G, Chillemi G, Pariset L, Valentini A: Identification of a short region on chromosome 6 affecting direct calving ease in Piedmontese cattle breed. PLoS ONE. 2012, 7: e50137. 10.1371/journal.pone.0050137.

  6. Bagnato A, Schiavini F, Rossoni A, Maltecca C, Dolezal M, Medugorac I, Sölkner J, Russo V, Fontanesi L, Friedmann A, Soller M, Lipkin E: Quantitative trait loci affecting milk yield and protein percentage in a three-country Brown Swiss population. J Dairy Sci. 2008, 91: 767-783. 10.3168/jds.2007-0507.

  7. Fontanesi L, Scotti E, Speroni C, Buttazzoni L, Russo V: A selective genotyping approach identifies single nucleotide polymorphisms in porcine chromosome 2 genes associated with production and carcass traits in Italian heavy pigs. Ital J Anim Sci. 2011, 10: e15.

  8. Moioli B, Scatà MC, Steri R, Napolitano F, Catillo G: Signatures of selection identify loci associated with milk yield in sheep. BMC Genet. 2013, 14: 76.

  9. Boligon A, Long N, Albuquerque LG, Weigel KA, Gianola D, Rosa GJM: Comparison of selective genotyping strategies for prediction of breeding values in a population undergoing selection. J Anim Sci. 2012, 90 (13): 4716-4722. 10.2527/jas.2012-4857.

  10. Usai MG, Gaspa G, Carta A, Macciotta NPP, Casu S: BMC Genetics present issue.

  11. Habier D, Tetens J, Seefried FR, Lichtner P, Thaller G: The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genet Sel Evol. 2010, 42: 5. 10.1186/1297-9686-42-5.

  12. Wright S: Coefficients of inbreeding and relationship. Amer Nat. 1922, 56: 330-338. 10.1086/279872.

  13. SAS Institute: SAS/STAT User's Guide. Version 9.1. 2007, SAS Institute Inc., Cary, NC.

  14. Excoffier L, Slatkin M: Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol. 1995, 12: 921-927.

  15. Sherman EL, Nkrumah JD, Murdoch BM, Li C, Wang Z, Fu A, Moore S: Polymorphisms and haplotypes in the bovine NPY, GHR, GHRL, IGF2, UCP2, and UCP3 genes and their associations with measures of growth, performance, feed efficiency and carcass merit in beef cattle. J Anim Sci. 2008, 86: 1-16. 10.2527/jas.2007-0687.

  16. Vahlsten T, Mantysaari E, Stranden I: Coefficient of relationship and inbreeding among Finnish Ayrshire and Holstein Friesian. Agr Food Sci. 2004, 13: 338-347. 10.2137/1239099043633350.

  17. Weller JI, Ron M: Invited review: quantitative trait nucleotide determination in the era of genomic selection. J Dairy Sci. 2011, 94 (3): 1082-1090. 10.3168/jds.2010-3793.


Acknowledgements

This study is part of the GENZOOT research program, funded by the Italian Ministry of Agriculture (Rome, Italy).

Declarations


This article has been published as part of BMC Proceedings Volume 8 Supplement 5, 2014: Proceedings of the 16th European Workshop on QTL Mapping and Marker Assisted Selection (QTL-MAS). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S5

Author information


Corresponding author

Correspondence to Bianca Moioli.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

BM planned the study and applied the procedures to set up the sliding windows used in the subsequent analyses. GC and FN performed the statistical analysis of association, providing the estimates of the QTL effects. All authors contributed to editing the article and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Moioli, B., Napolitano, F. & Catillo, G. A selective genotyping approach identifies QTL in a simulated population. BMC Proc 8 (Suppl 5), S5 (2014). https://doi.org/10.1186/1753-6561-8-S5-S5


  • DOI: https://doi.org/10.1186/1753-6561-8-S5-S5
