Open Access Research article

Assessment of the genomic variation in a cattle population by re-sequencing of key animals at low to medium coverage

Sandra Jansen1, Bernhard Aigner1, Hubert Pausch1, Michal Wysocki1, Sebastian Eck2, Anna Benet-Pagès2, Elisabeth Graf2, Thomas Wieland2, Tim M Strom2,3, Thomas Meitinger2,3 and Ruedi Fries1*

Author Affiliations

1 Chair of Animal Breeding, Technische Universität München, Liesel-Beckmann-Strasse 1, Freising, 85354, Germany

2 Institute of Human Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, 85764, Germany

3 Institute of Human Genetics, Technische Universität München, Munich, 81675, Germany

BMC Genomics 2013, 14:446  doi:10.1186/1471-2164-14-446

Published: 4 July 2013

Abstract

Background

Genome- and population-wide re-sequencing would allow for the most efficient detection of causal trait variants. However, despite a strong decrease in the cost of next-generation sequencing in the last few years, re-sequencing of large numbers of individuals is not yet affordable. We therefore resorted to re-sequencing a limited number of bovine animals selected to explain a major proportion of the population's genomic variation, so-called key animals, in order to provide a catalogue of functional variants and a substrate for population- and genome-wide imputation of variable sites.

Results

Forty-three animals accounting for about 69 percent of the genetic diversity of the Fleckvieh population, a cattle breed of Southern Germany and Austria, were sequenced at coverages ranging from 4.17-fold to 24.98-fold, with an average of 7.46-fold. After alignment to the reference genome (UMD3.1) and multi-sample variant calling, more than 17 million variant positions were identified, of which about 90 percent were biallelic single nucleotide variants (SNVs) and 10 percent were short insertions and deletions (InDels). The comparison with high-density chip data revealed a sensitivity of at least 92 percent and a specificity of 81 percent for sequencing-based genotyping, and 97 percent and 93 percent, respectively, when an imputation step was included. There are 91,733 variants in coding regions of 18,444 genes, 46 percent being non-synonymous exchanges, of which 575 variants are predicted to cause premature stop codons. Three variants are listed in the OMIA database as causal for specific phenotypes.
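The comparison of sequence-derived genotypes with chip genotypes can be illustrated with a minimal sketch. It assumes the textbook definitions of sensitivity and specificity, treats the chip genotypes as the truth set, and codes each genotype as an alternate-allele count (0, 1 or 2). The abstract does not state the exact definitions used in the study, so the function name, the coding and the toy data below are illustrative assumptions only.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(chip: Sequence[int], seq: Sequence[int]) -> Tuple[float, float]:
    """Compare sequence-derived genotypes against chip genotypes.

    Both inputs hold alternate-allele counts (0, 1 or 2), one entry per
    animal-by-site genotype, in the same order. A genotype counts as
    "variant" when it carries at least one alternate allele.
    Assumption: chip genotypes are treated as the truth set.
    """
    if len(chip) != len(seq):
        raise ValueError("chip and seq must have the same length")

    tp = fn = tn = fp = 0
    for chip_gt, seq_gt in zip(chip, seq):
        chip_variant = chip_gt > 0
        seq_variant = seq_gt > 0
        if chip_variant and seq_variant:
            tp += 1  # variant on the chip, detected by sequencing
        elif chip_variant and not seq_variant:
            fn += 1  # variant on the chip, missed by sequencing
        elif not chip_variant and not seq_variant:
            tn += 1  # homozygous reference in both
        else:
            fp += 1  # called variant by sequencing only

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (hypothetical, not taken from the study).
chip_genotypes = [0, 1, 2, 1, 0, 0, 2, 1]
seq_genotypes = [0, 1, 2, 0, 0, 1, 2, 1]
sens, spec = sensitivity_specificity(chip_genotypes, seq_genotypes)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Running the same comparison before and after an imputation step, on identical sites and animals, would show how much imputation changes the two values.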

Conclusions

Low- to medium-coverage re-sequencing of individuals explaining a major fraction of a population's genomic variation allows for the efficient and reliable detection of most variants. Imputation strongly improves the genotype quality of low-coverage samples and thus enables maximum-density genotyping by sequencing. The functional annotation of variants provides the basis for exhaustive genotype imputation in the population, e.g., for highest-resolution genome-wide association studies.

Keywords:
Next-generation sequencing; Low-coverage; Genotyping by sequencing; Variant annotation