Research article

Estimation of allele frequency and association mapping using next-generation sequencing data

Su Yeon Kim1*, Kirk E Lohmueller1, Anders Albrechtsen2, Yingrui Li3, Thorfinn Korneliussen4, Geng Tian3,5,6, Niels Grarup7, Tao Jiang3, Gitte Andersen8, Daniel Witte9, Torben Jorgensen10,11, Torben Hansen12,7, Oluf Pedersen13,14,7,8, Jun Wang3,4 and Rasmus Nielsen1,4

Author Affiliations

1 Departments of Integrative Biology and Statistics, UC Berkeley, Berkeley CA 94720, USA

2 Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark

3 Beijing Genomics Institute, Shenzhen 518083, China

4 Department of Biology, University of Copenhagen, Copenhagen, Denmark

5 Beijing Institute of Genomics, Chinese Academy of Science, Beijing 101300, China

6 The Graduate University of Chinese Academy of Sciences, Beijing 100062, China

7 Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark

8 Hagedorn Research Institute, Copenhagen, Denmark

9 Steno Diabetes Center, Gentofte, Denmark

10 Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark

11 Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup, Denmark

12 Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark

13 Faculty of Health Sciences, University of Aarhus, Aarhus, Denmark

14 Institute of Biomedical Sciences, University of Copenhagen, Copenhagen, Denmark

BMC Bioinformatics 2011, 12:231  doi:10.1186/1471-2105-12-231

Published: 11 June 2011

Additional files

Additional file 1:

Boxplot of estimated MAFs using ML methods with known or unknown minor allele. Boxplots of estimated MAFs of SNPs, grouped by sample allele frequency. Assuming 1,000 individuals, 1,000 SNPs with a true MAF of 0.5% were simulated at an individual sequencing depth of 8×. For each SNP, the sample allele frequency was obtained from the true genotypes (x-axis). Each boxplot was then drawn from the MAFs estimated with known (left) and unknown (right) minor alleles.

Format: PDF. Size: 21KB.

Additional file 2:

QQ-plot comparing the null distribution of the Armitage trend test statistic with a χ2(1) distribution. QQ-plots comparing the null distribution of each test statistic of interest with a χ2(1) distribution. The first three columns correspond to the Armitage trend test statistic computed using the true genotypes (True), called genotypes without filtering (Call NF), and called genotypes with filtering (Call F), respectively. The fourth column corresponds to the likelihood ratio test statistic with unknown minor allele (LRT). Assuming 500 cases and 500 controls, a set of 5,000 sites with a MAF of 5% was simulated under the null hypothesis at sequencing depths of 2× (upper panels) and 5× (lower panels). The "Inflation" factor [44] is shown in the upper left corner of each figure.
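The caption above reports an inflation factor for each panel but only cites [44] for its definition. As a rough sketch, assuming it is the usual genomic-control ratio (median of the observed statistics divided by the median of a χ2(1) distribution), it could be computed as follows. The function name `inflation_factor` and the simulated input are illustrative assumptions, not taken from the authors' code.

```python
import random
from statistics import NormalDist, median

# Median of a chi-square(1) variable: if Z is standard normal, then
# P(Z^2 <= m) = 0.5 exactly when sqrt(m) is the 75th percentile of Z.
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # approximately 0.4549


def inflation_factor(test_statistics):
    """Ratio of the observed median statistic to the chi-square(1) median.

    ASSUMPTION: this is the common genomic-control definition; the
    source only cites reference [44] and does not spell out a formula.
    """
    return median(test_statistics) / CHI2_1_MEDIAN


if __name__ == "__main__":
    # Hypothetical null statistics: squared standard normals are chi-square(1),
    # so the inflation factor should be close to 1 for a well-calibrated test.
    rng = random.Random(0)
    null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(5000)]
    print(round(inflation_factor(null_stats), 3))
```

Under this definition a value near 1 indicates that the null statistics follow the χ2(1) reference closely, while values well above 1 indicate an excess of large statistics.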

Format: PDF. Size: 2.4MB.

Additional file 3:

Receiver operating characteristic curve of the Armitage trend test. Receiver operating characteristic (ROC) curves of four tests of association. For the definition of the four statistics, see the caption of Additional file 2. Assuming 500 cases and 500 controls, a set of 20,000 sites was simulated under the null and under the alternative at individual sequencing depths of 2×, 5×, and 10× (three columns). At each false positive rate (x-axis), the corresponding critical value was computed from the empirical null distribution. The true positive rate (power; y-axis) was obtained as the fraction of causative sites whose test statistics exceed the critical value.
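The ROC construction described in this caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the tie handling, and the simulated statistics (including the shift of 2.0 used for the alternative) are all hypothetical.

```python
import random


def empirical_critical_value(null_stats, fpr):
    """Threshold exceeded by (at most) a fraction `fpr` of the null statistics."""
    ordered = sorted(null_stats, reverse=True)
    k = int(fpr * len(ordered))  # number of null sites allowed above the threshold
    if k >= len(ordered):
        return float("-inf")  # every site is declared significant
    return ordered[k]  # without ties, exactly k null statistics are larger


def power_at_fpr(null_stats, alt_stats, fpr):
    """Fraction of causative-site statistics that exceed the critical value."""
    crit = empirical_critical_value(null_stats, fpr)
    return sum(s > crit for s in alt_stats) / len(alt_stats)


if __name__ == "__main__":
    # Hypothetical data: chi-square(1) statistics under the null and shifted
    # (noncentral) statistics under the alternative. The shift is arbitrary.
    rng = random.Random(1)
    null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(20000)]
    alt_stats = [rng.gauss(2.0, 1.0) ** 2 for _ in range(20000)]
    for fpr in (0.01, 0.05, 0.10):
        print(fpr, round(power_at_fpr(null_stats, alt_stats, fpr), 3))
```

Using the empirical null rather than the χ2(1) quantiles means the tests are compared at the same realized false positive rate, even when a statistic is miscalibrated at low depth.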

Format: PDF. Size: 53KB.

Additional file 4:

Estimates of type-specific sequencing error rates. Type-specific sequencing error rates estimated from 200 exomes [42] using our models (Equation 8).

Format: PDF. Size: 4KB.

Additional file 5:

Manual of our programs: simreseq and testassoc.

Format: PDF. Size: 179KB.

Additional file 6:

Source code of our programs: simreseq and testassoc. All the source code of our programs used for the simulation studies, estimation of parameters, and tests of association.

Format: GZ. Size: 75KB.