BMC Genomics

Open Access Research article

A new analysis tool for individual-level allele frequency for genomic studies

Hsin-Chou Yang1*, Hsin-Chi Lin1, Mei-Chu Huang1, Ling-Hui Li2, Wen-Harn Pan2, Jer-Yuarn Wu2 and Yuan-Tsong Chen2

Author Affiliations

1 Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan

2 Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan

BMC Genomics 2010, 11:415 doi:10.1186/1471-2164-11-415

Published: 5 July 2010

Abstract

Background

Allele frequency is one of the most important population indices and has been broadly applied in genetic/genomic studies. Estimating allele frequency from genotype calls is convenient, but it may discard information contained in the underlying data and be sensitive to genotyping errors.

Results

This study uses a unified intensity-measuring approach to estimate individual-level allele frequencies for 1,104 and 1,270 samples genotyped with the single-nucleotide-polymorphism arrays of the Affymetrix Human Mapping 100K and 500K Sets, respectively. Allele frequencies of all samples are estimated and adjusted by coefficients of preferential amplification/hybridization (CPA), and large ethnicity-specific and cross-ethnicity databases of CPA and allele frequency are established. The results show that adjusting by the CPA significantly improves the accuracy of allele frequency estimates; moreover, the CPA is insensitive to the time of data acquisition, laboratory site, type of gene chip, and phenotypic status. Building on these accurate estimates, analytic methods based on individual-level allele frequencies are developed and successfully applied to discover genomic patterns of allele frequencies, detect chromosomal abnormalities, classify sample groups, identify outlier samples, and estimate the purity of tumor samples. The methods are packaged into a new analysis tool, ALOHA (Allele-frequency/Loss-of-heterozygosity/Allele-imbalance).
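
The abstract does not state the adjustment formula, so the following is only a minimal sketch of how a CPA correction is commonly applied to intensity-based allele frequency estimates. It assumes the CPA is the mean ratio of allele A to allele B intensity among heterozygous samples, and that the adjusted frequency of allele A is h_A / (h_A + CPA × h_B). The function names and the input values are hypothetical and are not taken from ALOHA.

```python
def estimate_cpa(het_intensities):
    """Estimate a CPA for one SNP from heterozygous samples.

    Assumption: CPA is taken as the mean A/B intensity ratio among
    heterozygotes, where a ratio of 1 would mean no preferential
    amplification/hybridization. `het_intensities` is a list of
    (intensity_a, intensity_b) pairs.
    """
    if not het_intensities:
        raise ValueError("at least one heterozygous sample is required")
    ratios = [a / b for a, b in het_intensities]
    return sum(ratios) / len(ratios)


def adjusted_allele_frequency(intensity_a, intensity_b, cpa=1.0):
    """Individual-level frequency of allele A from two allele intensities.

    Assumption: the B intensity is rescaled by the CPA before the
    proportion is taken; cpa=1.0 gives the unadjusted estimate.
    """
    if cpa <= 0:
        raise ValueError("cpa must be positive")
    denominator = intensity_a + cpa * intensity_b
    if denominator <= 0:
        raise ValueError("intensities must sum to a positive value")
    return intensity_a / denominator


# Hypothetical intensities for one SNP in one heterozygous individual.
cpa = estimate_cpa([(2.0, 1.0), (3.0, 1.0)])
raw = adjusted_allele_frequency(200.0, 100.0)
adjusted = adjusted_allele_frequency(200.0, 100.0, cpa=2.0)
```

With these made-up numbers, allele A yields twice the signal of allele B, so the unadjusted estimate overstates the frequency of A for a true heterozygote, while the estimate rescaled with a CPA of 2 returns to 0.5.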

Conclusions

This is the first time that these important genetic/genomic applications have been conducted together through analyses of individual-level allele frequencies estimated by a unified intensity-measuring approach. We expect that additional practical applications for allele frequency analysis will be found. The developed databases and tools provide useful resources for human genome analysis via high-throughput single-nucleotide-polymorphism arrays. The ALOHA software was written in R and R GUI and can be downloaded at http://www.stat.sinica.edu.tw/hsinchou/genetics/aloha/ALOHA.htm.