This article is part of the supplement: Selected articles from the Eleventh Asia Pacific Bioinformatics Conference (APBC 2013): Genomics

Open Access Proceedings

A probabilistic method for identifying rare variants underlying complex traits

Jiayin Wang2*, Zhongmeng Zhao1*, Zhi Cao1, Aiyuan Yang1 and Jin Zhang2

Author affiliations

1 Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, P.R.China

2 Computer Science and Engineering Department, University of Connecticut, Storrs, Connecticut 06269-2155, USA

Citation and License

BMC Genomics 2013, 14(Suppl 1):S11. doi:10.1186/1471-2164-14-S1-S11

Published: 21 January 2013

Abstract

Background

Identifying the genetic variants that contribute to disease susceptibility is important both for developing methodologies and for studying complex diseases in molecular biology. The minor allele frequencies (MAFs) of risk variants have been shown to range from common to rare. Although association studies are shifting to incorporate rare variants (RVs) that affect complex traits, existing approaches have had limited success, and further methodological work is needed.

Results

In this article, we focus on detecting associations between multiple rare variants and traits. Like RareCover, a widely used approach, we assume that variants located close to each other tend to have similar impacts on traits. We therefore introduce elevated regions and background regions, where elevated regions are considered more likely to harbor causal variants. We propose a hidden Markov random field (HMRF) model to select a set of rare variants that potentially underlie the phenotype, and a statistical test is then applied to the selected set. The association analysis can thus be carried out without pre-selection by experts. In our model, each variant has two hidden states, one representing its causal/non-causal status and one representing its region status. In addition, two Bayesian processes are used to compare and estimate the genotype, phenotype and model parameters. We compare our approach with three current methods on different types of simulated datasets; in these simulation experiments, our approach has higher statistical power than the other methods. The software package, RareProb, and the simulation datasets are available at: http://www.engr.uconn.edu/~jiw09003.
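
The two-stage idea described above (select a set of rare variants, then test that set for association) can be illustrated with a minimal sketch. This is not the RareProb implementation: the HMRF selection step is not reproduced, the `selected` indices are simply assumed to be the output of some selection step, and the test shown is a generic carrier-collapsing, one-sided Fisher exact test chosen for illustration. The function names `carrier_counts` and `fisher_one_sided` and the 0/1/2 genotype coding are assumptions made for this example.

```python
from math import comb


def carrier_counts(genotypes, phenotypes, selected):
    """Count cases and controls carrying at least one selected rare allele.

    genotypes  -- list of rows, one per individual; each entry is the
                  minor-allele count (0, 1 or 2) at one variant (assumed coding)
    phenotypes -- list of 1 (case) / 0 (control), one per individual
    selected   -- indices of the variants chosen by a prior selection step
                  (hypothetical input; the HMRF itself is not modelled here)
    """
    case_carriers = control_carriers = 0
    for row, is_case in zip(genotypes, phenotypes):
        if any(row[j] > 0 for j in selected):
            if is_case:
                case_carriers += 1
            else:
                control_carriers += 1
    return case_carriers, control_carriers


def fisher_one_sided(case_carriers, control_carriers, n_cases, n_controls):
    """One-sided Fisher exact p-value for carrier enrichment among cases.

    Under the null, the number of carriers that are cases follows a
    hypergeometric distribution; the p-value is the upper tail starting
    at the observed count.
    """
    carriers = case_carriers + control_carriers
    total = n_cases + n_controls
    upper = min(carriers, n_cases)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / comb(total, carriers)
```

For example, with four cases and four controls, if three cases and no controls carry a selected variant, `fisher_one_sided(3, 0, 4, 4)` evaluates the probability of an enrichment at least that strong under the null. A real analysis would use far larger samples and the test defined in the article itself.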