Open Access Methodology article

Bayesian model to detect phenotype-specific genes for copy number data

Juan R González (1, 2, 4)*, Carlos Abellán (3) and Juan J Abellán (3, 4)

Author Affiliations

1 Center for Research in Environmental Epidemiology (CREAL), Barcelona, Spain

2 Institut Municipal d’Investigació Mèdica (IMIM), Barcelona, Spain

3 Joint Research Unit on Genomics and Health, Centre for Public Health Research (CSISP) and Cavanilles Institute for Biodiversity and Evolutionary Biology, University of Valencia, Valencia, Spain

4 CIBER Epidemiología y Salud Pública (CIBERESP), Spain

BMC Bioinformatics 2012, 13:130  doi:10.1186/1471-2105-13-130

Published: 13 June 2012

Abstract

Background

An important question in genetic studies is identifying which genetic variants, in particular copy number variants (CNVs), are specific to different groups of individuals. Such variants could help explain differences in disease predisposition and in response to pharmaceutical treatments. We propose a Bayesian model designed to analyze thousands of CNVs, of which only a few are expected to be associated with a specific phenotype.
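The abstract does not give the model's equations. As a rough illustration of the idea that only a few of thousands of CNVs are expected to be phenotype-specific, here is a minimal Python sketch of a generic Bayesian comparison for a single CNV. It is not the bayesGen model. It assumes each individual is assigned a discrete copy-number state. It then compares H0 (all groups share one state-frequency vector) with H1 (each group has its own vector) using Dirichlet-multinomial marginal likelihoods. Sparsity is encoded through a small prior probability of association, `prior_specific` (0.01 here, an arbitrary choice). The function names, the uniform Dirichlet prior and the counts are all hypothetical.

```python
import math
from typing import Sequence


def log_dirichlet_multinomial(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log marginal likelihood of categorical draws with the given category
    counts, integrating the category probabilities over Dirichlet(alpha)."""
    a0 = sum(alpha)
    n = sum(counts)
    out = math.lgamma(a0) - math.lgamma(a0 + n)
    for c, a in zip(counts, alpha):
        out += math.lgamma(a + c) - math.lgamma(a)
    return out


def posterior_specific(table: Sequence[Sequence[int]],
                       prior_specific: float = 0.01,
                       alpha: float = 1.0) -> float:
    """Hypothetical helper: posterior probability that copy-number state
    frequencies differ across groups.

    table[g][k] = number of individuals in group g with copy-number state k.
    H0: all groups share one frequency vector; H1: each group has its own.
    """
    n_states = len(table[0])
    alphas = [alpha] * n_states  # assumed uniform Dirichlet prior
    pooled = [sum(row[k] for row in table) for k in range(n_states)]
    log_h0 = log_dirichlet_multinomial(pooled, alphas)
    log_h1 = sum(log_dirichlet_multinomial(row, alphas) for row in table)
    # log posterior odds = log Bayes factor + log prior odds
    z = (log_h1 - log_h0) + math.log(prior_specific / (1.0 - prior_specific))
    # numerically stable logistic transform of the log posterior odds
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical counts: rows = 3 groups, columns = copy-number states (0, 1, 2 copies)
shared = [[5, 30, 65], [6, 29, 65], [4, 31, 65]]
specific = [[40, 40, 20], [2, 10, 88], [3, 12, 85]]
print(posterior_specific(shared))
print(posterior_specific(specific))
```

With a small prior probability, a CNV is flagged only when the data provide strong evidence (a large Bayes factor). This is one simple way to express the expectation that most CNVs are not associated with the phenotype.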

Results

The model is illustrated by analyzing three major human populations from the HapMap data. We also show how the model can be used to identify specific CNVs related to treatment response in patients diagnosed with ovarian cancer. The model is further extended to adjust for confounding covariates (e.g., population stratification). Through a simulation study, we show that the proposed model outperforms approaches typically used to analyze these data when analyzing common copy number polymorphisms (CNPs) or complex CNVs. We have developed an R package, bayesGen, that implements the model and its estimation algorithms.

Conclusions

Our proposed model is useful for discovering specific genetic variants when different subgroups of individuals are analyzed. The model can handle studies with or without a control group. By integrating all data in a single model, we obtain both a list of genes associated with a given phenotype and a separate list of genes shared among the different subtypes of cases.
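The abstract does not say how the final gene lists are derived from the model output. The Python sketch below assumes one common Bayesian false-discovery-rate rule. It ranks genes by their posterior probability of association and keeps the longest top-ranked list whose expected proportion of false discoveries stays at or below a threshold q, where that proportion is the mean of 1 - p over the list. This is an assumed selection rule, not necessarily the one used in bayesGen. The function name, the threshold of 0.05 and the gene names are hypothetical.

```python
from typing import Dict, List


def bayesian_fdr_list(post_prob: Dict[str, float], q: float = 0.05) -> List[str]:
    """Hypothetical helper: return the largest top-ranked set of genes whose
    expected false discovery proportion (mean of 1 - p) is <= q."""
    ranked = sorted(post_prob.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    cum_null = 0.0  # running sum of posterior null probabilities (1 - p)
    for gene, p in ranked:
        cum_null += 1.0 - p
        if cum_null / (len(selected) + 1) > q:
            break
        selected.append(gene)
    return selected


# Hypothetical posterior probabilities of association
posteriors = {"geneA": 0.99, "geneB": 0.97, "geneC": 0.90, "geneD": 0.20}
print(bayesian_fdr_list(posteriors, q=0.05))  # ['geneA', 'geneB', 'geneC']
```

Genes are processed in decreasing order of p, so the running mean of 1 - p never decreases. Stopping at the first violation therefore yields the largest list that meets the threshold.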