This article is part of the supplement: Selected Proceedings of Machine Learning in Systems Biology: MLSB 2007
Gene-based bin analysis of genome-wide association studies
1 Merck Serono International S.A., 9 chemin des Mines, 1202 Geneva, Switzerland
2 Epigenomics Project, Genopole®, 523 Terrasses de l'Agora, 91034 Évry cedex, France
BMC Proceedings 2008, 2(Suppl 4):S6. Published: 17 December 2008
With improving genotyping technologies and an exponentially growing number of available markers, case-control genome-wide association studies promise to become a key tool for investigating complex diseases. However, new analytical methods must be developed to address the problems caused by this scale-up, such as multiple testing, data quality control and computational tractability.
We present a novel method for analyzing the results of genome-wide association studies. The algorithm is based on a Bayesian model that integrates genotyping errors and dependencies arising from genomic structure. p-values are assigned to genomic regions, termed bins, which are defined by a gene-biased partitioning of the genome, and the false discovery rate is then estimated. We have applied this algorithm to data from three genome-wide association studies of multiple sclerosis.
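The abstract does not specify how bins are delimited, how the Bayesian model scores them, or which false discovery rate estimator is used. The sketch below only illustrates the overall shape of a gene-based bin analysis, under clearly stated substitutions. Each gene is extended by a hypothetical flanking window (`flank`), overlapping extended genes are merged into one bin, and SNPs outside every gene bin are pooled into a per-chromosome intergenic bin. Each bin is then scored with a Šidák-corrected minimum SNP p-value, a simple stand-in that assumes independent SNPs and ignores both linkage disequilibrium and genotyping error, which the authors' Bayesian model is designed to handle. Finally, the Benjamini-Hochberg procedure is swapped in for FDR estimation. All names (`Gene`, `gene_bins`, `assign_snps`, `bin_pvalue`, `GENE_A`, ...) and all input values are hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def gene_bins(genes, flank=10_000):
    """Hypothetical gene-biased partition: each gene, extended by `flank` bp
    on both sides, becomes a bin; overlapping extended genes are merged."""
    bins = []
    for g in sorted(genes, key=lambda g: (g.chrom, g.start)):
        lo, hi = max(0, g.start - flank), g.end + flank
        if bins and bins[-1]["chrom"] == g.chrom and lo <= bins[-1]["end"]:
            bins[-1]["end"] = max(bins[-1]["end"], hi)
            bins[-1]["genes"].append(g.name)
        else:
            bins.append({"chrom": g.chrom, "start": lo, "end": hi, "genes": [g.name]})
    return bins


def assign_snps(snps, bins):
    """Group SNP p-values by bin; SNPs outside gene bins go to an intergenic bin."""
    starts, members = {}, {}
    for idx, b in enumerate(bins):  # bins are sorted and non-overlapping per chromosome
        starts.setdefault(b["chrom"], []).append(b["start"])
        members.setdefault(b["chrom"], []).append(idx)
    groups = {}
    for chrom, pos, p in snps:
        key = f"intergenic:{chrom}"
        if chrom in starts:
            # Last bin starting at or before pos is the only candidate.
            j = bisect.bisect_right(starts[chrom], pos) - 1
            if j >= 0 and pos <= bins[members[chrom][j]]["end"]:
                key = "+".join(bins[members[chrom][j]]["genes"])
        groups.setdefault(key, []).append(p)
    return groups


def bin_pvalue(snp_pvalues):
    """Stand-in bin score: Sidak-corrected minimum p-value (assumes independent SNPs)."""
    k = len(snp_pvalues)
    return 1.0 - (1.0 - min(snp_pvalues)) ** k


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


# Hypothetical toy input: genes and (chromosome, position, p-value) per SNP.
genes = [
    Gene("GENE_A", "chr6", 100_000, 120_000),
    Gene("GENE_B", "chr6", 125_000, 140_000),
    Gene("GENE_C", "chr12", 50_000, 60_000),
]
snps = [("chr6", 95_000, 1e-6), ("chr6", 145_000, 0.2),
        ("chr6", 500_000, 0.03), ("chr12", 55_000, 0.4)]

groups = assign_snps(snps, gene_bins(genes))
labels = list(groups)
bin_p = [bin_pvalue(groups[b]) for b in labels]
qvals = dict(zip(labels, benjamini_hochberg(bin_p)))
for b in labels:
    print(b, qvals[b])
```

In this toy run, GENE_A and GENE_B overlap once extended and therefore share one bin, while the SNP at chr6:500,000 falls into the intergenic bin. Testing a few bins instead of every SNP is what reduces the multiple-testing burden; the real method additionally models dependencies and genotyping errors within each bin.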
In practice, the method overcomes these scale-up problems and makes it possible to identify new putative regions statistically associated with the disease.