This article is part of the supplement: Selected Proceedings of the First Summit on Translational Bioinformatics 2008
Phenotype forecasting with SNPs data through gene-based Bayesian networks
- Equal contributors
1 IRCCS Multimedica, Via Fantoli 16/15, I-20138 Milano, Italy
2 Department of Computer Engineering and Systems Science, University of Pavia, via Ferrata 1, I-27100, Pavia, Italy
BMC Bioinformatics 2009, 10(Suppl 2):S7 doi:10.1186/1471-2105-10-S2-S7Published: 5 February 2009
Bayesian networks are powerful instruments to learn genetic models from association studies data. They are able to derive the existing correlation between genetic markers and phenotypic traits and, at the same time, to find the relationships between the markers themselves. However, learning Bayesian networks is often non-trivial due to the high number of variables to be taken into account in the model with respect to the instances of the dataset. Therefore, it becomes very interesting to use an abstraction of the variable space that suitably reduces its dimensionality without losing information. In this paper we present a new strategy to achieve this goal by mapping the SNPs related to the same gene to one meta-variable. In order to assign states to the meta-variables we employ an approach based on classification trees.
We applied our approach to data coming from a genome-wide scan on 288 individuals affected by arterial hypertension and 271 nonagenarians without history of hypertension. After pre-processing, we focused on a subset of 24 SNPs. We compared the performance of the proposed approach with the Bayesian network learned with SNPs as variables and with the network learned with haplotypes as meta-variables. The results were obtained by running a hold-out experiment five times. The mean accuracy of the new method was 64.28%, while the mean accuracy of the SNPs network was 58.99% and the mean accuracy of the haplotype network was 54.57%.
The new approach presented in this paper is able to derive a gene-based predictive model based on SNPs data. Such model is more parsimonious than the one based on single SNPs, while preserving the capability of highlighting predictive SNPs configurations. The prediction performance of this approach was consistently superior to the SNP-based and the haplotype-based one in all the test sets of the evaluation procedure. The method can be then considered as an alternative way to analyze the data coming from association studies.