This article is part of the supplement: Selected Proceedings of the First Summit on Translational Bioinformatics 2008
Identifying hypothetical genetic influences on complex disease phenotypes
- Equal contributors
1 Eastern Michigan University, Computer Science Department, Ypsilanti, MI 48197, USA
2 Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109, USA
3 National Center for Integrative Biomedical Informatics, Ann Arbor, MI 48109, USA
BMC Bioinformatics 2009, 10(Suppl 2):S13 doi:10.1186/1471-2105-10-S2-S13
Published: 5 February 2009
Statistical interactions between disease-associated loci of complex genetic diseases suggest that genes from these regions are involved in a common mechanism impacting, or impacted by, the disease. The computational problem we address is to discover relationships among genes from these interacting regions that may explain the observed statistical interaction and the role of these genes in the disease phenotype.
We describe a heuristic algorithm for generating hypothetical gene relationships from loci associated with a complex disease phenotype. This approach, called Prioritizing Disease Genes by Analysis of Common Elements (PDG-ACE), mines biomedical keywords from text descriptions of genes and uses them to relate genes close to disease-associated loci. A keyword common to, and significantly over-represented in, a pair of gene descriptions may represent a preliminary hypothesis about the biological relationship between the genes, and suggest the role the genes play in the disease phenotype.
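The core idea of relating genes through shared, over-represented keywords can be illustrated with a minimal Python sketch. It is not the PDG-ACE implementation: the function names, the substring-based keyword matching, and the significance measure are all assumptions made for illustration. The measure used here is simply the probability that two distinct genes drawn at random from the background would both mention the keyword, and the gene names and descriptions are toy placeholders rather than real annotations.

```python
from itertools import product


def keywords_in(description, vocabulary):
    """Return the vocabulary keywords found in a gene description.

    Assumption: plain case-insensitive substring matching stands in for
    whatever text mining the real system performs.
    """
    text = description.lower()
    return {kw for kw in vocabulary if kw.lower() in text}


def shared_keyword_hypotheses(locus_a, locus_b, descriptions, vocabulary, alpha=0.05):
    """List (gene_a, gene_b, keyword, p) for cross-locus gene pairs.

    p is the chance that two distinct background genes picked at random
    both mention the keyword: k(k-1) / (n(n-1)), where k is the number of
    background genes mentioning it and n is the background size (n >= 2).
    This statistic is an illustrative assumption, not the paper's test.
    """
    gene_keywords = {g: keywords_in(d, vocabulary) for g, d in descriptions.items()}
    n = len(gene_keywords)

    # Count how many background genes mention each keyword.
    counts = {}
    for kws in gene_keywords.values():
        for kw in kws:
            counts[kw] = counts.get(kw, 0) + 1

    results = []
    for a, b in product(locus_a, locus_b):
        if a == b:
            continue
        # Keywords common to both descriptions are candidate hypotheses.
        for kw in sorted(gene_keywords[a] & gene_keywords[b]):
            k = counts[kw]
            p = (k * (k - 1)) / (n * (n - 1))
            if p <= alpha:
                results.append((a, b, kw, p))
    # Rarest shared keywords first.
    return sorted(results, key=lambda r: r[3])


# Toy background of hypothetical genes (placeholders, not real annotations).
toy_descriptions = {
    "G1": "regulates insulin secretion in beta cells",
    "G2": "zinc transporter involved in insulin granule maturation",
    "G3": "structural protein of the cytoskeleton",
    "G4": "kinase in the cytoskeleton remodeling pathway",
    "G5": "transcription factor",
    "G6": "membrane kinase",
}
toy_vocabulary = ["insulin", "cytoskeleton", "kinase"]

for hit in shared_keyword_hypotheses(
    ["G1", "G3"], ["G2", "G4"], toy_descriptions, toy_vocabulary, alpha=0.1
):
    print(hit)
```

Each returned tuple is only a starting point for a hypothesis: a keyword that is rare in the background but shared by genes near two interacting loci is worth a closer look, whereas a keyword present in most descriptions carries little information.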
Our experiments show that the approach recovers previously published relationships and does not report relationships where none exist. The results also indicate that the approach is robust to differences in keyword vocabulary. We outline a brief case study in which results from a recently published Type 2 Diabetes association study are used to identify potential hypotheses.