Prediction of gene–phenotype associations in humans, mice, and plants using phenologs
- Equal contributors
1 Center for Systems & Synthetic Biology, Institute for Cellular & Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
2 Program in Computational and Applied Mathematics, The University of Texas at Austin, Austin, TX 78712, USA
3 Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, Stockholm 171 76, Sweden
4 Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
BMC Bioinformatics 2013, 14:203 doi:10.1186/1471-2105-14-203
Published: 21 June 2013
Phenotypes and diseases may be related to seemingly dissimilar phenotypes in other species by means of the orthology of underlying genes. Such “orthologous phenotypes,” or “phenologs,” are examples of deep homology, and may be used to predict additional candidate disease genes.
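As a minimal sketch of how an overlap between two phenotypes' gene sets might be scored, the example below computes a hypergeometric tail probability for the number of shared orthologs. The choice of the hypergeometric test, the function name `overlap_pvalue`, the orthogroup identifiers, and the background size of 100 are all assumptions made for illustration; none of them is taken from this abstract.

```python
from math import comb


def overlap_pvalue(n_shared, n_a, n_b, n_total):
    """Probability of seeing at least n_shared common orthologs by chance.

    n_a, n_b : sizes of the two phenotype gene sets (in orthogroup space)
    n_total  : number of orthogroups shared by the two species (background)
    Assumption: overlap significance is modeled as a hypergeometric tail.
    """
    upper = min(n_a, n_b)
    denom = comb(n_total, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / denom


# Toy data: hypothetical orthogroup IDs, not real annotations.
human_disease = {"OG1", "OG2", "OG3", "OG4"}
worm_phenotype = {"OG2", "OG3", "OG4", "OG9"}
shared = human_disease & worm_phenotype

p_value = overlap_pvalue(len(shared), len(human_disease), len(worm_phenotype), 100)
print(f"shared orthogroups: {sorted(shared)}, p = {p_value:.2e}")
```

In this toy case the orthogroup in the worm set that is not yet linked to the human disease (`OG9`) would be the candidate suggested by the phenolog.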
In this work, we develop an unsupervised algorithm for ranking phenolog-based candidate disease genes through the integration of predictions from the k nearest neighbor phenologs, comparing classifiers and weighting functions by cross-validation. We also improve upon the original method by extending the theory to paralogous phenotypes. Our algorithm makes use of additional phenotype data (from chicken, zebrafish, and E. coli, as well as new datasets for C. elegans), establishing that several types of annotations may be treated as phenotypes. We demonstrate the use of our algorithm to predict novel candidate genes for human atrial fibrillation (such as HRH2, ATP4A, ATP4B, and HOPX) and epilepsy (e.g., PAX6 and NKX2-1). We suggest gene candidates for pharmacologically induced seizures in mouse, based solely on orthologous phenotypes from E. coli. We also explore the prediction of plant gene–phenotype associations, such as the Arabidopsis "response to vernalization" phenotype.
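The idea of integrating predictions from the k nearest neighbor phenologs can be illustrated with a small sketch. It assumes that each neighboring phenotype already comes with a similarity weight to the query disease, and that a candidate's score is simply the sum of the weights of the top-k neighbors containing it. The summed-weight rule, the function name `rank_candidates`, and the placeholder gene labels are hypothetical; the abstract does not specify which classifier or weighting function was ultimately selected.

```python
from collections import defaultdict


def rank_candidates(neighbors, known_genes, k=3):
    """Rank candidate genes from the k most similar phenotypes.

    neighbors   : list of (weight, gene_set) pairs, one per related phenotype
    known_genes : genes already associated with the query disease (excluded)
    Assumption: a candidate's score is the sum of the weights of the
    top-k neighbors whose gene sets contain it.
    """
    top_k = sorted(neighbors, key=lambda n: n[0], reverse=True)[:k]
    scores = defaultdict(float)
    for weight, genes in top_k:
        for gene in genes - known_genes:
            scores[gene] += weight
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data with placeholder gene labels (not real predictions).
neighbors = [
    (0.9, {"G1", "G2", "G5"}),
    (0.6, {"G2", "G3"}),
    (0.4, {"G3", "G4"}),
    (0.1, {"G6"}),  # falls outside the top k = 3 and is ignored
]
known = {"G1"}

ranking = rank_candidates(neighbors, known, k=3)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

Here `G2` ranks first because it is supported by the two most similar phenotypes. In practice the value of k and the weighting function would be chosen by cross-validation, as the text describes.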
We are able to rank gene predictions for a significant portion of the diseases in the Online Mendelian Inheritance in Man database. Additionally, our method suggests candidate genes for mammalian seizures based only on bacterial phenotypes and gene orthology. We demonstrate that phenotype information may come from diverse sources, including drug sensitivities, gene ontology biological processes, and in situ hybridization annotations. Finally, we offer testable candidates for a variety of human diseases, plant traits, and other classes of phenotypes across a wide array of species.