This article is part of the supplement: Selected articles from the Eleventh Asia Pacific Bioinformatics Conference (APBC 2013): Genomics
Inferring evolution of gene duplicates using probabilistic models and nonparametric belief propagation
1 School of Computer Science and Technology, Soochow University, Suzhou 215006, China
2 Shanghai Key Laboratory of Intelligent Information Processing, China
3 Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
BMC Genomics 2013, 14(Suppl 1):S15 doi:10.1186/1471-2164-14-S1-S15Published: 21 January 2013
Gene duplication, followed by functional evolution of duplicate genes, is a primary engine of evolutionary innovation. In turn, gene expression evolution is a critical component of overall functional evolution of paralogs. Inferring evolutionary history of gene expression among paralogs is therefore a problem of considerable interest. It also represents significant challenges. The standard approaches of evolutionary reconstruction assume that at an internal node of the duplication tree, the two duplicates evolve independently. However, because of various selection pressures functional evolution of the two paralogs may be coupled. The coupling of paralog evolution corresponds to three major fates of gene duplicates: subfunctionalization (SF), conserved function (CF) or neofunctionalization (NF). Quantitative analysis of these fates is of great interest and clearly influences evolutionary inference of expression. These two interrelated problems of inferring gene expression and evolutionary fates of gene duplicates have not been studied together previously and motivate the present study.
Here we propose a novel probabilistic framework and algorithm to simultaneously infer (i) ancestral gene expression and (ii) the likely fate (SF, NF, CF) at each duplication event during the evolution of gene family. Using tissue-specific gene expression data, we develop a nonparametric belief propagation (NBP) algorithm to predict the ancestral expression level as a proxy for function, and describe a novel probabilistic model that relates the predicted and known expression levels to the possible evolutionary fates. We validate our model using simulation and then apply it to a genome-wide set of gene duplicates in human.
Our results suggest that SF tends to be more frequent at the earlier stage of gene family expansion, while NF occurs more frequently later on.