Active subnetwork recovery with a mechanism-dependent scoring function; with application to angiogenesis and organogenesis studies
1 School of Information Technologies, University of Sydney, Sydney, NSW 2006, Australia
2 Sydney Emerging Infections and Biosecurity Institute, University of Sydney, Sydney, NSW 2006, Australia
3 Centre for Mathematical Biology, University of Sydney, Sydney, NSW 2006, Australia
4 NICTA, Australian Technology Park, Eveleigh, NSW 2015, Australia
5 Research School of Computer Science, Australian National University, Canberra, ACT, 0200, Australia
6 Vascular Biology Laboratory, Centenary Institute, Camperdown, NSW 2050, Australia
7 Sydney University Medical School, University of Sydney, NSW 2006, Australia
BMC Bioinformatics 2013, 14:59 doi:10.1186/1471-2105-14-59. Published: 21 February 2013
The problem of learning active subnetworks involves finding subnetworks of a bio-molecular network that are active in a particular condition. Many approaches integrate observation data (e.g., gene expression) with the network topology to find candidate subnetworks. Increasingly, pathway databases contain additional annotation information that can be mined to improve prediction accuracy, such as interaction mechanism annotations (e.g., transcription, microRNA, cleavage). We introduce a mechanism-based approach to active subnetwork recovery which exploits such annotations. We suggest that neighboring interactions in a network tend to be co-activated in a way that depends on the “correlation” of their mechanism annotations. For example, neighboring phosphorylation and de-phosphorylation interactions may be more likely to be co-activated than neighboring phosphorylation and covalent bonding interactions.
Our method iteratively learns the mechanism correlations and finds the most likely active subnetwork. We use a probabilistic graphical model with a Markov Random Field component that creates dependencies between the states (active or non-active) of neighboring interactions and that incorporates a mechanism-based term in its scoring function. We apply a heuristic EM-based algorithm suited to the problem. Using simulated data in networks downloaded from GeneGO, we validated our method’s performance against the same approach without the mechanism-based component and against two other existing methods, assessing how well each recovers (1) the true interaction states and (2) global network properties of the original network. We applied our method to networks generated from time-course gene expression studies in angiogenesis and lung organogenesis and validated the findings from a biological perspective against the current literature.
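To make the idea of mechanism-dependent coupling between neighboring interactions concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a pairwise score in which two neighboring interactions that share a state earn a bonus looked up by their pair of mechanism annotations, and it uses iterated conditional modes (a simple greedy substitute for the EM-based procedure described above) with fixed rather than learned coupling values. All names (`evidence`, `coupling`, `icm`) and all numbers are hypothetical.

```python
def local_score(edge, state, states, evidence, neighbors, mech, coupling):
    """Score of giving `edge` the state 0 (inactive) or 1 (active)."""
    # Observation term: evidence[edge] is an assumed log-odds of activity.
    score = evidence[edge] if state == 1 else 0.0
    # Neighborhood term: agreeing neighbors add a mechanism-pair bonus.
    for nb in neighbors[edge]:
        if states[nb] == state:
            pair = frozenset((mech[edge], mech[nb]))
            score += coupling.get(pair, 0.0)
    return score


def icm(edges, evidence, neighbors, mech, coupling, sweeps=10):
    """Greedy state assignment by iterated conditional modes."""
    # Start from the observation data alone.
    states = {e: 1 if evidence[e] > 0 else 0 for e in edges}
    for _ in range(sweeps):
        changed = False
        for e in edges:
            best = max(
                (0, 1),
                key=lambda s: local_score(
                    e, s, states, evidence, neighbors, mech, coupling
                ),
            )
            if best != states[e]:
                states[e] = best
                changed = True
        if not changed:
            break
    return states


# Hypothetical chain of three interactions: a - b - c.
edges = ["a", "b", "c"]
mech = {"a": "phosphorylation", "b": "dephosphorylation", "c": "covalent bonding"}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
evidence = {"a": 2.0, "b": -0.5, "c": -0.5}  # made-up log-odds
coupling = {
    frozenset(("phosphorylation", "dephosphorylation")): 1.0,  # strong
    frozenset(("dephosphorylation", "covalent bonding")): 0.1,  # weak
}

result = icm(edges, evidence, neighbors, mech, coupling)
print(result)
```

In this toy setting, interaction `b` has weak evidence on its own but is pulled into the active state by its strongly coupled neighbor `a`, while `c` stays inactive because its coupling to `b` is weak. In the method described above the coupling values would be learned iteratively rather than fixed by hand.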
The advantage of our mechanism-based approach is best seen in networks composed of connected regions with a large number of interactions annotated with a subset of mechanisms, e.g., a regulatory region of transcription interactions, or a cleavage cascade region. When applied to real datasets, our method recovered novel and biologically meaningful putative interactions, e.g., interactions from an integrin signaling pathway using the angiogenesis dataset, and a group of regulatory microRNA interactions in an organogenesis network.