Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks
1 Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
2 European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK, CB10 1SD
3 Immunology & Inflammation, Boehringer Ingelheim Pharmaceuticals, Ridgefield, CT, 06877-0368, USA
4 Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
5 Merrimack Pharmaceuticals, Cambridge, MA, 02139, USA
Citation and License
BMC Systems Biology 2012, 6:29 doi:10.1186/1752-0509-6-29Published: 1 May 2012
Understanding the information-processing capabilities of signal transduction networks, how those networks are disrupted in disease, and rationally designing therapies to manipulate diseased states require systematic and accurate reconstruction of network topology. Data on networks central to human physiology, such as the inflammatory signalling networks analyzed here, are found in a multiplicity of on-line resources of pathway and interactome databases (Cancer CellMap, GeneGo, KEGG, NCI-Pathway Interactome Database (NCI-PID), PANTHER, Reactome, I2D, and STRING). We sought to determine whether these databases contain overlapping information and whether they can be used to construct high reliability prior knowledge networks for subsequent modeling of experimental data.
We have assembled an ensemble network from multiple on-line sources representing a significant portion of all machine-readable and reconcilable human knowledge on proteins and protein interactions involved in inflammation. This ensemble network has many features expected of complex signalling networks assembled from high-throughput data: a power law distribution of both node degree and edge annotations, and topological features of a “bow tie” architecture in which diverse pathways converge on a highly conserved set of enzymatic cascades focused around PI3K/AKT, MAPK/ERK, JAK/STAT, NFκB, and apoptotic signaling. Individual pathways exhibit “fuzzy” modularity that is statistically significant but still involving a majority of “cross-talk” interactions. However, we find that the most widely used pathway databases are highly inconsistent with respect to the actual constituents and interactions in this network. Using a set of growth factor signalling networks as examples (epidermal growth factor, transforming growth factor-beta, tumor necrosis factor, and wingless), we find a multiplicity of network topologies in which receptors couple to downstream components through myriad alternate paths. Many of these paths are inconsistent with well-established mechanistic features of signalling networks, such as a requirement for a transmembrane receptor in sensing extracellular ligands.
Wide inconsistencies among interaction databases, pathway annotations, and the numbers and identities of nodes associated with a given pathway pose a major challenge for deriving causal and mechanistic insight from network graphs. We speculate that these inconsistencies are at least partially attributable to cell, and context-specificity of cellular signal transduction, which is largely unaccounted for in available databases, but the absence of standardized vocabularies is an additional confounding factor. As a result of discrepant annotations, it is very difficult to identify biologically meaningful pathways from interactome networks a priori. However, by incorporating prior knowledge, it is possible to successively build out network complexity with high confidence from a simple linear signal transduction scaffold. Such reduced complexity networks appear suitable for use in mechanistic models while being richer and better justified than the simple linear pathways usually depicted in diagrams of signal transduction.