A graph theoretic approach to utilizing protein structure to identify non-random somatic mutations
1 Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
2 Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
3 Yale Center for Medical Informatics, Yale School of Medicine, New Haven, CT, USA
4 Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, USA
BMC Bioinformatics 2014, 15:86 doi:10.1186/1471-2105-15-86Published: 26 March 2014
It is well known that the development of cancer is caused by the accumulation of somatic mutations within the genome. For oncogenes specifically, current research suggests that there is a small set of "driver" mutations that are primarily responsible for tumorigenesis. Further, due to recent pharmacological successes in treating these driver mutations and their resulting tumors, a variety of approaches have been developed to identify potential driver mutations using methods such as machine learning and mutational clustering. We propose a novel methodology that increases our power to identify mutational clusters by taking into account protein tertiary structure via a graph theoretical approach.
We have designed and implemented GraphPAC (Graph Protein Amino acid Clustering) to identify mutational clustering while considering protein spatial structure. Using GraphPAC, we are able to detect novel clusters in proteins that are known to exhibit mutation clustering as well as identify clusters in proteins without evidence of prior clustering based on current methods. Specifically, by utilizing the spatial information available in the Protein Data Bank (PDB) along with the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC), GraphPAC identifies new mutational clusters in well known oncogenes such as EGFR and KRAS. Further, by utilizing graph theory to account for the tertiary structure, GraphPAC discovers clusters in DPP4, NRP1 and other proteins not identified by existing methods. The R package is available at: http://bioconductor.org/packages/release/bioc/html/GraphPAC.html webcite.
GraphPAC provides an alternative to iPAC and an extension to current methodology when identifying potential activating driver mutations by utilizing a graph theoretic approach when considering protein tertiary structure.