A spatial simulation approach to account for protein structure when identifying non-random somatic mutations
1 Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
2 Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
3 Yale Center for Medical Informatics, Yale School of Medicine, New Haven, CT, USA
4 Department of Computer Science, Yale University, New Haven, CT, USA
5 Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, USA
BMC Bioinformatics 2014, 15:231. doi:10.1186/1471-2105-15-231. Published: 3 July 2014
Current research suggests that a small set of “driver” mutations is responsible for tumorigenesis, while a larger body of “passenger” mutations occurs in the tumor but does not contribute to disease progression. Owing to recent pharmacological successes in treating cancers caused by driver mutations, a variety of methodologies that attempt to identify such mutations have been developed. Because driver mutations are hypothesized to cluster in key regions of the protein, the development of cluster identification algorithms has become critical.
We have developed a novel methodology, SpacePAC (Spatial Protein Amino acid Clustering), that identifies mutational clustering by considering the protein tertiary structure directly in 3D space. By combining the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC) with the spatial information in the Protein Data Bank (PDB), SpacePAC identifies novel mutation clusters in many proteins, such as FGFR3 and CHRM2. In addition, SpacePAC better localizes the most significant mutational hotspots, as demonstrated in the cases of BRAF and ALK. The R package is available on Bioconductor (http://www.bioconductor.org/packages/release/bioc/html/SpacePAC.html).
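The idea of searching for mutation clusters directly in 3D space and judging them by simulation can be illustrated with a minimal sketch. It is not the SpacePAC implementation or its R API. It assumes one 3D coordinate per residue (for example, alpha-carbon positions from a PDB structure), a single fixed sphere radius, and a null model in which each mutation falls on a residue chosen uniformly at random. The names `sphere_counts`, `best_sphere`, and `simulated_p_value` are hypothetical.

```python
import math
import random


def sphere_counts(coords, counts, radius):
    """For a sphere centred on each residue, sum the mutations inside it."""
    totals = []
    for center in coords:
        total = 0
        for point, n_mut in zip(coords, counts):
            # A residue is inside the sphere if its coordinate lies
            # within `radius` of the centre residue.
            if math.dist(center, point) <= radius:
                total += n_mut
        totals.append(total)
    return totals


def best_sphere(coords, counts, radius):
    """Return (centre index, mutation count) of the fullest sphere."""
    totals = sphere_counts(coords, counts, radius)
    best = max(range(len(totals)), key=totals.__getitem__)
    return best, totals[best]


def simulated_p_value(coords, counts, radius, n_sim=1000, seed=0):
    """Estimate how often a random placement of the same number of
    mutations yields a sphere at least as full as the observed one.

    Assumption: under the null, every mutation lands on a residue
    chosen uniformly at random and independently of the others.
    """
    rng = random.Random(seed)
    _, observed = best_sphere(coords, counts, radius)
    n_res = len(coords)
    n_total = sum(counts)
    hits = 0
    for _ in range(n_sim):
        simulated = [0] * n_res
        for _ in range(n_total):
            simulated[rng.randrange(n_res)] += 1
        if best_sphere(coords, simulated, radius)[1] >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_sim + 1)
```

Calling `best_sphere(coords, counts, radius)` on per-residue coordinates and mutation counts returns the centre residue of the densest sphere, and `simulated_p_value` gives an empirical significance estimate for it. Working with Euclidean distances rather than sequence positions lets residues that are far apart in the linear sequence but close in the folded protein contribute to the same cluster.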
SpacePAC provides a valuable tool for identifying mutational clusters while accounting for protein tertiary structure.