Open Access Methodology article

A spatial simulation approach to account for protein structure when identifying non-random somatic mutations

Gregory A Ryslik1*, Yuwei Cheng2, Kei-Hoi Cheung2,3, Robert D Bjornson4, Daniel Zelterman1, Yorgo Modis5 and Hongyu Zhao1,2*

Author Affiliations

1 Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA

2 Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA

3 Yale Center for Medical Informatics, Yale School of Medicine, New Haven, CT, USA

4 Department of Computer Science, Yale University, New Haven, CT, USA

5 Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, USA


BMC Bioinformatics 2014, 15:231  doi:10.1186/1471-2105-15-231

Published: 3 July 2014



Current research suggests that a small set of “driver” mutations is responsible for tumorigenesis, while a larger body of “passenger” mutations occurs in the tumor but does not contribute to disease progression. Following recent pharmacological successes in treating cancers caused by driver mutations, a variety of methodologies have been developed to identify such mutations. Because driver mutations are hypothesized to cluster in key regions of the protein, cluster identification algorithms have become critical to this effort.


We have developed a novel methodology, SpacePAC (Spatial Protein Amino acid Clustering), that identifies mutational clustering by considering the protein tertiary structure directly in 3D space. By combining the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC) with the spatial information in the Protein Data Bank (PDB), SpacePAC identifies novel mutation clusters in many proteins, such as FGFR3 and CHRM2. In addition, SpacePAC better localizes the most significant mutational hotspots, as demonstrated in the cases of BRAF and ALK. The R package is available on Bioconductor.
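The idea of scoring mutational clusters directly in 3D space can be illustrated with a small sketch. The code below is not the SpacePAC implementation (which is an R package) and makes several assumptions of its own: the function names `sphere_counts` and `max_sphere_pvalue` are hypothetical, the coordinates and mutation counts are an invented toy hairpin rather than PDB or COSMIC data, and the null distribution is obtained by simply shuffling mutation counts across residues. It only shows why a spatial view can reveal a cluster that is invisible along the linear sequence.

```python
import math
import random


def sphere_counts(coords, counts, radius):
    """For each residue, sum the mutation counts of all residues
    whose coordinates lie within `radius` of it (itself included)."""
    totals = []
    for center in coords:
        total = 0
        for point, count in zip(coords, counts):
            if math.dist(center, point) <= radius:
                total += count
        totals.append(total)
    return totals


def max_sphere_pvalue(coords, counts, radius, n_sim=1000, seed=0):
    """Return the largest sphere count and a simulated p-value.

    Assumption: under the null, mutation counts are exchangeable
    across residues, so they are shuffled over the fixed structure.
    """
    rng = random.Random(seed)
    observed = max(sphere_counts(coords, counts, radius))
    shuffled = list(counts)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if max(sphere_counts(coords, shuffled, radius)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_sim + 1)


# Toy hairpin: residues 0-4 run along y = 0, residues 5-9 fold back
# along y = 4, so residues 0 and 9 are far apart in sequence but only
# 4 units apart in space. All values are invented for illustration.
coords = [(10.0 * i, 0.0, 0.0) for i in range(5)] + \
         [(10.0 * (4 - i), 4.0, 0.0) for i in range(5)]
counts = [5, 0, 0, 0, 1, 0, 0, 0, 0, 5]

observed, p_value = max_sphere_pvalue(coords, counts, radius=5.0)
print(observed, p_value)
```

In this toy case the two heavily mutated residues sit at opposite ends of the sequence, yet a sphere of radius 5 centered on either one captures both, which a purely linear window would miss.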


SpacePAC provides a valuable tool for identifying mutational clusters while accounting for protein tertiary structure.