Open Access methodology article

Statistical method on nonrandom clustering with application to somatic mutations in cancer

Jingjing Ye1*, Adam Pavlicek2, Elizabeth A Lunney2, Paul A Rejto2 and Chi-Hse Teng1,3*

Author affiliations

1 Global Pre-Clinical Statistics, Pfizer Global Research and Development, 10777 Science Center Drive, San Diego, CA, 92121, USA

2 Computational Biology Group, Oncology Research Unit, Pfizer Global Research and Development, San Diego, CA, 92121, USA

3 Statistics, Corporate Analytics, Amylin Pharmaceuticals Inc, 9360 Towne Centre Drive, San Diego, CA, 92121, USA

Citation and License

BMC Bioinformatics 2010, 11:11. doi:10.1186/1471-2105-11-11

Published: 7 January 2010



Human cancer is caused by the accumulation of tumor-specific mutations in oncogenes and tumor suppressors that confer a selective growth advantage on cells. Because of genomic instability and high rates of proliferation, many passenger mutations that do not contribute to the cancer phenotype arise alongside the mutations that drive oncogenesis. Several approaches have been developed to separate driver mutations from passengers, but few can specifically identify activating driver mutations in oncogenes, which are more amenable to pharmacological intervention.


We propose a new statistical method for detecting activating mutations in cancer by identifying nonrandom clusters of amino acid mutations in protein sequences. A probability model is derived using order statistics, assuming that the locations of amino acid mutations along a protein follow a uniform distribution. Our statistical measure is the difference between pairwise order statistics, which equals the size of an amino acid mutation cluster; probabilities are derived from the exact and approximate distributions of this measure. Using data from the Catalogue of Somatic Mutations in Cancer (COSMIC) database, we demonstrate that our method detects well-known clusters of activating mutations in KRAS, BRAF, PI3K, and β-catenin. The method can also identify new cancer targets as well as gain-of-function mutations in tumor suppressors.
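The order-statistic idea can be illustrated under a continuous approximation. Assume n mutations fall independently and uniformly along a protein of length L, with positions rescaled to [0, 1]. Then the difference X(i+m) - X(i) between order statistics m ranks apart follows a Beta(m, n - m + 1) distribution, so its CDF at x equals P(Binomial(n, x) >= m). The Python sketch below is not the authors' exact discrete derivation. The helper names span_cdf and tightest_cluster, the inclusive residue count used for the span, and the unadjusted scan over all pairs are illustrative assumptions.

```python
from math import comb


def span_cdf(n: int, m: int, x: float) -> float:
    """P(X_(i+m) - X_(i) <= x) for n i.i.d. Uniform(0, 1) positions.

    Hypothetical helper. A spacing of m ranks has the same law as the m-th
    order statistic, Beta(m, n - m + 1); for integer parameters its CDF
    equals P(Binomial(n, x) >= m).
    """
    if m <= 0:
        return 1.0  # a single mutation always has span 0
    x = min(max(x, 0.0), 1.0)
    return sum(comb(n, r) * x**r * (1.0 - x) ** (n - r) for r in range(m, n + 1))


def tightest_cluster(positions, length):
    """Return (p, start, end, size) for the pair of order statistics whose
    span is least probable under the uniform model.

    Hypothetical helper: p is an unadjusted per-pair probability; nothing
    corrects for scanning all n(n-1)/2 pairs.
    """
    pos = sorted(positions)
    n = len(pos)
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Count spanned residues inclusively, so repeated hits on one
            # residue still occupy 1/length of the protein (an assumption).
            span = (pos[j] - pos[i] + 1) / length
            p = span_cdf(n, j - i, span)
            if best is None or p < best[0]:
                best = (p, pos[i], pos[j], j - i + 1)
    return best


if __name__ == "__main__":
    # Toy, made-up mutation positions on a hypothetical 200-residue protein.
    toy = [12, 12, 12, 13, 13, 61, 61, 117, 146]
    print(tightest_cluster(toy, 200))
```

Because every pair is scanned, the smallest per-pair probability is optimistic. A real analysis would need to account for the multiple comparisons, for example by permutation or by working with the distribution of the minimum.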


Our proposed method is useful for discovering activating driver mutations in cancer by identifying nonrandom clusters of somatic amino acid mutations in protein sequences.