Open Access Methodology article

A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data

Takeshi Nishiyama1,2*, Kunihiko Takahashi3, Toshiro Tango3,4, Dalila Pinto5, Stephen W Scherer5, Satoshi Takami6 and Hirohisa Kishino6

Author Affiliations

1 Doctor of Public Health Program in Biostatistics, National Institute of Public Health, Wako, Saitama 351-0197, Japan

2 Clinical Trial Management Center, Nagoya City University Hospital, Nagoya 467-8601, Japan

3 Department of Technology Assessment and Biostatistics, National Institute of Public Health, Wako, Saitama 351-0197, Japan

4 Center for Medical Statistics, Tokyo 105-0021, Japan

5 The Center for Applied Genomics and Program in Genetics and Genomics Biology, Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada

6 Laboratory of Biometry and Bioinformatics, Graduate School of Agriculture and Life Sciences, University of Tokyo, Tokyo 113-8657, Japan


BMC Bioinformatics 2011, 12:205  doi:10.1186/1471-2105-12-205

Published: 26 May 2011



Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in the form of gene sets. With these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlap greatly increases the probability of false positives and makes the results difficult to interpret, particularly when many gene sets show statistical significance.


We propose a flexible statistical framework to circumvent these problems. Inspired by the spatial scan statistics used in epidemiology to detect clusters of disease occurrence, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which increases statistical power and facilitates interpretation of the test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations (CNVs), which have been strongly implicated in common diseases. Application of the method to a simulated dataset demonstrated its high accuracy in detecting disease-associated gene clusters in a whole gene pathway.
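The abstract does not state the exact form of the statistic, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a Kulldorff-style Bernoulli likelihood ratio in which a candidate cluster is a connected set of genes in the pathway graph, and the "inside" group is every individual carrying a rare CNV that overlaps any gene in that cluster. The scan takes the maximum likelihood ratio over all candidate clusters, and a permutation test compares that observed maximum with maxima from label-shuffled data; this single max-based test is one way the overall false positive probability can be controlled across overlapping clusters. The function names (`bernoulli_llr`, `connected_clusters`, `scan`, `monte_carlo_p`), the cluster size cap, and the toy pathway and CNV calls are hypothetical and chosen purely for illustration.

```python
import math
import random
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def xlogy(x: float, y: float) -> float:
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_llr(c: int, n: int, C: int, N: int) -> float:
    """Bernoulli log-likelihood ratio (assumed Kulldorff-style model).

    c of the n individuals carrying a rare CNV in the cluster are cases;
    C of all N individuals are cases. Returns 0 unless the case proportion
    among carriers exceeds the case proportion among non-carriers.
    """
    if n == 0 or n == N:
        return 0.0
    c_out, n_out = C - c, N - n
    if c / n <= c_out / n_out:
        return 0.0
    inside = xlogy(c, c / n) + xlogy(n - c, (n - c) / n)
    outside = xlogy(c_out, c_out / n_out) + xlogy(n_out - c_out, (n_out - c_out) / n_out)
    null = xlogy(C, C / N) + xlogy(N - C, (N - C) / N)
    return inside + outside - null


def connected_clusters(adj: Dict[str, Set[str]], max_size: int) -> Set[FrozenSet[str]]:
    """Brute-force enumeration of all connected gene sets with <= max_size genes."""
    found: Set[FrozenSet[str]] = set()
    frontier = {frozenset([g]) for g in adj}
    while frontier:
        found |= frontier
        grown: Set[FrozenSet[str]] = set()
        for s in frontier:
            if len(s) < max_size:
                for g in s:
                    for h in adj[g] - s:  # grow by one neighbouring gene
                        grown.add(s | {h})
        frontier = grown
    return found


def scan(clusters: Set[FrozenSet[str]], carriers: Sequence[Set[str]],
         is_case: Sequence[int]) -> Tuple[float, FrozenSet[str]]:
    """Return the maximum LLR over all clusters and the cluster attaining it."""
    N, C = len(is_case), sum(is_case)
    best: Tuple[float, FrozenSet[str]] = (0.0, frozenset())
    # Visit smaller clusters first so ties go to the most compact cluster.
    for z in sorted(clusters, key=lambda s: (len(s), sorted(s))):
        inside = [i for i, genes in enumerate(carriers) if genes & z]
        c = sum(is_case[i] for i in inside)
        llr = bernoulli_llr(c, len(inside), C, N)
        if llr > best[0]:
            best = (llr, z)
    return best


def monte_carlo_p(clusters, carriers, is_case, reps: int = 999, seed: int = 0):
    """Permutation p-value of the maximum LLR; shuffling case/control labels
    and re-maximizing over every cluster accounts for overlapping clusters."""
    rng = random.Random(seed)
    observed, best_z = scan(clusters, carriers, is_case)
    labels: List[int] = list(is_case)
    exceed = 0
    for _ in range(reps):
        rng.shuffle(labels)
        if scan(clusters, carriers, labels)[0] >= observed:
            exceed += 1
    return observed, best_z, (exceed + 1) / (reps + 1)


if __name__ == "__main__":
    # Hypothetical toy pathway A-B-C-D-E and rare-CNV gene hits per individual.
    adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
    carriers = [{"B"}, {"C"}, {"B"}, {"C"}, {"B", "C"}, {"C"}, set(), set(),  # cases
                {"E"}, set(), set(), {"A"}, set(), {"E"}, set(), set()]     # controls
    is_case = [1] * 8 + [0] * 8
    clusters = connected_clusters(adj, max_size=3)
    llr, z, p = monte_carlo_p(clusters, carriers, is_case, reps=999)
    print(sorted(z), round(llr, 3), p)
```

On the toy data the scan selects the two-gene cluster {B, C} (log-likelihood ratio about 6.086), because every carrier of a CNV in B or C is a case, while the larger cluster {B, C, D} ties with it and is passed over in favour of the more compact one. Brute-force enumeration of connected gene sets grows combinatorially with cluster size, so a real pathway would need a size cap or a smarter search; the abstract does not say how the authors search the pathway.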


The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.