BMC Bioinformatics

official impact factor 3.03

Open Access Methodology article

Detection of recurrent copy number alterations in the genome: taking among-subject heterogeneity seriously

Oscar M Rueda1,2 and Ramon Diaz-Uriarte1

Author Affiliations

1 Structural and Computational Biology Programme, Spanish National Cancer Centre (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain

2 Breast Cancer Functional Genomics, Cancer Research UK, Cambridge, UK

BMC Bioinformatics 2009, 10:308 doi:10.1186/1471-2105-10-308

Published: 23 September 2009

Abstract

Background

Alterations in the number of copies of genomic DNA that are common or recurrent among diseased individuals are likely to contain disease-critical genes. Unfortunately, defining common or recurrent copy number alteration (CNA) regions remains a challenge. Moreover, the heterogeneous nature of many diseases requires that we search for common or recurrent CNA regions that affect only some subsets of the samples (without knowledge of the regions and subsets affected), but this is neglected by most methods.

Results

We have developed two methods to define recurrent CNA regions from aCGH data. Our methods are unique and qualitatively different from existing approaches: they detect regions over both the complete set of arrays and alterations that are common only to some subsets of the samples (i.e., alterations that might characterize previously unknown groups); they use probabilities of alteration as input and return probabilities of being a common region, thus allowing researchers to modify thresholds as needed; the two parameters of the methods have an immediate, straightforward, biological interpretation. Using data from previous studies, we show that we can detect patterns that other methods miss and that researchers can modify, as needed, thresholds of immediate interpretability and develop custom statistics to answer specific research questions.

Conclusion

These methods represent a qualitative advance in the location of recurrent CNA regions, highlight the relevance of population heterogeneity for definitions of recurrence, and can facilitate the clustering of samples with respect to patterns of CNA. Ultimately, the methods developed can become important tools in the search for genomic regions harboring disease-critical genes.