
Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data

Ágnes Baross1,5, Allen D Delaney1, H Irene Li1, Tarun Nayar1, Stephane Flibotte1, Hong Qian1, Susanna Y Chan1, Jennifer Asano1, Adrian Ally1, Manqiu Cao2, Patricia Birch3, Mabel Brown-John1, Nicole Fernandes3, Anne Go1, Giulia Kennedy2, Sylvie Langlois3, Patrice Eydoux4, JM Friedman3 and Marco A Marra1,3*

Author Affiliations

1 Genome Sciences Centre, British Columbia Cancer Agency, Suite 100, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada

2 Affymetrix Inc., 3420 Central Expressway, Santa Clara, CA 95051, USA

3 Dept. of Medical Genetics, University of British Columbia, Children's & Women's Hospital, Box 153, 4500 Oak Street, Vancouver, BC, V6H 3N1, Canada

4 Dept. of Pathology and Laboratory Medicine, BC Children's Hospital, 4480 Oak Street, Vancouver, BC, V6H 3N1, Canada

5 Genome British Columbia, 500-555 West 8th Avenue, Vancouver, BC, V5Z 1C6, Canada


BMC Bioinformatics 2007, 8:368  doi:10.1186/1471-2105-8-368

Published: 2 October 2007



Genomic deletions and duplications are important in the pathogenesis of diseases such as cancer and mental retardation, and have recently been shown to occur frequently in unaffected individuals as polymorphisms. Affymetrix GeneChip whole genome sampling analysis (WGSA) combined with 100 K single nucleotide polymorphism (SNP) genotyping arrays is one of several microarray-based approaches now used to detect such structural genomic changes. The popularity of this technology and its associated open source data format have led to a growing number of software packages for the analysis of copy number changes using these SNP arrays.


We evaluated four publicly available software packages for high throughput copy number analysis using synthetic and empirical 100 K SNP array data sets, the latter obtained from 107 mental retardation (MR) patients and their unaffected parents and siblings. The packages were assessed for overall suitability for high-throughput 100 K SNP array data analysis, for the effectiveness of normalization, scaling with various reference sets and feature extraction, and for the true and false positive rates of genomic copy number variant (CNV) detection.


We observed considerable variation in the numbers and types of candidate CNVs detected by the different analysis approaches, and multiple programs were needed to detect all real aberrations in our test set. The frequency of false positive deletions was substantial, but could be greatly reduced by using the SNP genotype information to confirm loss of heterozygosity.
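
The genotype-based filter mentioned above can be illustrated with a minimal sketch. It is not the procedure used in this study. It only assumes that a true single-copy deletion leaves no heterozygous SNP calls within the affected region. The genotype coding ("AA", "AB", "BB", with anything else treated as a no-call), the function name supports_deletion and the max_het_fraction tolerance are all hypothetical choices made for illustration.

```python
from typing import Sequence

# Assumed genotype coding for this sketch: "AA"/"BB" homozygous, "AB" heterozygous.
# Any other value (e.g. "NoCall") is treated as uninformative and ignored.
INFORMATIVE_CALLS = ("AA", "AB", "BB")


def supports_deletion(genotypes: Sequence[str], max_het_fraction: float = 0.0) -> bool:
    """Return True if the SNP calls inside a candidate deletion are consistent
    with loss of heterozygosity (LOH).

    max_het_fraction is a hypothetical tolerance for genotyping errors;
    the default of 0.0 requires that no heterozygous call is present.
    """
    called = [g for g in genotypes if g in INFORMATIVE_CALLS]
    if not called:
        # No informative genotype calls, so LOH cannot be confirmed.
        return False
    het_count = sum(1 for g in called if g == "AB")
    return het_count / len(called) <= max_het_fraction


# Example: genotype calls for the SNPs spanned by two candidate deletions.
likely_real = ["AA", "BB", "NoCall", "AA", "BB"]
likely_false_positive = ["AA", "AB", "BB", "AB"]

print(supports_deletion(likely_real))
print(supports_deletion(likely_false_positive))
```

Under these assumptions, a candidate deletion that still contains heterozygous calls would be flagged as a probable false positive. A run of homozygous calls alone does not prove a deletion, since it can also arise by chance or from copy-neutral LOH, so a check of this kind is best read as a filter on candidates from a copy number algorithm, not as a detector in its own right.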