Open Access | Highly Accessed | Research article

Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data

Ágnes Baross1,5, Allen D Delaney1, H Irene Li1, Tarun Nayar1, Stephane Flibotte1, Hong Qian1, Susanna Y Chan1, Jennifer Asano1, Adrian Ally1, Manqiu Cao2, Patricia Birch3, Mabel Brown-John1, Nicole Fernandes3, Anne Go1, Giulia Kennedy2, Sylvie Langlois3, Patrice Eydoux4, JM Friedman3 and Marco A Marra1,3

1Genome Sciences Centre, BC Cancer Agency, British Columbia Cancer Agency, Suite 100, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada

2Affymetrix Inc., 3420 Central Expressway, Santa Clara, CA 95051, USA

3Dept. of Medical Genetics, University of British Columbia, Children's & Women's Hospital, Box 153, 4500 Oak Street, Vancouver, BC, V6H 3N1, Canada

4Dept. of Pathology and Laboratory Medicine, BC Children's Hospital, 4480 Oak Street, Vancouver, BC, V6H 3N1, Canada

5Genome British Columbia, 500-555 West 8th Avenue, Vancouver, BC, V5Z 1C6, Canada


BMC Bioinformatics 2007, 8:368 doi:10.1186/1471-2105-8-368

Published: 2 October 2007

Abstract

Background

Genomic deletions and duplications are important in the pathogenesis of diseases, such as cancer and mental retardation, and have recently been shown to occur frequently in unaffected individuals as polymorphisms. Affymetrix GeneChip whole genome sampling analysis (WGSA) combined with 100 K single nucleotide polymorphism (SNP) genotyping arrays is one of several microarray-based approaches that are now being used to detect such structural genomic changes. The popularity of this technology and its associated open source data format have resulted in the development of an increasing number of software packages for the analysis of copy number changes using these SNP arrays.

Results

We evaluated four publicly available software packages for high-throughput copy number analysis using synthetic and empirical 100 K SNP array data sets, the latter obtained from 107 mental retardation (MR) patients and their unaffected parents and siblings. We assessed the software for overall suitability for high-throughput 100 K SNP array data analysis, for effectiveness of normalization, scaling with various reference sets and feature extraction, and for true and false positive rates of genomic copy number variant (CNV) detection.

Conclusion

We observed considerable variation among the numbers and types of candidate CNVs detected by different analysis approaches, and found that multiple programs were needed to find all real aberrations in our test set. The frequency of false positive deletions was substantial, but could be greatly reduced by using the SNP genotype information to confirm loss of heterozygosity.
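
The genotype-based confirmation mentioned above can be illustrated with a minimal sketch. It rests on one general observation: a hemizygous deletion leaves a single allele, so SNPs inside a true deletion should not be called heterozygous. The data layout, the function names (`het_fraction`, `confirm_deletions`) and the thresholds (`min_snps`, `max_het_fraction`) are hypothetical choices made for illustration; they are not taken from the article or from any of the evaluated packages.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Region:
    """A candidate deletion reported by a copy number algorithm."""
    chrom: str
    start: int
    end: int


# One SNP genotype call: (chromosome, position, call), call in AA/AB/BB/NoCall.
Genotype = Tuple[str, int, str]


def het_fraction(region: Region, genotypes: List[Genotype]) -> Tuple[int, float]:
    """Return (number of called SNPs, fraction called heterozygous) in a region."""
    calls = [
        call
        for chrom, pos, call in genotypes
        if chrom == region.chrom
        and region.start <= pos <= region.end
        and call in ("AA", "AB", "BB")  # ignore NoCall entries
    ]
    if not calls:
        return 0, 0.0
    return len(calls), calls.count("AB") / len(calls)


def confirm_deletions(
    candidates: Iterable[Region],
    genotypes: Iterable[Genotype],
    min_snps: int = 3,              # assumed minimum evidence, illustrative only
    max_het_fraction: float = 0.0,  # assumed tolerance for genotyping error
) -> List[Region]:
    """Keep only candidate deletions whose SNP calls show loss of heterozygosity."""
    genotype_list = list(genotypes)
    confirmed = []
    for region in candidates:
        n_called, frac_het = het_fraction(region, genotype_list)
        if n_called >= min_snps and frac_het <= max_het_fraction:
            confirmed.append(region)
    return confirmed


if __name__ == "__main__":
    calls = [
        ("1", 100, "AA"), ("1", 200, "BB"), ("1", 300, "AA"),
        ("1", 1100, "AB"), ("1", 1200, "AA"), ("1", 1300, "BB"),
    ]
    candidates = [Region("1", 50, 350), Region("1", 1000, 1400)]
    for region in confirm_deletions(candidates, calls):
        print(region)
```

In this toy input the second candidate contains a heterozygous call and is rejected as a likely false positive. A real pipeline would probably tolerate a small heterozygous fraction to allow for genotyping errors, and would need separate handling for regions with too few called SNPs.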

