Open Access Methodology article

A forward-backward fragment assembling algorithm for the identification of genomic amplification and deletion breakpoints using high-density single nucleotide polymorphism (SNP) array

Tianwei Yu (1)*, Hui Ye (2,3), Wei Sun (4), Ker-Chau Li (4), Zugen Chen (5), Sharoni Jacobs (6), Dione K Bailey (6), David T Wong (7) and Xiaofeng Zhou (2,8)*

Author Affiliations

1 Department of Biostatistics, Rollins School of Public Health, Emory University, Atlanta, GA, USA

2 Center for Molecular Biology of Oral Diseases, College of Dentistry, University of Illinois at Chicago, Chicago, IL, USA

3 Shanghai Children's Medical Center, Shanghai Jiao-Tong University, Shanghai, China

4 Department of Statistics, University of California at Los Angeles, CA, USA

5 Department of Human Genetics & Microarray Core, University of California at Los Angeles, Los Angeles, CA, USA

6 Affymetrix, Inc., 3420 Central Expressway, Santa Clara, CA, USA

7 Dental Research Institute, School of Dentistry, David Geffen School of Medicine & Henry Samueli School of Engineering & Jonsson Comprehensive Cancer Center, University of California at Los Angeles, Los Angeles, CA, USA

8 Guanghua School & Research Institute of Stomatology, Sun Yat-Sen University, Guangzhou, China


BMC Bioinformatics 2007, 8:145  doi:10.1186/1471-2105-8-145

Published: 3 May 2007



DNA copy number aberration (CNA) is one of the key characteristics of cancer cells. Recent studies have demonstrated the feasibility of using high-density single nucleotide polymorphism (SNP) genotyping arrays to detect CNA. Compared with two-color array-based comparative genomic hybridization (array-CGH), SNP arrays offer much higher probe density but have a lower signal-to-noise ratio at the single-SNP level. To accurately identify small segments of CNA from SNP array data, segmentation methods that are sensitive to CNA yet resistant to noise are required.


We have developed a highly sensitive edge-detection algorithm for copy number data that is especially well suited to SNP array-based copy number data. The method consists of an over-sensitive edge-detection step and a test-based forward-backward edge selection step.
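To make the two-step idea concrete, the following is a minimal Python sketch of one plausible reading of it. It is not the FASeg implementation. The window-mean candidate detector, the z-like test with a noise estimate based on first differences, the thresholds, and all function names (`candidate_edges`, `select_edges`, and so on) are assumptions introduced purely for illustration.

```python
from math import sqrt
from statistics import mean, median


def noise_sd(x):
    """Robust noise estimate from first differences (an assumed choice)."""
    diffs = [abs(b - a) for a, b in zip(x, x[1:])]
    return median(diffs) / (0.6745 * sqrt(2))


def candidate_edges(x, half_window=3, low_threshold=0.1):
    """Over-sensitive step: keep every position whose flanking window means
    differ by more than a deliberately low threshold."""
    return [i for i in range(half_window, len(x) - half_window + 1)
            if abs(mean(x[i:i + half_window])
                   - mean(x[i - half_window:i])) > low_threshold]


def z_stat(x, start, cut, end, sd):
    """z-like statistic for a mean shift at `cut` inside [start, end)."""
    left, right = x[start:cut], x[cut:end]
    return abs(mean(right) - mean(left)) / (
        sd * sqrt(1 / len(left) + 1 / len(right)))


def score(x, edges, c, sd):
    """Test position c against its nearest selected edges (or the array ends)."""
    lower = max([e for e in edges if e < c], default=0)
    upper = min([e for e in edges if e > c], default=len(x))
    return z_stat(x, lower, c, upper, sd)


def select_edges(x, candidates, z_threshold=3.0):
    """Test-based forward-backward selection over the candidate edges."""
    sd = noise_sd(x) or 1e-12  # guard against a perfectly noise-free signal
    edges, pool = [], list(candidates)
    # Forward pass: repeatedly add the strongest candidate that passes the test.
    while pool:
        best = max(pool, key=lambda c: score(x, edges, c, sd))
        if score(x, edges, best, sd) < z_threshold:
            break
        edges.append(best)
        pool.remove(best)
    # Backward pass: drop edges that no longer pass once all edges are in place.
    while edges:
        worst = min(edges, key=lambda e: score(x, edges, e, sd))
        if score(x, edges, worst, sd) >= z_threshold:
            break
        edges.remove(worst)
    return sorted(edges)


# Synthetic log-ratio-like signal (hypothetical data): a 10-probe gain
# spanning indices 30..39, plus a small deterministic perturbation.
pattern = [0.1, -0.1, 0.05, -0.05, 0.0]
levels = [0.0] * 30 + [1.0] * 10 + [0.0] * 30
signal = [lvl + pattern[i % 5] for i, lvl in enumerate(levels)]

candidates = candidate_edges(signal)
breakpoints = select_edges(signal, candidates)
print("candidates:", candidates)
print("breakpoints:", breakpoints)
```

On this toy signal the first step flags a cluster of candidate positions around each true change point, and the selection step reduces them to the two breakpoints at indices 30 and 40. A deliberately low detection threshold keeps small segments from being missed, and the statistical test is then responsible for rejecting the spurious candidates.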


In simulations constructed from real experimental data, the method shows high sensitivity and specificity in detecting small copy number changes in focused regions. The method is implemented in an R package, FASeg, which includes data processing and visualization utilities as well as libraries for processing Affymetrix SNP array data.