KC-SMARTR: An R package for detection of statistically significant aberrations in multi-experiment aCGH data
1 Department of Bioinformatics and Statistics, The Netherlands Cancer Institute, Plesmanlaan 121, 1066CX Amsterdam, The Netherlands
2 Department of Molecular Biology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066CX Amsterdam, The Netherlands
3 Central Microarray Facility, The Netherlands Cancer Institute, Plesmanlaan 121, 1066CX Amsterdam, The Netherlands
4 Faculty of EEMCS, Delft University Of Technology, 2628 CD Delft, The Netherlands
BMC Research Notes 2010, 3:298 doi:10.1186/1756-0500-3-298
Published: 11 November 2010
Most approaches used to find recurrent or differential DNA Copy Number Alterations (CNA) in array Comparative Genomic Hybridization (aCGH) data from groups of tumour samples depend on discretizing the aCGH data into gain, loss or no-change states. Discretization discards valuable biological information in tumour samples, which are frequently heterogeneous. We have previously developed an algorithm, KC-SMART, that bases its estimate of the magnitude of the CNA at a given genomic location on kernel convolution (Klijn et al., 2008). This estimate accounts for the intensity of the probe signal, its local genomic environment and the signal distribution across multiple samples.
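To make the idea of a kernel-convolved estimate concrete, the following is a minimal sketch and not the KC-SMART implementation. It assumes a plain Gaussian kernel, considers gains only, and normalizes by the summed kernel weights; the function names, the toy probe positions and the kernel width `sigma` are all hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(distance, sigma):
    """Unnormalized Gaussian weight for a probe at the given genomic distance."""
    return math.exp(-0.5 * (distance / sigma) ** 2)


def kernel_smoothed_gain(positions, samples, grid, sigma):
    """Kernel-weighted average of positive log-ratios, pooled over all samples.

    positions: probe positions (bp), shared by all samples
    samples:   one list of log-ratios per sample, aligned with positions
    grid:      genomic positions at which to evaluate the estimate
    sigma:     kernel width (bp); larger values emphasise broader aberrations
    """
    estimates = []
    for g in grid:
        weights = [gaussian_kernel(p - g, sigma) for p in positions]
        total_weight = sum(weights)
        pooled = 0.0
        for sample in samples:
            # Keep only the gain component; losses would be handled analogously.
            pooled += sum(w * max(v, 0.0) for w, v in zip(weights, sample))
        estimates.append(pooled / (total_weight * len(samples)))
    return estimates


# Toy data (illustrative only): two samples sharing a gain around 300-400 bp.
positions = [100, 200, 300, 400, 500]
samples = [
    [0.0, 0.1, 1.2, 1.0, 0.0],
    [0.1, 0.0, 0.9, 1.1, -0.2],
]
kse = kernel_smoothed_gain(positions, samples, grid=positions, sigma=100)
peak_index = max(range(len(kse)), key=lambda i: kse[i])
```

Because the continuous log-ratios enter the weighted sum directly, no gain/loss/no-change call is made for any individual probe, which is the property the paragraph above emphasises.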
Here we extend the approach to allow comparative analyses of two groups of samples and introduce the R implementation of these two approaches. The comparative module performs a supervised analysis that identifies regions that are differentially aberrated between two user-defined classes.
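One way such a supervised comparison could be set up is sketched below. This is an illustration under stated assumptions rather than the method used by the package: it smooths the per-class mean profiles with a Gaussian kernel, takes their difference, and assesses it by permuting class labels. All function names, the labels "A" and "B", the toy data and the parameter values are hypothetical.

```python
import math
import random


def smooth(positions, values, sigma):
    """Gaussian kernel-weighted average of values, evaluated at each probe."""
    out = []
    for g in positions:
        w = [math.exp(-0.5 * ((p - g) / sigma) ** 2) for p in positions]
        out.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
    return out


def class_difference(positions, samples, labels, sigma):
    """Smoothed mean profile of class 'A' minus that of class 'B'."""
    def mean_profile(group):
        members = [s for s, lab in zip(samples, labels) if lab == group]
        return [sum(col) / len(members) for col in zip(*members)]

    a = smooth(positions, mean_profile("A"), sigma)
    b = smooth(positions, mean_profile("B"), sigma)
    return [x - y for x, y in zip(a, b)]


def permutation_p_values(positions, samples, labels, sigma, n_perm=200, seed=0):
    """Per-position p-values obtained by shuffling the class labels."""
    rng = random.Random(seed)
    observed = class_difference(positions, samples, labels, sigma)
    exceed = [0] * len(positions)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # class sizes are preserved by shuffling
        null = class_difference(positions, samples, shuffled, sigma)
        for i, (n, o) in enumerate(zip(null, observed)):
            if abs(n) >= abs(o):
                exceed[i] += 1
    # Add-one correction keeps every p-value strictly positive.
    return observed, [(c + 1) / (n_perm + 1) for c in exceed]


# Toy data (illustrative only): class A carries a gain at probes 3 and 4.
positions = [100, 200, 300, 400, 500, 600]
samples = [
    [0.0, 0.1, 1.0, 1.1, 0.0, -0.1],
    [0.1, 0.0, 0.9, 1.0, 0.1, 0.0],
    [0.0, -0.1, 1.1, 0.9, 0.0, 0.1],
    [0.0, 0.1, 0.0, -0.1, 0.1, 0.0],
    [-0.1, 0.0, 0.1, 0.0, 0.0, 0.1],
    [0.1, 0.0, -0.1, 0.1, 0.0, 0.0],
]
labels = ["A", "A", "A", "B", "B", "B"]
observed, p_values = permutation_p_values(positions, samples, labels, sigma=100)
strongest = max(range(len(observed)), key=lambda i: abs(observed[i]))
```

With only three samples per class there are few distinct labelings, so the smallest attainable p-value in this toy example is coarse; realistic group sizes give a much finer null distribution.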
We analyzed data from a series of B- and T-cell lymphomas and were able to retrieve all positive control regions (VDJ regions) in addition to a number of new regions. A t-test on segmented data, which we also implemented, likewise located all the positive control regions and a number of new regions, but the regions it reported were highly fragmented.
KC-SMARTR offers recurrent CNA and class-specific CNA detection, at different genomic scales, in a single package without the need for additional segmentation. It is memory efficient and runs on a wide range of machines. Most importantly, it does not rely on data discretization and therefore maximally exploits the biological information in the aCGH data.