Outline of the WACE algorithm. For a given gene expression dataset, the samples are classified into two groups by phenotype (for example, poor versus good outcome), and the genes are ordered by their physical location on the chromosomes. An expression score (ES, a t-statistic) is computed for each gene, and the ES values are then subjected to a wavelet transform to obtain smoothed scores, called neighboring scores (NS). The significance (false discovery rate, FDR) of the NS on each individual chromosome is approximated empirically from a null distribution, obtained by applying the same wavelet transform to "random" ES values computed from randomized samples. A segment containing at least n consecutive positive or negative NS with FDR ≤ 0.01 is defined as an inferred CNV (ICNV) region. ICNV regions from multiple datasets are finally aligned to determine the recurrent regions of CNV.
Tran et al. BMC Systems Biology 2011 5:121 doi:10.1186/1752-0509-5-121
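The steps in this outline can be illustrated with a minimal sketch for a single chromosome. It is not the authors' implementation, and several choices are assumptions made only for illustration: a Welch t-statistic as the ES, a Haar approximation with the detail coefficients dropped (equivalent to block means) as the wavelet smoothing, a simple pooled permutation estimate of the FDR, and the hypothetical names `t_statistics`, `haar_smooth`, `empirical_fdr`, `find_segments`, and `wace_sketch` with their default parameters. The final step, aligning ICNV regions across datasets, is not shown.

```python
import numpy as np

def t_statistics(expr, labels):
    """ES per gene: Welch t-statistic. expr is genes x samples (genes in
    chromosomal order); labels is a boolean array marking one phenotype group."""
    a, b = expr[:, labels], expr[:, ~labels]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se

def haar_smooth(scores, level=2):
    """NS: Haar approximation with detail coefficients dropped, which reduces
    to block means of width 2**level (assumed smoother, not the paper's)."""
    width = 2 ** level
    pad = (-len(scores)) % width
    padded = np.pad(scores, (0, pad), mode="edge")
    means = padded.reshape(-1, width).mean(axis=1)
    return np.repeat(means, width)[:len(scores)]

def empirical_fdr(ns, null_ns):
    """For each observed |NS| = s: (mean count of null |NS| >= s per
    permutation) / (count of observed |NS| >= s), capped at 1.
    null_ns has shape permutations x genes."""
    obs = np.abs(ns)
    null = np.sort(np.abs(null_ns).ravel())
    null_counts = (null.size - np.searchsorted(null, obs, side="left")) / null_ns.shape[0]
    obs_counts = obs.size - np.searchsorted(np.sort(obs), obs, side="left")
    return np.minimum(1.0, null_counts / obs_counts)

def find_segments(ns, fdr, n=4, alpha=0.01):
    """Runs of at least n consecutive same-sign NS with FDR <= alpha,
    returned as (start, end_exclusive, sign) tuples."""
    state = np.where(fdr <= alpha, np.sign(ns), 0).astype(int)
    segments, start = [], 0
    for i in range(1, len(state) + 1):
        if i == len(state) or state[i] != state[start]:
            if state[start] != 0 and i - start >= n:
                segments.append((start, i, int(state[start])))
            start = i
    return segments

def wace_sketch(expr, labels, n_perm=200, level=2, n=4, alpha=0.01, seed=0):
    """Run the sketch on one chromosome: ES -> NS -> permutation FDR -> segments."""
    rng = np.random.default_rng(seed)
    ns = haar_smooth(t_statistics(expr, labels), level)
    # Null NS: same smoothing applied to ES computed from shuffled group labels.
    null_ns = np.array([
        haar_smooth(t_statistics(expr, rng.permutation(labels)), level)
        for _ in range(n_perm)
    ])
    fdr = empirical_fdr(ns, null_ns)
    return ns, fdr, find_segments(ns, fdr, n, alpha)
```

A call such as `wace_sketch(expr, labels)` returns the smoothed scores, their estimated FDR values, and the candidate segments for one chromosome. Under these assumptions it would be run once per chromosome, since the caption describes significance as being assessed on each chromosome individually.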