This article is part of the supplement: Selected articles from the Eleventh Asia Pacific Bioinformatics Conference (APBC 2013): Bioinformatics
CoNVEX: copy number variation estimation in exome sequencing data using HMM
1 Department of Mechanical Engineering, University of Melbourne, Parkville, VIC 3010, Australia
2 Bioinformatics Core Facility, Peter MacCallum Cancer Centre, VIC 3002, Australia
BMC Bioinformatics 2013, 14(Suppl 2):S2 doi:10.1186/1471-2105-14-S2-S2Published: 21 January 2013
One of the main types of genetic variations in cancer is Copy Number Variations (CNV). Whole exome sequenicng (WES) is a popular alternative to whole genome sequencing (WGS) to study disease specific genomic variations. However, finding CNV in Cancer samples using WES data has not been fully explored.
We present a new method, called CoNVEX, to estimate copy number variation in whole exome sequencing data. It uses ratio of tumour and matched normal average read depths at each exonic region, to predict the copy gain or loss. The useful signal produced by WES data will be hindered by the intrinsic noise present in the data itself. This limits its capacity to be used as a highly reliable CNV detection source. Here, we propose a method that consists of discrete wavelet transform (DWT) to reduce noise. The identification of copy number gains/losses of each targeted region is performed by a Hidden Markov Model (HMM).
HMM is frequently used to identify CNV in data produced by various technologies including Array Comparative Genomic Hybridization (aCGH) and WGS. Here, we propose an HMM to detect CNV in cancer exome data. We used modified data from 1000 Genomes project to evaluate the performance of the proposed method. Using these data we have shown that CoNVEX outperforms the existing methods significantly in terms of precision. Overall, CoNVEX achieved a sensitivity of more than 92% and a precision of more than 50%.