Open Access Methodology article

Inferring copy number and genotype in tumour exome data

Kaushalya C Amarasinghe1, Jason Li1,2, Sally M Hunter3, Georgina L Ryland3, Prue A Cowin4, Ian G Campbell3,5,6 and Saman K Halgamuge1*

Author Affiliations

1 Optimisation and Pattern Recognition group, Mechanical Engineering Department, Melbourne School of Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia

2 Bioinformatics Core Facility, Peter MacCallum Cancer Centre, East Melbourne, Victoria 3002, Australia

3 Cancer Genetics Laboratory, Peter MacCallum Cancer Centre, East Melbourne, Victoria 3002, Australia

4 Cancer Genomics and Genetics Laboratory, Peter MacCallum Cancer Centre, East Melbourne, Victoria 3002, Australia

5 Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, Victoria 3010, Australia

6 Department of Pathology, The University of Melbourne, Parkville, Victoria 3010, Australia

BMC Genomics 2014, 15:732  doi:10.1186/1471-2164-15-732

Published: 28 August 2014

Abstract

Background

Using whole exome sequencing to predict aberrations in tumours is a cost-effective alternative to whole genome sequencing; however, it is predominantly used for variant detection and only infrequently for detecting somatic copy number variation.

Results

We propose a new method to infer copy number and genotypes using whole exome data from paired tumour/normal samples. Our algorithm uses two Hidden Markov Models to predict copy number and genotypes, and computationally resolves polyploidy/aneuploidy, normal cell contamination and signal baseline shift. Our method explicitly detects chromosome-arm-level events, which are commonly found in tumour samples. The methods are combined into a package named ADTEx (Aberration Detection in Tumour Exome). We applied our algorithm to a cohort of 17 in-house generated and 18 TCGA paired ovarian cancer/normal exomes, and evaluated its performance by comparing its predictions against the copy number variations and genotypes predicted from Affymetrix SNP 6.0 data of the same samples. Further, we carried out a comparison study, in which ADTEx outperformed its competitors in terms of precision and F-measure.

Conclusions

Our proposed method, ADTEx, uses both depth of coverage ratios and B allele frequencies calculated from whole exome sequencing data to predict copy number variations along with their genotypes. ADTEx is implemented as a user-friendly software package in Python and the R statistical language. Source code and sample data are freely available under the GNU General Public License (GPLv3) at http://adtex.sourceforge.net/.
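As a rough illustration of the two signals named above, the following minimal Python sketch computes a depth of coverage (DOC) ratio for one exon and a B allele frequency (BAF) for one SNP. It is not the ADTEx implementation: the function names, the library-size normalisation and the log2 scale are assumptions made for illustration only.

```python
import math


def doc_log2_ratio(tumour_depth, normal_depth, tumour_total, normal_total):
    """Log2 tumour/normal depth ratio for one exon (hypothetical helper).

    Each depth is divided by its sample's total depth first, an assumed
    normalisation so that samples sequenced to different depths are comparable.
    """
    if min(tumour_depth, normal_depth, tumour_total, normal_total) <= 0:
        raise ValueError("all depths must be positive")
    tumour_fraction = tumour_depth / tumour_total
    normal_fraction = normal_depth / normal_total
    return math.log2(tumour_fraction / normal_fraction)


def b_allele_frequency(ref_reads, alt_reads):
    """Fraction of reads at a SNP that carry the non-reference (B) allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total


# Half the normalised depth of the matched normal gives a log2 ratio of -1.
ratio = doc_log2_ratio(50, 100, 1000, 1000)

# Equal reference and alternate read counts give a BAF of 0.5.
baf = b_allele_frequency(30, 30)
```

In a pure diploid tumour sample, a log2 ratio near -1 would be consistent with a single-copy loss, while a BAF that moves away from 0.5 at a germline heterozygous SNP indicates allelic imbalance. Using both signals together is what allows a genotype to be assigned alongside the copy number.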