Open Access Research article

Classification of human cancers based on DNA copy number amplification modeling

Samuel Myllykangas1*, Jarkko Tikka2, Tom Böhling1, Sakari Knuutila1 and Jaakko Hollmén2*

Author Affiliations

1 Department of Pathology, Haartman Institute and HUSLAB, University of Helsinki and Helsinki University Central Hospital, P.O. Box 21, FI-00014, University of Helsinki, Helsinki, Finland

2 Department of Information and Computer Science, Helsinki University of Technology, P.O. Box 5400, FI-02015 TKK, Espoo, Finland


BMC Medical Genomics 2008, 1:15  doi:10.1186/1755-8794-1-15

Published: 14 May 2008



DNA amplifications alter gene dosage in cancer genomes by multiplying the gene copy number. Amplifications are characteristic of a considerable number of advanced cancers at various anatomical locations. The aims of this study were to classify human cancers based on their amplification patterns, to explore the biological and clinical basis of this amplification-pattern-based classification, and to identify the characteristics of human genomic architecture that are associated with amplification mechanisms.


We applied a machine learning approach to model DNA copy number amplifications using a data set of binary amplification records at chromosome sub-band resolution from 4400 cases representing 82 cancer types. The amplification data were combined with background data: clinical, histological and biological classifications, and cytogenetic annotations. Statistical hypothesis testing was used to mine associations between the data sets.
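To make the association-mining step concrete, the following is a minimal sketch of one test that could be used to ask whether a classification term is over-represented among the cases of one group. The abstract does not name the test that was applied, so the hypergeometric upper-tail test (equivalent to a one-sided Fisher's exact test) is an assumption, as are the function name `hypergeom_upper_tail` and the toy counts.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: total number of cases
    K: cases annotated with the classification term
    n: cases in the group of interest (e.g. one cluster)
    k: cases in that group that carry the term
    """
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy counts, invented for illustration only (not taken from the study).
N = 20  # all cases
K = 5   # cases of a given cancer type
n = 6   # cases assigned to one cluster
k = 4   # cases of that cancer type inside the cluster

p_value = hypergeom_upper_tail(k, K, n, N)
print(f"enrichment p-value: {p_value:.4f}")
```

With many terms and many groups tested at once, a multiple-testing correction would normally be needed before any p-value is interpreted; that step is left out of this sketch.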


Probabilistic clustering of each chromosome identified 111 amplification models and divided the cancer cases into clusters. The distribution of classification terms in the amplification-model-based clustering of cancer cases revealed cancer classes that were associated with specific DNA copy number amplification models. Amplification patterns – finite or bounded descriptions of the ranges of the amplifications in the chromosome – were extracted from the clustered data and expressed according to the original cytogenetic nomenclature. This was achieved by maximal frequent itemset mining of the cluster-specific data sets. The boundaries of the amplification patterns were shown to be enriched with fragile sites, telomeres, centromeres, and light chromosome bands.
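The idea of maximal frequent itemset mining can be illustrated with a small sketch. Each case is treated as the set of chromosome bands it amplifies; an itemset is frequent if at least `min_count` cases contain it, and maximal if no frequent superset exists. The level-wise (Apriori-style) search, the function names, the threshold and the toy cases below are all assumptions made for illustration and do not reproduce the study's implementation or data.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Level-wise search for all itemsets contained in >= min_count transactions."""
    items = sorted({item for t in transactions for item in t})
    frequent = []
    current = [frozenset([item]) for item in items]
    while current:
        # Keep candidates that occur as a subset of enough transactions.
        survivors = [c for c in current
                     if sum(c <= t for t in transactions) >= min_count]
        frequent.extend(survivors)
        # Join surviving k-itemsets into candidate (k + 1)-itemsets.
        current = list({a | b for a, b in combinations(survivors, 2)
                        if len(a | b) == len(a) + 1})
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


# Toy cluster: each case lists its amplified bands (hypothetical data).
cases = [
    frozenset({"8q23", "8q24.1", "8q24.2"}),
    frozenset({"8q23", "8q24.1", "8q24.2"}),
    frozenset({"8q24.1", "8q24.2"}),
    frozenset({"8q24.1", "8q24.2", "8q24.3"}),
    frozenset({"8p11.2"}),
]

maximal = maximal_itemsets(frequent_itemsets(cases, min_count=2))
for pattern in maximal:
    print(sorted(pattern))
```

In this toy cluster the only maximal frequent itemset is the contiguous run 8q23 to 8q24.2, which is the kind of bounded range that can then be written in cytogenetic nomenclature. The exhaustive level-wise search is adequate for a handful of bands; larger data would call for a dedicated maximal-itemset algorithm.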


Our results demonstrate that amplifications are non-random chromosomal changes that are specifically selected in the tumor tissue microenvironment. Furthermore, statistical evidence showed that specific chromosomal features co-localize with amplification breakpoints, which links these features to the amplification process.