This article is part of the supplement: The 2010 International Conference on Bioinformatics and Computational Biology (BIOCOMP 2010): Genomics

Open Access Research article

A hidden Markov model-based algorithm for identifying tumour subtype using array CGH data

Ke Zhang1*, Yi Yang1, Viswanath Devanarayan2, Linglin Xie3, Youping Deng4 and Sens Donald1

Author Affiliations

1 Department of Pathology, School of Medicine and Health Sciences, University of North Dakota, Grand Forks, ND 58201, USA

2 Exploratory Statistics, Global Pharmaceutical Research & Development, Abbott Laboratories, Abbott Park, IL 60064, USA

3 Department of Biochemistry and Molecular Biology, School of Medicine and Health Sciences, University of North Dakota, Grand Forks, ND 58201, USA

4 Department of Internal Medicine, Rush University Medical Center, Chicago, IL 60612, USA

BMC Genomics 2011, 12(Suppl 5):S10  doi:10.1186/1471-2164-12-S5-S10

Published: 23 December 2011

Abstract

Background

Recent advances in array CGH (aCGH) research have significantly improved tumor identification based on DNA copy number data. A number of unsupervised learning methods have been proposed for clustering aCGH samples. Two major challenges in developing aCGH sample clustering methods are the high spatial correlation between aCGH markers and low computational efficiency. We developed a mixture hidden Markov model-based algorithm to address both challenges.

Results

A hidden Markov model (HMM) was used to model the spatial correlation between aCGH markers. A fast clustering algorithm was implemented, and analysis of real glioma aCGH data showed that it converges rapidly to the optimal clustering and that its computation time is proportional to the sample size. Simulation results showed that this HMM-based clustering (HMMC) method has a substantially lower error rate than NMF (non-negative matrix factorization) clustering. The HMMC results for the glioma data were significantly associated with clinical outcomes.
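
To make the idea of HMM-based sample clustering concrete, the following is a minimal Python sketch, not the authors' HMMC implementation. It assumes three hidden copy-number states (loss, neutral, gain) with Gaussian emissions over log-ratio values, scores each sample with the forward algorithm, and assigns it to the cluster whose HMM gives the highest likelihood. All function names, parameter values, and the toy data are hypothetical and chosen only for illustration; the published method is a mixture HMM whose estimation details are not given in this abstract.

```python
import math


def safe_log(p):
    # log(0) is treated as -infinity so that zero probabilities are allowed.
    return math.log(p) if p > 0 else float("-inf")


def log_gauss(x, mean, sd):
    # Log density of a normal distribution (assumed emission model).
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def logsumexp(values):
    # Numerically stable log(sum(exp(v))) for a list of log values.
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(obs, start, trans, means, sds):
    # Forward algorithm in log space: log P(obs | HMM parameters).
    # The transition matrix is what captures dependence between
    # neighbouring markers along the genome.
    k = len(start)
    alpha = [safe_log(start[s]) + log_gauss(obs[0], means[s], sds[s]) for s in range(k)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[r] + safe_log(trans[r][s]) for r in range(k)])
            + log_gauss(x, means[s], sds[s])
            for s in range(k)
        ]
    return logsumexp(alpha)


def assign_clusters(samples, cluster_params):
    # Hard assignment: each sample goes to the cluster HMM that scores it highest.
    labels = []
    for obs in samples:
        scores = [hmm_log_likelihood(obs, **p) for p in cluster_params]
        labels.append(scores.index(max(scores)))
    return labels


# Hypothetical parameters: states are (loss, neutral, gain).
sticky = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
emission = {"means": [-0.5, 0.0, 0.5], "sds": [0.2, 0.2, 0.2]}
cluster_params = [
    {"start": [0.1, 0.8, 0.1], "trans": sticky, **emission},  # mostly neutral
    {"start": [0.1, 0.1, 0.8], "trans": sticky, **emission},  # mostly gained
]

# Toy log-ratio profiles for two samples (made-up values).
samples = [
    [0.00, 0.05, -0.02, 0.01],
    [0.50, 0.45, 0.55, 0.50],
]

labels = assign_clusters(samples, cluster_params)
print(labels)
```

In a full clustering procedure this assignment step would alternate with re-estimating each cluster's HMM parameters from its assigned samples; that part is omitted here because the abstract does not describe it.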

Conclusions

We have developed a fast clustering algorithm to identify tumor subtypes based on DNA copy number aberrations. The performance of the proposed HMMC method has been evaluated using both simulated and real aCGH data. The HMMC software, in both R and C++, is available on the ND INBRE website: http://ndinbre.org/programs/bioinformatics.php