This article is part of the supplement: The 2010 International Conference on Bioinformatics and Computational Biology (BIOCOMP 2010): Genomics
A hidden Markov model-based algorithm for identifying tumour subtype using array CGH data
1 Department of Pathology, School of Medicine and Health Sciences, University of North Dakota, Grand Forks, ND 58201, USA
2 Exploratory Statistics, Global Pharmaceutical Research & Development, Abbott Laboratories, Abbott Park, IL 60064, USA
3 Department of Biochemistry and Molecular Biology, School of Medicine and Health Sciences, University of North Dakota, Grand Forks, ND 58201, USA
4 Department of Internal Medicine, Rush University Medical Center, Chicago, IL 60612, USA
BMC Genomics 2011, 12(Suppl 5):S10 doi:10.1186/1471-2164-12-S5-S10
Published: 23 December 2011
Recent advances in array CGH (aCGH) research have significantly improved tumor identification using DNA copy number data. A number of unsupervised learning methods have been proposed for clustering aCGH samples. Two major challenges in developing aCGH sample clustering methods are the high spatial correlation between aCGH markers and low computational efficiency. A mixture hidden Markov model-based algorithm was developed to address these two challenges.
A hidden Markov model (HMM) was used to model the spatial correlation between aCGH markers. A fast clustering algorithm was implemented, and analysis of real glioma aCGH data showed that it converges rapidly to the optimal cluster and that its computation time is proportional to the sample size. Simulation results showed that this HMM-based clustering (HMMC) method has a substantially lower error rate than non-negative matrix factorization (NMF) clustering. The HMMC results for the glioma data were significantly associated with clinical outcomes.
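To illustrate how an HMM can capture the spatial correlation between neighbouring markers and drive a clustering decision, the following minimal sketch scores a copy number profile under two per-cluster HMMs with the forward algorithm and assigns the profile to the better-scoring cluster. It is not the authors' HMMC implementation: the three states (loss, neutral, gain), the Gaussian emissions, all parameter values, and the names `hmm_loglik` and `assign_cluster` are assumptions made for illustration, and the hard assignment shown here stands in for a full mixture-model fit.

```python
import math

def log_gauss(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def hmm_loglik(profile, init, trans, means, sd):
    """Forward algorithm in log space: log P(profile | HMM).

    The transition matrix links each marker to its neighbour, which is
    how the model accounts for spatial correlation along the genome.
    """
    k = len(init)
    alpha = [math.log(init[s]) + log_gauss(profile[0], means[s], sd)
             for s in range(k)]
    for x in profile[1:]:
        alpha = [
            logsumexp([alpha[r] + math.log(trans[r][s]) for r in range(k)])
            + log_gauss(x, means[s], sd)
            for s in range(k)
        ]
    return logsumexp(alpha)

def assign_cluster(profile, cluster_models):
    """Return the index of the cluster HMM with the highest log-likelihood."""
    scores = [hmm_loglik(profile, **model) for model in cluster_models]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical emission parameters: log2 ratios for loss, neutral, gain.
MEANS = [-0.5, 0.0, 0.5]
SD = 0.2

# Hypothetical cluster 0: mostly copy-neutral segments.
neutral_model = {
    "init": [0.05, 0.90, 0.05],
    "trans": [[0.80, 0.19, 0.01],
              [0.02, 0.96, 0.02],
              [0.01, 0.19, 0.80]],
    "means": MEANS,
    "sd": SD,
}

# Hypothetical cluster 1: long gained segments are common.
gain_model = {
    "init": [0.05, 0.15, 0.80],
    "trans": [[0.80, 0.10, 0.10],
              [0.02, 0.58, 0.40],
              [0.01, 0.04, 0.95]],
    "means": MEANS,
    "sd": SD,
}

models = [neutral_model, gain_model]
flat_profile = [0.02, -0.05, 0.01, 0.03, -0.02, 0.00]
gained_profile = [0.45, 0.55, 0.50, 0.48, 0.52, 0.60]

flat_cluster = assign_cluster(flat_profile, models)
gained_cluster = assign_cluster(gained_profile, models)
```

In a full clustering procedure, an assignment step like this would typically alternate with re-estimating each cluster's HMM parameters from its current members until the assignments stop changing.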
We have developed a fast clustering algorithm to identify tumor subtypes based on DNA copy number aberrations. The performance of the proposed HMMC method has been evaluated using both simulated and real aCGH data. The software for HMMC, in both R and C++, is available on the ND INBRE website: http://ndinbre.org/programs/bioinformatics.php