This article is part of the supplement: Proceedings of the Fourth Annual MCBIOS Conference. Computational Frontiers in Biomedicine
A novel, fast, HMM-with-Duration implementation – for application with a new, pattern recognition informed, nanopore detector
1 Dept. of Computer Science, University of New Orleans, New Orleans, LA 70148, USA
2 Research Institute for Children, Children's Hospital, New Orleans, LA 70118, USA
BMC Bioinformatics 2007, 8(Suppl 7):S19 doi:10.1186/1471-2105-8-S7-S19Published: 1 November 2007
Hidden Markov Models (HMMs) provide an excellent means for structure identification and feature extraction on stochastic sequential data. An HMM-with-Duration (HMMwD) is an HMM that can also exactly model the hidden-label length (recurrence) distributions – while the regular HMM will impose a best-fit geometric distribution in its modeling/representation.
A Novel, Fast, HMM-with-Duration (HMMwD) Implementation is presented, and experimental results are shown that demonstrate its performance on two-state synthetic data designed to model Nanopore Detector Data. The HMMwD experimental results are compared to (i) the ideal model and to (ii) the conventional HMM. Its accuracy is clearly an improvement over the standard HMM, and matches that of the ideal solution in many cases where the standard HMM does not. Computationally, the new HMMwD has all the speed advantages of the conventional (simpler) HMM implementation. In preliminary work shown here, HMM feature extraction is then used to establish the first pattern recognition-informed (PRI) sampling control of a Nanopore Detector Device (on a "live" data-stream).
The improved accuracy of the new HMMwD implementation, at the same order of computational cost as the standard HMM, is an important augmentation for applications in gene structure identification and channel current analysis, especially PRI sampling control, for example, where speed is essential. The PRI experiment was designed to inherit the high accuracy of the well characterized and distinctive blockades of the DNA hairpin molecules used as controls (or blockade "test-probes"). For this test set, the accuracy inherited is 99.9%.