
This article is part of the supplement: Proceedings of the Fourth Annual MCBIOS Conference. Computational Frontiers in Biomedicine

Open Access Proceedings

Analysis of nanopore detector measurements using Machine-Learning methods, with application to single-molecule kinetic analysis

Matthew Landry¹ and Stephen Winters-Hilt¹,²*

Author Affiliations

1 Department of Computer Science, University of New Orleans, New Orleans, LA, 70148, USA

2 The Research Institute for Children, 200 Henry Clay Ave., New Orleans, LA 70118, USA


BMC Bioinformatics 2007, 8(Suppl 7):S12  doi:10.1186/1471-2105-8-S7-S12

Published: 1 November 2007

Abstract

Background

A nanopore detector has a nanometer-scale trans-membrane channel across which a potential difference is established, resulting in an ionic current through the channel in the pA-nA range. A distinctive channel current blockade signal is created as individually "captured" DNA molecules interact with the channel and modulate the channel's ionic current. The nanopore detector is sensitive enough that nearly identical DNA molecules can be classified with very high accuracy using machine learning techniques such as Hidden Markov Models (HMMs) and Support Vector Machines (SVMs).
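As a minimal illustration of HMM-based classification of a blockade signal, the sketch below scores a discretized current trace under one HMM per molecule class with the scaled forward algorithm, then picks the class with the highest log-likelihood. It is a generic maximum-likelihood classifier, not the authors' pipeline (which, per the abstract, also passes HMM-derived features to an SVM). The quantization of the current into discrete levels, the toy model parameters, and the names `forward_log_likelihood` and `classify` are all hypothetical.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete-emission HMM (scaled forward algorithm).

    obs   -- list of symbol indices (hypothetically, quantized blockade current levels)
    start -- start[i]: probability of starting in state i
    trans -- trans[i][j]: probability of moving from state i to state j
    emit  -- emit[i][k]: probability that state i emits symbol k
    """
    n = len(start)
    # Initialise with the first observation, then rescale so alpha sums to 1 (avoids underflow).
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for symbol in obs[1:]:
        # Forward recursion: sum over predecessor states, then apply the emission probability.
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
        scale = sum(alpha)
        log_lik += math.log(scale)  # the sum of log scale factors equals log P(obs)
        alpha = [a / scale for a in alpha]
    return log_lik


def classify(obs, models):
    """Return the label whose (start, trans, emit) model gives obs the highest likelihood."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))


if __name__ == "__main__":
    # Two hypothetical 2-state models over 3 quantized levels (0 = low, 1 = mid, 2 = high).
    models = {
        "molecule_A": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
        "molecule_B": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
    }
    print(classify([0, 0, 0, 1, 1, 0, 0], models))  # molecule_A
```

Rescaling at every step keeps the recursion numerically stable on long traces, where unscaled forward probabilities would underflow to zero.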

Results

A non-standard HMM implementation, emission inversion, is used to improve classification. Additional features are also considered for the feature vector used by the SVM: adding a single feature representing spike density is shown to notably improve classification results. A much larger feature-set expansion was also studied, adding 2500 features (rather than 1) by including all of the HMM's transition probabilities. These expanded features can introduce redundant, noisy information (along with diagnostic information) into the feature set and thus degrade classification performance. A hybrid Adaptive Boosting (AdaBoost) approach was used for feature selection to alleviate this problem.
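One common way to use boosting for feature selection is to run standard AdaBoost with single-feature decision stumps as weak learners and keep the features the stumps choose, in order. The sketch below shows that generic scheme. It is not the authors' hybrid method, whose details are not given here. The names `best_stump` and `adaboost_select`, the {-1, +1} label convention, and the exhaustive midpoint threshold search are assumptions for illustration.

```python
import math


def best_stump(X, y, w, feature):
    """Find the threshold/polarity on one feature that minimises weighted error.

    Returns (error, threshold, polarity); the stump predicts +1 when
    polarity * (x - threshold) > 0, else -1.
    """
    values = sorted(set(row[feature] for row in X))
    # Candidate thresholds: one below every value, plus midpoints between consecutive values.
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = (float("inf"), 0.0, 1)
    for thr in thresholds:
        for polarity in (1, -1):
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if (1 if polarity * (row[feature] - thr) > 0 else -1) != yi)
            if err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost_select(X, y, rounds):
    """Run AdaBoost with one-feature stumps and return feature indices in the order chosen.

    X -- list of feature vectors (hypothetically, HMM-derived features plus extra ones)
    y -- labels in {-1, +1}
    """
    n_samples, n_features = len(X), len(X[0])
    w = [1.0 / n_samples] * n_samples  # start with uniform sample weights
    selected = []
    for _ in range(rounds):
        # The weak learner for this round is the single best stump over all features.
        err, thr, pol, f = min(best_stump(X, y, w, j) + (j,) for j in range(n_features))
        if err >= 0.5:
            break  # no stump beats chance under the current weights; stop boosting
        if f not in selected:
            selected.append(f)
        err = max(err, 1e-10)  # keep alpha finite when the stump is perfect
        alpha = 0.5 * math.log((1 - err) / err)
        # Up-weight misclassified samples, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * yi * (1 if pol * (row[f] - thr) > 0 else -1))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return selected
```

Because each round reweights the samples toward those the earlier stumps misclassified, later rounds tend to select features that add complementary information, while redundant features, which would only repeat earlier decisions, are less likely to be chosen.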

Conclusion

The methods shown here for more informed feature extraction both improve classification and give biologists and chemists tools for better understanding the kinetic properties of molecules of interest.