<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2105-7-S2-S22</ui>
	<ji>1471-2105</ji>
	<fm>
		<dochead>Proceedings</dochead>
		<bibl>
			<title>
				<p>Cheminformatics methods for novel nanopore analysis of HIV DNA termini</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Winters-Hilt</snm>
					<fnm>Stephen</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
					<email>winters@cs.uno.edu</email>
				</au>
				<au id="A2">
					<snm>Landry</snm>
					<fnm>Matthew</fnm>
					<insr iid="I1"/>
					<email>mlandry@cs.uno.edu</email>
				</au>
				<au id="A3">
					<snm>Akeson</snm>
					<fnm>Mark</fnm>
					<insr iid="I3"/>
					<email>makeson@chemistry.ucsc.edu</email>
				</au>
				<au id="A4">
					<snm>Tanase</snm>
					<fnm>Maria</fnm>
					<insr iid="I2"/>
					<email>metanase@yahoo.com</email>
				</au>
				<au id="A5">
					<snm>Amin</snm>
					<fnm>Iftekhar</fnm>
					<insr iid="I2"/>
					<email>iftekhar.amin@gmail.com</email>
				</au>
				<au id="A6">
					<snm>Coombs</snm>
					<fnm>Amy</fnm>
					<insr iid="I3"/>
					<email>acoombs@soe.ucsc.edu</email>
				</au>
				<au id="A7">
					<snm>Morales</snm>
					<fnm>Eric</fnm>
					<insr iid="I2"/>
					<email>emorales@chnola-research.org</email>
				</au>
				<au id="A8">
					<snm>Millet</snm>
					<fnm>John</fnm>
					<insr iid="I1"/>
					<email>millet.john@gmail.com</email>
				</au>
				<au id="A9">
					<snm>Baribault</snm>
					<fnm>Carl</fnm>
					<insr iid="I1"/>
					<email>cbaribau@uno.cs.edu</email>
				</au>
				<au id="A10">
					<snm>Sendamangalam</snm>
					<fnm>Srikanth</fnm>
					<insr iid="I1"/>
					<email>s.n.srikanth@gmail.com</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Department of Computer Science, University of New Orleans, New Orleans, LA, 70148, USA</p>
				</ins>
				<ins id="I2">
					<p>The Research Institute for Children, 200 Henry Clay Ave., New Orleans, LA 70118, USA</p>
				</ins>
				<ins id="I3">
					<p>Department of Chemistry, University of California &#8211; Santa Cruz, Santa Cruz, CA 90560, USA</p>
				</ins>
			</insg>
			<source>BMC Bioinformatics</source>
			<supplement>
				<title>
					<p>Third Annual MCBIOS Conference. Bioinformatics: A Calculated Discovery</p>
				</title>
				<editor>Jonathan D Wren (Senior Editor), Stephen Winters-Hilt, Yuriy Gusev, Andrey Ptitsyn</editor>
				<note>Proceedings</note>
				<url>http://www.mcbios.org</url>
			</supplement>
			<conference>
				<title>
					<p>Third Annual MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference. Bioinformatics: A Calculated Discovery</p>
				</title>
				<location>Baton Rouge, LA, USA</location>
				<date-range>2&#8211;4 March, 2006</date-range>
			</conference>
			<issn>1471-2105</issn>
			<pubdate>2006</pubdate>
			<volume>7</volume>
			<issue>Suppl 2</issue>
			<fpage>S22</fpage>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">17118144</pubid><pubid idtype="doi">10.1186/1471-2105-7-S2-S22</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<pub>
				<date>
					<day>26</day>
					<month>9</month>
					<year>2006</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2006</year>
			<collab>Winters-Hilt et al; licensee BioMed Central Ltd.</collab>
			<note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Channel current feature extraction methods using Hidden Markov Models (HMMs) have been designed for tracking <it>individual</it>-molecule conformational changes. This information is derived from observation of changes in the ionic channel current blockade "signal" upon that molecule's interaction with (and occlusion of) a <it>single </it>nanometer-scale channel in a "nanopore detector". In effect, a nanopore detector transduces single molecule events into channel current blockades. The HMM analysis tools described here are used to help systematically explore DNA dinucleotide flexibility, with particular focus on HIV's highly conserved (and highly flexible/reactive) viral DNA termini. One of the most critical stages in HIV's attack is the binding between viral DNA and the retroviral integrase, which is influenced by the dynamic-coupling induced high flexibility of a CA/TG dinucleotide positioned precisely two base-pairs from the blunt terminus of the duplex viral DNA. This suggests the study of a family of such CA/TG dinucleotide molecules via nanopore measurement and cheminformatics analysis.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>HMMs are used for level identification on the current blockades, HMM/EM with boosted variance emissions is used for level projection pre-processing, and time-domain FSAs are used to parse the level-projected waveform for kinetic information. The observed state kinetics of the DNA hairpins containing the CA/TG dinucleotide provide clear evidence for HIV's selection of a peculiarly flexible/interactive DNA terminus.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<sec>
				<st>
					<p>Fundamental hypothesis</p>
				</st>
				<p>HIV DNA is found to have a highly conserved CA dinucleotide step precisely two base-pairs from its blunt-end terminus <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. In preliminary nanopore studies the blockade level lifetimes of the wild-type 3' end sequence (-C-A-T-G-3') were found to be similar to (-C-A-A-A-3'), consistent with their similarities in DNA conformation and &#916;G. This similarity motivated the present study of a small group of nine base-pair stem DNA hairpins consisting of all adenosines on the 3' side of the molecule, except for one cytosine-adenosine step (the "CA-step" set). Contrary to the differences (seemingly) indicated by nature, the calculated &#916;G&#176; of hairpin formation (using mFold) is the same for the CA-step set. It is hypothesized that the highly conserved nature of the HIV DNA terminus corresponds to some beneficial flexibility that increases reactivity with the HIV integrase prior to insertion into the host DNA. A test of the hypothesized flexibility/reactivity is sought via analysis of channel current statistics for signs of notably different blockade kinetics between the blunt-ended HIV DNA conformer and the other blunt-ended hairpins in the CA-step set.</p>
			</sec>
			<sec>
				<st>
					<p>Sequence Dependent DNA Conformation</p>
				</st>
				<p>DNA conformation is dependent upon intrinsic properties of a given sequence and upon the environment in which the molecule is studied <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Intrinsic sequence-dependent properties include minor groove width <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, propensity to undergo B-to-A transition <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>, and cation localization in the major vs minor groove <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>.</p>
				<p>Sequence-dependent conformation influences nearly all aspects of DNA biology including enzyme-dependent functions such as replication, transcription, and recombination. Here it is important to distinguish between the two general mechanisms by which enzymes recognize DNA <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>: 1) recognition of functional groups on specific bases in the major groove ('direct' readout); and 2) conformation-dependent enzyme recognition of DNA ('indirect' readout). An example of indirect readout is DNA binding by <it>E. coli </it>Integration Host Factor (IHF). This heterodimeric protein binds to DNA in a sequence-specific manner that causes a 160 degree bend. This bend is required for recombination and transcription. Importantly, IHF contacts the phosphate backbone and the minor groove only, therefore its sequence-specificity must be conformation dependent.</p>
				<p>Traditionally, efforts to explain DNA conformation have focused on the propensity of nucleotides to adopt C2' endo vs C3' endo sugar pucker, base stacking, groove hydration, and the preferred geometries of GC vs AT pairs (e.g. propeller twist) <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>. An interesting (and controversial) new hypothesis holds that sequence-dependent cation position in the minor or major groove determines DNA conformation <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. In either case, the structural predictions used to formulate and test hypotheses have relied upon angstrom precision measurements by X-ray diffraction analysis of oligonucleotide crystals and heteronuclear NMR spectroscopy of DNA in solution.</p>
			</sec>
			<sec>
				<st>
					<p>Structural predictions based on X-ray crystallography and NMR spectroscopy</p>
				</st>
				<p>The first X-ray crystal structure of a DNA oligomer (the 'Dickerson dodecamer') was published in 1981 (Drew and Dickerson <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>). It established substantial deviation among base pairs in terms of propeller twist, rise per base pair, and sugar pucker. Numerous attempts have been made to understand the structural basis for these differences. As is true for models used to predict thermodynamic stability of duplexes <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, models based on dinucleotide steps have been reasonably successful. For example, Hassan and Calladine <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> used structural data from sixty oligomer crystals to establish features of dinucleotide steps that correlate with DNA flexibility. Pyrimidine-purine dinucleotide steps that are known to be flexible (e.g. TA and CA) were associated with little propeller twist and a variety of slide positions, whereas steps that are known to be rigid (notably AA steps) were high in propeller twist and had a limited range of slide. Others argue, however, that dinucleotide steps are inadequate to describe sequence dependent structure and dynamics because context can strongly influence their behavior. This is illustrated in a study by Packer and Hunter <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> who used a similar crystal structure database to examine the effect of neighboring base pairs on dinucleotide flexibility (as measured by slide and shift). Their results indicate that some dinucleotide steps adopt conformations that are entirely independent of neighboring base pairs (e.g. AA, AT, TA), while others are weakly context dependent (e.g. AC, AG, CA, GA), and still others are strongly context dependent (CG, GC, CC).</p>
				<p>Although crystal structures have provided fundamental information that helps illuminate how DNA can bend and twist when bound to proteins, the approach has limitations. For instance, close packing of DNA in crystals is known to alter structure relative to solution phase, and the cryogenic temperatures used for high resolution may lead to under-representation of conformers that are common at physiological temperatures. NMR spectroscopy can overcome these limitations because the experiments are typically run at 1 mM concentration and ambient temperature. This is illustrated by a recent comprehensive study <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> which compared an NMR structure for the Dickerson dodecamer with a high resolution crystal structure. There were two basic conclusions: 1) The average AATT core structure was very similar for the NMR-based and crystal-based predictions, i.e. strong propeller twist and a narrow minor groove. This is not surprising because the AATT sequence is relatively inflexible <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, it has been extensively studied <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, and it is constrained by four base pairs at either end of the dodecamer; 2) by comparison, the predicted structures for the CGCG segments demonstrated a profound variability. The authors attributed this difference to averaging of C3' -endo vs C2' -endo sugar puckering in the NMR structure, particularly among cytosines. At the cryogenic temperatures used for the high resolution crystal structure, the higher energy state C3' -endo pucker would be rarely observed. It is also likely that proximity to the duplex terminus can account for some of the difference because the helix ends overlap in crystals but not in solution <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. 
Whether the method involves structural averaging by NMR or approximation by a crystallized form, neither approach provides a clear picture, particularly near the important DNA terminal regions, of the conformational <it>history </it>of a free molecule in solution at physiological temperature. The nanopore approach described in what follows addresses this gap.</p>
			</sec>
			<sec>
				<st>
					<p>Structure and Dynamics of Duplex Ends</p>
				</st>
				<p>The structure and dynamics of DNA duplex ends can influence numerous enzyme-dependent processes. Some of the most biologically important of these are integration of transposons and retroviral dsDNA into target chromosomes. Two well studied examples are transposition of the phage Mu genome, and integration of HIV dsDNA copies into target chromosomal DNA. In both cases, a consensus CA dinucleotide step at or near the duplex terminus is believed to confer flexibility on the viral DNA that is required for processing and strand transfer.</p>
				<p>DNA duplex ends are significantly under-represented in NMR and crystal structure studies despite their critical importance in biology. For example, Hassan and Calladine's landmark study <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> was based on X-ray crystal structures for 60 oligomers. A&#8226;T pairs appeared only twice in the terminal dinucleotide step of the 120 duplex ends. This under-representation may be due to a historical bias, since the Dickerson dodecamer contains only G&#8226;C pairs in the four base pair termini. It may also reflect recognition of a built-in bias in crystal structures: the helix ends are known to overlap <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, and interpretation of their structure is therefore ambiguous. NMR studies of DNA structure have also been biased toward the Dickerson dodecamer and its variants.</p>
			</sec>
			<sec>
				<st>
					<p>Analysis of Individual DNA Hairpin Molecules Using a Protein Pore</p>
				</st>
				<p>The &#945;-hemolysin channel is a protein heptamer, formed by seven identical 33 kD protein molecules secreted by <it>Staphylococcus aureus</it>. The total channel length is 10 nm, comprising a 5 nm <it>trans</it>-membrane domain and a 5 nm vestibule that protrudes into the aqueous <it>cis </it>compartment <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. The narrowest segment of the pore is a 1.5 nm-diameter aperture <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> (see Fig. <figr fid="F1">1</figr>). By comparison, a single strand of DNA is about 1.3 nm in diameter. Given that water molecules are 0.15 nm in diameter, this means that one hydration layer separates ssDNA from the amino acids in the limiting aperture. This places the charged phosphodiester backbone, hydrogen bond donors and acceptors, and apolar rings of the DNA bases within one Debye length (3 &#197; in 1 M KCl) of the pore wall (the 1.5 nm limiting aperture is circumscribed by lysine 147). Not surprisingly, ssDNA and ssRNA strongly interact with the &#945;-hemolysin channel during translocation. Although dsDNA is too large to translocate, about ten base-pairs at one end can still be drawn into the large cis-side vestibule. This actually permits the most sensitive experiments to date, as the ends of "captured" dsDNA molecules can be observed for long periods to resolve features <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. In 1.0 M KCl (pH 8.0), a 120 mV applied potential produces a steady open channel current (I<sub>o</sub>) of 120 &#177; 5 pA at 23&#176;C (equivalent to a 1 G&#937; resistor). Translocation of single-stranded linear DNA (Figure <figr fid="F1">1</figr>) reduces this current to I &#8773; 14 pA (I/I<sub>o </sub>= 12%). Each monomer within single-stranded DNA traverses the length of the 10-nm pore in 1 to 3 &#956;s at ambient temperature.</p>
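				<p>The resistance and residual-current figures quoted above follow from simple arithmetic, which the short sketch below reproduces. It is illustrative only: the function names are hypothetical, and the inputs are the nominal values given in the text (120 mV, 120 pA open channel current, 14 pA during ssDNA translocation).</p>

```python
def channel_resistance_ohm(voltage_mv, current_pa):
    """Ohm's law, R = V / I, with V in millivolts and I in picoamperes."""
    return (voltage_mv * 1e-3) / (current_pa * 1e-12)


def residual_fraction(blockade_pa, open_pa):
    """Residual current I/Io as a fraction of the open channel current."""
    return blockade_pa / open_pa


# Nominal values from the text; the names used here are illustrative.
resistance = channel_resistance_ohm(120.0, 120.0)   # open channel
fraction = residual_fraction(14.0, 120.0)           # ssDNA translocation
print(f"R = {resistance / 1e9:.2f} GOhm, I/Io = {fraction * 100:.0f}%")
```

				<p>With these inputs the open channel behaves as a resistor of about 1 G&#937;, and the translocation blockade leaves roughly 12% of the open channel current.</p>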
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>A nanopore device based on the &#945;-hemolysin channel (from [41])</p>
					</caption>
					<text>
						<p>A nanopore device based on the &#945;-hemolysin channel (from [41]). a) Diagram of a horizontal bilayer apparatus used in the UNO-RIC laboratory. One &#945;-hemolysin channel is intercalated in a horizontal bilayer. The bilayer is supported on a 25-micron-diameter conical aperture at the end of a U-shaped Teflon tube. The tube connects two 70 &#956;l volume baths filled with 1 M KCl buffered at pH 8.0. b) Two-dimensional diagram of a 9 bp hairpin captured in the pore vestibule. The stick figure in blue is a two dimensional section of the &#945;-hemolysin pore derived from X-ray crystallographic data. c) Representative blockade of ionic current caused by a 9 bp DNA hairpin (9 bpC&#8226;G). Open channel current (I<sub>o</sub>) is typically 120 pA at 120 mV and 23.0&#176;C. In the case of 9 bp hairpins, the residual current transitions between four levels.</p>
					</text>
					<graphic file="1471-2105-7-S2-S22-1"/>
				</fig>
				<p>The initial DNA hairpin experiments <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> involved a well-characterized single-conformer DNA hairpin with a six-base-pair stem and a four-deoxythymidine loop <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. AMBER field <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> molecular dynamics simulation indicated that the four-deoxythymidine loop would adopt conformations that would prevent passage through the <it>cis</it>-vestibule entry and this was also verified by studying hairpin molecules with 4-dT loops at both ends (see <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> for details). When captured within an &#945;-hemolysin nanopore (with only one capture orientation or one "nanopore epitope"), the six base-pair DNA hairpin molecule caused a partial current blockade (or 'shoulder') lasting hundreds of milliseconds followed by a rapid downward spike (lasting hundreds of <it>micro</it>seconds). This "shoulder-spike" signature is consistent with two sequential steps: i) capture of a hairpin stem in the vestibule, where the molecule rattles in place because the hairpin loop cannot fit through the 2.6 nm aperture at the vestibule opening (and the duplex stem cannot fit through the 1.5-nm diameter-limiting aperture of the pore); and ii) simultaneous dissociation of the six base pairs in the hairpin stem, thus allowing the extended single-strand to traverse the channel. Building from the six base-pair stem, each base pair addition resulted in a measurable increase in blockade shoulder lifetime that correlated with the calculated &#916;G&#176; of hairpin formation (Figure <figr fid="F2">2</figr>) <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. A downward trend in shoulder current amplitude was also observed from I/I<sub>o </sub>equal to 68% for a 3 bp stem to I/I<sub>o </sub>equal to 32% for a 9 bp stem. 
These results are consistent with greater obstruction of ionic current as the hairpin stem extends further into the vestibule with each additional base pair.</p>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Influence of hairpin stem length on current impedance (from [45])</p>
					</caption>
					<text>
						<p>Influence of hairpin stem length on current impedance (from [45]). In the plot at left, each point represents the amplitude and duration for translocation of one DNA hairpin molecule. The duplex stems ranged from 3 bp to 8 bp. In the plot at right, average blockade durations are plotted as a function of duplex hairpin stability in kcal/mol calculated using 'Mfold'. '6bpA14' is a 6 bp hairpin with an A&#8226;A mismatch.</p>
					</text>
					<graphic file="1471-2105-7-S2-S22-2"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>A New Method for Single Molecule Detection and Characterization</p>
				</st>
				<p>Channel current based nanopore cheminformatics provides an incredibly versatile method for transducing single molecule events into channel current blockade states (see Figure <figr fid="F1">1</figr>). Single biomolecules and the ends of biopolymers such as DNA have been examined in solution with nanometer-scale precision <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. In work described above <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, it was found that complete base-pair dissociations of dsDNA to ssDNA, "melting", could be observed for sufficiently short DNA hairpins. In later work <abbrgrp><abbr bid="B42">42</abbr><abbr bid="B44">44</abbr></abbrgrp>, the nanopore detector was used to "read" the ends of dsDNA molecules, and was operated as a chemical mixture tester. In recent work <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B43">43</abbr></abbrgrp>, the nanopore detector has been used to observe the conformational kinetics at the termini of single DNA molecules. And in the most recent work, reported here, the nanopore is used to measure conformational kinetics of a family of DNA molecules consisting of variations of the HIV DNA consensus terminus.</p>
			</sec>
			<sec>
				<st>
					<p>The channel current cheminformatics architecture</p>
				</st>
				<p>Figure <figr fid="F3">3</figr> shows the signal processing architecture that is used. The prototype architecture and preliminary modifications are described in detail in <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>. Recent additions to the software, and their application, are described here. The processing is designed to rapidly extract useful information from noisy blockade signals using feature extraction protocols, wavelet analysis, Hidden Markov Models (HMMs) and Support Vector Machines (SVMs). A Finite State Automaton (FSA) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> approach is used for blockade signal acquisition and simple, time-domain, feature-extraction. The FSA is based on a variety of threshold parameters that require very little tuning (one round of parameter tuning sufficed for the acquisition of all the different types of channel blockade described here). The utility of a time-domain approach at the front-end of the signal analysis is that it permits precision control of the acquisition as well as extraction of fast time-scale signal characteristics. A generic HMM <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> is then used to characterize current blockades by identifying a sequence of sub-blockades as a sequence of state emissions <abbrgrp><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp>. The parameters of the generic HMM can then be estimated using a method called Expectation/Maximization (or just "EM") <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> to effect de-noising.</p>
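				<p>To make the idea of threshold-based, time-domain acquisition concrete, the following is a minimal sketch of a two-state automaton that cuts blockade events out of a current trace. It is not the acquisition code used in this work: the function name, the two hysteresis thresholds (expressed as fractions of the open channel current), and the minimum event length are all assumptions chosen for illustration.</p>

```python
def acquire_blockades(samples, open_level, start_frac=0.8, end_frac=0.9, min_len=3):
    """Two-state FSA (OPEN / BLOCKADE) over a sampled current trace.

    A blockade starts when the current drops below start_frac * open_level and
    ends when it rises above end_frac * open_level. Events shorter than min_len
    samples are discarded. Returns (start, end) index pairs, end exclusive.
    A blockade still in progress when the trace ends is not reported.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    events = []
    state = "OPEN"
    start = 0
    for i, x in enumerate(samples):
        if state == "OPEN":
            if start_frac * open_level > x:
                state = "BLOCKADE"
                start = i
        elif x > end_frac * open_level:
            state = "OPEN"
            if i - start >= min_len:
                events.append((start, i))
    return events


# Toy trace in pA: two real blockades and one single-sample glitch.
trace = [120, 119, 121, 40, 38, 41, 39, 120, 118, 60, 121, 35, 36, 37, 34, 33, 120]
events = acquire_blockades(trace, open_level=120.0)
print(events)
```

				<p>Using separate start and end thresholds gives the automaton hysteresis, so that noise near a single threshold does not fragment one blockade into many events; the minimum-length parameter plays the role of a simple glitch filter.</p>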
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>The signal acquisition was performed using a time-domain Finite State Automaton (FSA)</p>
					</caption>
					<text>
						<p>The signal acquisition was performed using a time-domain Finite State Automaton (FSA). This was followed by adaptive pre-filtering using a wavelet-domain FSA. Feature extraction on those acquired channel blockades was done by Hidden Markov Model (HMM) processing; and classification was done by Support Vector Machine (SVM). The optimal SVM architecture is shown for classification of five DNA hairpin molecules labeled 9CG, 9GC, 9TA, 9AT, and 8GC (the number denotes the stem length in base-pairs and the two-base entry denotes the 5'-3' termini). The linear tree multi-class SVM architecture benefits from strong signal skimming and weak signal rejection along the line of decision nodes. Scalability to larger multi-class problems is possible since the main on-line computational cost is at the HMM feature extraction stage. The accuracy shown is for single-species mixture identification upon completing the 15<sup>th </sup>single molecule sampling/classification (in approx. 6 seconds).</p>
					</text>
					<graphic file="1471-2105-7-S2-S22-3"/>
				</fig>
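				<p>As a rough illustration of how an HMM can assign each current sample to a blockade level, the sketch below runs Viterbi decoding over a small set of levels with Gaussian emissions. This is a stand-in rather than the HMM/EM implementation referenced here: the level means, the shared noise width, and the "stay" probability are assumed to be known in advance, whereas in practice such parameters would be estimated (for example by EM). All names are hypothetical.</p>

```python
import math


def viterbi_levels(samples, means, sigma, stay_prob):
    """Most likely level index per sample for a fully connected HMM.

    Emissions are Gaussian around each level mean with a shared sigma, so the
    normalisation constant is dropped. Transitions keep the current level with
    probability stay_prob and share the remainder equally among other levels.
    """
    n = len(means)
    switch = (1.0 - stay_prob) / (n - 1)
    log_trans = [[math.log(stay_prob if i == j else switch) for j in range(n)]
                 for i in range(n)]

    def log_emit(x, k):
        return -0.5 * ((x - means[k]) / sigma) ** 2

    score = [log_emit(samples[0], k) - math.log(n) for k in range(n)]
    back = []
    for x in samples[1:]:
        prev = score
        # Best predecessor for each destination level j.
        pointers = [max(range(n), key=lambda i: prev[i] + log_trans[i][j])
                    for j in range(n)]
        score = [prev[pointers[j]] + log_trans[pointers[j]][j] + log_emit(x, j)
                 for j in range(n)]
        back.append(pointers)

    # Trace back from the best final level.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy blockade in pA with two levels and one noisy sample inside each level.
noisy = [40, 41, 34, 40, 39, 30, 31, 36, 30, 29]
path = viterbi_levels(noisy, means=[40.0, 30.0], sigma=3.0, stay_prob=0.99)
print(path)
```

				<p>Because switching levels is penalized, isolated noisy samples (34 pA and 36 pA in the toy trace) are absorbed into the surrounding level rather than being reported as separate level transitions.</p>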
				<p>Classification of feature vectors obtained by the HMM (for each individual blockade event) is then done using SVMs, an approach which automatically provides a confidence measure on each classification (see Figure <figr fid="F4">4</figr>). SVMs are fast, easily trained discriminators <abbrgrp><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr></abbrgrp> for which strong discrimination is possible without the over-fitting complications common to neural net discriminators <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. In <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, novel information-theoretic kernels were introduced for notably better performance over standard kernels (with discrete probability distributions as part of feature vector data).</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>A sketch of the hyperplane separability heuristic for SVM binary classification</p>
					</caption>
					<text>
						<p>A sketch of the hyperplane separability heuristic for SVM binary classification. An SVM is trained to find an optimal hyperplane that separates positive and negative instances, while also constrained by structural risk minimization (SRM) criteria, which here manifests as the hyperplane having a thickness, or "margin," that is made as large as possible in seeking a separating hyperplane. A benefit of using SRM is much less complication due to overfitting (a common problem with Neural Network discrimination approaches).</p>
					</text>
					<graphic file="1471-2105-7-S2-S22-4"/>
				</fig>
				<p>The classification approach adopted in <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> is designed to scale well to multi-species classification (or a few species in a very noisy environment). The scaling is possible due to use of a decision tree architecture and an SVM approach that permits rejection of weak data. SVMs are usually implemented as binary classifiers but may be grouped in a decision tree to arrive at a multi-class discriminator. SVMs are much less susceptible to over-training than neural nets <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. This allows for a much more hands-off training process and provides a more stable classifier.</p>
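				<p>The control flow of a linear decision tree with rejection can be sketched as follows. The scoring functions below are toy linear discriminants standing in for trained SVM nodes, and the weights, margin, two-component feature vectors, and the pairing of labels with nodes are invented for illustration; only the skim, pass-on, or reject logic is the point of the example.</p>

```python
def linear_score(weights, bias):
    """Build a toy linear discriminant; a stand-in for a trained SVM node."""
    def score(features):
        return sum(w * x for w, x in zip(weights, features)) + bias
    return score


def linear_tree_classify(features, nodes, margin=1.0):
    """Walk a linear chain of binary nodes, each of the form 'one class vs rest'.

    score >= margin           : strong positive, return this node's label
    -margin > score           : strong negative, pass on to the next node
    otherwise (inside margin) : weak signal, reject by returning None
    Falling off the end of the chain also returns None.
    """
    for label, score_fn in nodes:
        score = score_fn(features)
        if score >= margin:
            return label
        if score > -margin:
            return None
    return None


# Hypothetical two-node chain over made-up two-component feature vectors.
nodes = [
    ("9GC", linear_score([4.0, 0.0], -2.0)),
    ("9TA", linear_score([0.0, 4.0], -2.0)),
]
print(linear_tree_classify([1.0, 0.0], nodes))
print(linear_tree_classify([0.0, 1.0], nodes))
print(linear_tree_classify([0.4, 0.0], nodes))
```

				<p>In this arrangement each node skims off the signals it is confident about, so later nodes see a progressively simpler problem, and any signal that lands inside a node's margin is dropped rather than forced into a class.</p>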
				<p>A multiclass implementation of an SVM is also possible, in which multiple hyperplanes are optimized simultaneously. A (single-optimization, multi-hyperplane) multiclass SVM has a much more complicated implementation, but the reward is a classifier that is much easier to tune and train, especially when considering data rejection. The (single) multiclass SVM does not suffer the same throughput scaling problem with tree depth, and even appears to offer a natural drop zone via its margin definition. It is therefore being considered in further refinements of the method (see <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> in this same issue for recent applications of these refinements to other channel current data).</p>
				<p>The SVM discriminators are trained by the Sequential Minimal Optimization (SMO) procedure <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. A chunking <abbrgrp><abbr bid="B57">57</abbr><abbr bid="B58">58</abbr></abbrgrp> variant of SMO also is employed to manage the large training task at each SVM node. The multi-class SVM training generally involves thousands of blockade signatures for each signal class.</p>
				<p>Different tools are employed at each stage of the signal analysis (as shown in Figure <figr fid="F3">3</figr>) in order to provide robust (and noise-resistant) tools for knowledge discovery, information extraction, and classification <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. Statistical methods for signal rejection using SVMs are also employed in order to reject extremely noisy signals.</p>
			</sec>
			<sec>
				<st>
					<p>Role of DNA Conformation in HIV DNA Terminus Flexibility/Reactivity</p>
				</st>
				<p>DNA conformation plays a very important role in protein-DNA complex formation <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. In this process two of the crucial factors are the environment in which the complex is formed and the properties of the specific sequence interacting with the protein or other DNA molecule <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Despite the multitude of crystallographic studies <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B32">32</abbr><abbr bid="B59">59</abbr></abbrgrp> conducted on DNA, it is still difficult to translate the sequence-directed curvature information obtained through these tools to actual systems found in solution. Information on the DNA molecule's variation in structure and flexibility is important, however, to understanding the dynamically enhanced (naturally selected) DNA complex formations that are found with strong affinities to other, specific, DNA and protein molecules. Crystallographic and NMR studies alone cannot reveal the dynamics of these molecules under conditions similar to physiological ones.</p>
			</sec>
			<sec>
				<st>
					<p>Conformational kinetics of the HIV DNA termini</p>
				</st>
				<p>An important example of DNA conformational flexibility is the HIV attack on T-cells. In the retroviral attack of HIV one of the most critical stages is the integration process of viral DNA into the host DNA <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The viral DNA sequence critical to the attachment and insertion of viral DNA into the host DNA is found at the terminus of the blunt-ended viral DNA <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. The integration process is influenced by the dynamic-coupling induced by the high flexibility of a CA/TG dinucleotide positioned precisely two base-pairs from the blunt terminus of the duplex viral DNA <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The CA/TG dinucleotide presence is a universal characteristic of retroviral genomes. Deletion of these base pairs impedes the integration process <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, and it is believed that the unusual flexibility imparted by this dinucleotide to the terminus geometry is necessary for binding to integrase. Once bound to integrase, the viral DNA molecule is modified by removal of the two residues at the 3'-end, followed by insertion into the host genome. Our hypothesis is that the DNA hairpin with a CA/TG dinucleotide positioned two base-pairs from the blunt terminus will have channel current statistics differentiable from the other DNA hairpins.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<p>In what follows kinetic feature extraction is done on two types of channel current blockade events: (i) fixed level blockades, and (ii) blockade "spikes" (anomalous deflections from a specified level). The spike detection, and thus spike frequency, algorithm is FSA-based. The blockade level lifetime analysis is primarily HMM-based, where HMM/EM with boosted variance emissions is used for level projection pre-processing, and time-domain FSAs are used to parse the level-projected waveform for kinetic information. This provides a robust kinetic feature extraction formalism with a minimal amount of FSA-level tuning. Application of the spike detection tool permits strong discrimination capability not otherwise possible between DNA molecules with and without minor radiation damage. Application of the HMM kinetic feature extraction tool permits statistical differences to be discernible between molecules in the study of HIV DNA (described in what follows). The rich set of kinetic features obtained allows for DNA terminus classification/clustering. An SVM-based clustering method has been developed and was applied to the control molecules to test this capability. A Web-interface to the various software tools used is also described.</p>
			<sec>
				<st>
					<p>&#964;-FSA Blockade Acquisition and time-domain Feature Extraction</p>
				</st>
				<p>A Channel Current Spike Detector algorithm has been developed to characterize the blockade "spike" behavior observed for molecules when they strongly occlude the pore. Together, the HMM/EM, FSA, and Spike Detector formulations provide a robust method for analysis of channel current data. Application of these methods is shown (Figure <figr fid="F5">5</figr>) for radiation-damaged DNA signals obtained by Dr. Wenonah Vercoutere at NASA-Ames. In the radiated DNA study the "spike" feature, seen as the anomalously deep blockades of channel current from the LL blockade state, is used to successfully differentiate between radiated and non-radiated DNA molecules.</p>
				<fig id="F5">
					<title>
						<p>Figure 5</p>
					</title>
					<caption>
						<p>Panel (A) shows a 100 ms blockade trace with one blockade "spike" event, and the signal analysis derived from hundreds of seconds of blockade data from the same species of molecule</p>
					</caption>
					<text>
						<p>Panel (A) shows a 100 ms blockade trace with one blockade "spike" event, and the signal analysis derived from hundreds of seconds of blockade data from the same species of molecule. The molecule studied in (A) is a 9 base-pair hairpin that is the radiation-damaged DNA model (a terminal guanine is oxolated) of the molecule studied in (B), with the terminal guanine unaltered in the "non-radiated" molecule. The spike count plots show increasing counts as spike cut-off thresholds are relaxed (to where eventually any downward deflection will be counted as a spike). Plots are automatically generated using gnuplot and automatically fit with extrapolations of their linear phases at the group's tools website. The extrapolations provide an estimate of "true" anomalous spike counts &#8211; counts associated with terminus fraying in the captured DNA hairpin (as shown in [44]). The radiated form of the molecule frayed 17.6 times a second, on average (while in the LL state), while the non-radiated molecule frayed only 3.58 times a second, on average.</p>
					</text>
					<graphic file="1471-2105-7-S2-S22-5"/>
				</fig>
				<p>The spike detector software is designed to count "anomalous" spikes, i.e., spike noise not attributable to the Gaussian fluctuations about the mean of the dominant blockade level. Spike count plots are generated to show increasing counts as cut-off thresholds are relaxed (to where eventually any downward deflection will be counted as a spike). The plots are automatically generated and automatically fit with extrapolations of their linear phases (at the group's CCCool-tools website). The extrapolations provide an estimate of "true" anomalous spike counts &#8211; counts associated with terminus fraying in the captured DNA hairpin (via the mechanism discussed in <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>). For the study above, the radiated form of the molecule frayed 17.6 times a second, on average, while in the LL state. The non-radiated molecule frayed only 3.58 times a second, on average, from the LL state (see Figure <figr fid="F5">5</figr>). This result is consistent with the weakened hydrogen bonding at the terminus of the radiation-damaged molecule.</p>
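				<p>The threshold sweep described above can be illustrated with a minimal Python sketch. It is not the group's FSA-based detector: the function names, the rule that consecutive deep samples merge into one spike, and the use of an ordinary least-squares line for the "linear phase" are all assumptions made for illustration. Which cut-offs belong to the linear phase, and where the fitted line is evaluated, are left to the reader.</p>
```python
def count_spikes(trace, level, cutoff):
    """Count downward excursions deeper than `cutoff` below `level`.

    Consecutive samples beyond the cut-off are merged into one spike
    (an assumption of this sketch, not a detail taken from the paper).
    """
    count = 0
    in_spike = False
    for sample in trace:
        deep = (level - sample) > cutoff
        if deep and not in_spike:
            count += 1
        in_spike = deep
    return count


def spike_count_curve(trace, level, cutoffs):
    """Spike count at each cut-off; counts grow as the cut-off is relaxed."""
    return [count_spikes(trace, level, c) for c in cutoffs]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy trace (arbitrary current units) around a blockade level of 40.
trace = [40, 40, 30, 40, 39, 41, 25, 26, 40, 38]
curve = spike_count_curve(trace, 40, [12, 5, 1])
```
				<p>In this toy trace the count rises from one to three as the cut-off is relaxed from 12 to 1 units; fitting <it>fit_line</it> to the strict-threshold portion of such a curve and extrapolating is one way to separate anomalous spikes from ordinary fluctuations about the level.</p>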
			</sec>
			<sec>
				<st>
					<p>EVA Projection</p>
				</st>
				<p>The HMM method is based on a stationary set of emission and transition probabilities. Emission broadening via amplification of the emission state variances is a filtering heuristic that leads to level-projection that strongly preserves transition times between major levels (see Discussion for details). Results from the emission variance amplification (EVA) emission broadening method are shown in Figure <figr fid="F6">6</figr> (with varying amounts of variance amplification). This approach does not require the user to define the number of levels (classes). This is a major advantage compared to existing tools that require the user to determine the levels (classes) and perform a state projection. This allows kinetic features to be extracted with a "simple" FSA that requires minimal tuning (see Figure <figr fid="F7">7</figr> for kinetic features results and Figure <figr fid="F8">8</figr> for the signal processing architecture).</p>
				<fig id="F6">
					<title>
						<p>Figure 6</p>
					</title>
					<caption>
						<p>The HMM/EM EVA projection method, for kinetic feature extraction, does not require the user to define the number of levels (classes)</p>
					</caption>
					<text>
						<p>The HMM/EM EVA projection method, for kinetic feature extraction, does not require the user to define the number of levels (classes). This is a major advantage compared to existing tools which require the user to determine the levels (classes) and perform a state projection. At a later stage, this allows kinetic features to be extracted with a "simple" FSA that requires <it>minimal </it>tuning.</p>
					</text>
					<graphic file="1471-2105-7-S2-S22-6"/>
				</fig>
				<fig id="F7">
					<title>
						<p>Figure 7</p>
					</title>
					<caption>
						<p>a. In preliminary nanopore studies the wild-type 3' end sequence (-C-A-T-G-3') was found to be similar to (-C-A-A-A-3'), which motivated the present study of a group of DNA hairpins consisting of all adenosines on the 3' side of the molecule, except for one cytosine-adenosine step</p>
					</caption>
					<text>
						<p><b>a</b>. In preliminary nanopore studies the wild-type 3' end sequence (-C-A-T-G-3') was found to be similar to (-C-A-A-A-3'), which motivated the present study of a group of DNA hairpins consisting of all adenosines on the 3' side of the molecule, except for one cytosine-adenosine step. Contrary to the differences (seemingly) indicated by nature, the calculated &#916;G&#176; of hairpin formation (using mFold) is the same for the set of molecules described, with one CA step (the CA set). <b>b</b>. UL, the unbound terminus state, has the shortest lifetime for CA_3, i.e., CA_3 has the strongest interaction with the channel (and surroundings). Neighboring variants (CA_2, CA_4) share this property to a lesser extent, molecules with GC pairs more than 1 base-pair distant group very closely, and the one molecule with no extra GC also separates with its own characteristic curve. This result is consistent with the increased reactivity of CA_3 to initiate complex formation [1], with weaker variants in CA_2 and CA_4, exactly as found experimentally [1-7].</p>
					</text>
					<graphic file="1471-2105-7-S2-S22-7"/>
				</fig>
				<fig id="F8">
					<title>
						<p>Figure 8</p>
					</title>
					<caption>
						<p>The experimental architecture, with a focus on the signal processing components, is shown, modified by the addition of Feature Extraction Stage II for the HMM/EM-EVA kinetic feature extraction</p>
					</caption>
					<text>
						<p>The experimental architecture, with a focus on the signal processing components, is shown, modified by the addition of Feature Extraction Stage II for the HMM/EM-EVA kinetic feature extraction. Use of this information at the kinetic information analyzer stage has been completed (as shown in the results in Fig. 7b). Incorporation of this information into the feature vectors packaged for online SVM classification, however, has not been completed (thus the linkage with notation on work-in-progress).</p>
					</text>
					<graphic file="1471-2105-7-S2-S22-8"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Cheminformatics analysis of DNA conformational kinetics</p>
				</st>
				<p>It was hypothesized that the highly conserved nature of the HIV DNA terminus corresponds to some beneficial flexibility, and thus reactivity, with HIV integrase prior to insertion into the host DNA, and that this might lead to a statistically discernible difference in its channel blockade statistics. A test of the hypothesized flexibility/reactivity was performed on the set of DNA hairpins with a single CA dinucleotide step. Analysis of channel current statistics (Fig. <figr fid="F7">7b</figr>) shows that the blunt-ended HIV DNA conformer has notably different blockade kinetics than the other blunt-ended hairpins in the CA set (see Fig. <figr fid="F7">7a</figr>).</p>
			</sec>
			<sec>
				<st>
					<p>SVM Clustering</p>
				</st>
				<p>Clustering will be necessary when the number of molecular classes under consideration grows too large (as in conformational studies encompassing the last 4 base-pairs, which comprise 4<sup>4</sup> = 256 classes). Preliminary efforts to implement an external-SVM clustering algorithm have begun. The prototype clustering approach clusters data vectors with no <it>a priori </it>knowledge of each vector's class or of the number of classes. The algorithm works by first running a binary SVM against a data set, with each vector in the set randomly labeled, until the SVM converges (see Figure <figr fid="F9">9</figr> for more details). The overall algorithm is then iterated on the positive and negative clusters to identify sub-clusters (until the clusters are no longer separable into sub-clusters), so the method provides a way to cluster data sets without prior knowledge of the data's clustering characteristics or the number of clusters. Figure <figr fid="F10">10</figr> and Figure <figr fid="F11">11</figr> show clustering runs on a data set with a mixture of the 8GC and 9GC control molecules (described in the Methods). The test set consists of 400 elements (200 in each class). The SVM uses a Gaussian kernel and allows 3% mislabeled data for convergence. See <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> for further details and the latest work along these lines.</p>
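				<p>The "label, train, relabel, repeat" idea behind external clustering can be sketched loosely in Python as follows. Figure 9 gives the actual schematic; this sketch is only one plausible reading of it. A nearest-centroid rule stands in for the binary SVM (which makes the loop equivalent to two-cluster k-means), the stopping rule uses the fraction of relabeled points rather than KKT violators, and all function names are hypothetical.</p>
```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def relabel_until_stable(points, labels, tolerance=0.03, max_rounds=100):
    """Iteratively relabel points with a stand-in binary classifier.

    `labels` holds the initial (for example random) +1/-1 assignments.
    The nearest-centroid rule below is an assumption standing in for the
    binary SVM; `tolerance` mirrors the 3% figure quoted in the text but
    is applied here to the fraction of points whose label changed.
    """
    labels = list(labels)
    for _ in range(max_rounds):
        pos = [p for p, lab in zip(points, labels) if lab == 1]
        neg = [p for p, lab in zip(points, labels) if lab == -1]
        if not pos or not neg:
            break  # one side is empty: nothing left to separate
        c_pos, c_neg = centroid(pos), centroid(neg)
        new_labels = [-1 if sq_dist(p, c_pos) > sq_dist(p, c_neg) else 1
                      for p in points]
        changed = sum(1 for a, b in zip(labels, new_labels) if a != b)
        labels = new_labels
        if tolerance >= changed / len(points):
            break
    return labels


# Two well-separated toy groups with deliberately scrambled start labels.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
clusters = relabel_until_stable(points, [1, -1, 1, -1, 1, -1])
```
				<p>Starting from the scrambled labels, the loop settles on one label for the three points near the origin and the other label for the three points near (5, 5). Re-running the same procedure inside each resulting cluster would correspond to the sub-cluster identification step described above.</p>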
				<fig id="F9">
					<title>
						<p>Figure 9</p>
					</title>
					<caption>
						<p>Shown is the schematic for an "external" SVM clustering algorithm</p>
					</caption>
					<text>
						<p>Shown is the schematic for an "external" SVM clustering algorithm.</p>
					</text>
					<graphic file="1471-2105-7-S2-S22-9"/>
				</fig>
				<fig id="F10">
					<title>
						<p>Figure 10</p>
					</title>
					<caption>
						<p>Clustering performance for various Gaussian kernel tuning parameters &#8211; with averages of the five test-runs used as representative curves in the graph</p>
					</caption>
					<text>
						<p>Clustering performance for various Gaussian kernel tuning parameters &#8211; with averages of the five test-runs used as representative curves in the graph.</p>
					</text>
					<graphic file="1471-2105-7-S2-S22-10"/>
				</fig>
				<fig id="F11">
					<title>
						<p>Figure 11</p>
					</title>
					<caption>
						<p>Efforts are underway to slowly relax the restriction on number of mislabeled data points tolerated at each iteration of the external clustering algorithm, such that the convergence (clustering) process can be accelerated</p>
					</caption>
					<text>
						<p>Efforts are underway to slowly relax the restriction on number of mislabeled data points tolerated at each iteration of the external clustering algorithm, such that the convergence (clustering) process can be accelerated. Here, mislabeled data points are taken to be instances where one of the Karush-Kuhn-Tucker (KKT) conditions for a properly labeled data point is violated (a KKT violator). A slow tightening in a parameter, sometimes in a dampened oscillatory manner, is an annealing process. As shown, zero KKT violator annealing is used to approximately halve the clustering time needed.</p>
					</text>
					<graphic file="1471-2105-7-S2-S22-11"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>The unoSVM and CCCool Tools interfaces</p>
				</st>
				<p>Web-accessible machine-learning tools have been developed for general pattern recognition tasks, with specific application to channel current analysis, DNA biophysical analysis and computational genomics. The core machine learning tools are primarily based on support vector machine (SVM) algorithms, hidden Markov model (HMM) algorithms, and finite state automata (FSAs). Some of the Machine Learning web pages provide expert interfaces to the machine learning tools (all model parameters accessible). This includes SVM web interfaces with a number of algorithm and kernel variants, and classification and clustering applications. The interface to this and all other software described is available via the group Home Page: <url>http://logos.cs.uno.edu/~nano/</url> (see Figure <figr fid="F12">12</figr>).</p>
				<fig id="F12">
					<title>
						<p>Figure 12</p>
					</title>
					<caption>
						<p>Several channel current cheminformatics tools are available for use via web interfaces at <url>http://logos.cs.uno.edu/~nano/</url></p>
					</caption>
					<text>
						<p>Several channel current cheminformatics tools are available for use via web interfaces at <url>http://logos.cs.uno.edu/~nano/</url>. These tools include a variety of SVM interfaces for classification and clustering (binary and multiclass), and HMM tools for feature extraction and structure identification (with applications to both channel current cheminformatics and computational genomics).</p>
					</text>
					<graphic file="1471-2105-7-S2-S22-12"/>
				</fig>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<sec>
				<st>
					<p>Emission Variance Amplification (EVA) Projection</p>
				</st>
				<p>It is hypothesized that emission variance amplification (EVA) in a non-uniformly increasing transition probability region leads to Viterbi path migration with each EM/EVA iteration towards the dominant levels (regions of high occupation probability), while strongly preserving the transition times of level changes. The migration of fluctuations is disrupted (and the method fails) if pre-processing is done with a low-pass filter (using an N-sample moving average, for example, with N = 8). This may provide a method for automatically tuning the low-pass filter &#8211; by narrowing the pass band until the projection method fails and tuning accordingly. This offers the prospect of fewer tuning subtleties than the emergent-structure tuning, via wavelet FSA, that is currently used.</p>
			</sec>
			<sec>
				<st>
					<p>HMM-with-duration Viterbi Implementation</p>
				</st>
				<p>HMM-with-duration directly incorporates sub-blockade duration probabilities and provides a strong link to the underlying kinetic (physical) information. It is parameterized by the internal HMM signal representation (the emission and transition probabilities, and the duration distributions on state lifetimes), and can be efficiently and safely implemented (see <abbrgrp><abbr bid="B60">60</abbr></abbrgrp> in this issue for further details). By incorporating HMM-with-duration, feature extraction will be more robust on long-lifetime states.</p>
			</sec>
			<sec>
				<st>
					<p>The Machine Learning Software Interface Project</p>
				</st>
				<p>The high volume and complexity of typical, noisy bioinformatics and cheminformatics (real-world) data motivates the use of sophisticated, yet highly efficient machine learning programs. The group website at <url>http://logos.cs.uno.edu/~nano/</url> provides interfaces to: (i) several binary SVM variants (with novel kernel selections and heuristics); (ii) a multiclass (internal) SVM; (iii) an SVM-based Clustering tool; (iv) an FSA-based nanopore spike detector; (v) an HMM-parameter channel current feature extraction tool; and (vi) a kinetic feature extraction tool (via channel current sub-level lifetimes). The website uses HTML forms and CGI scripts: when a form filled in by the user is received at the web server, the scripts process the submitted data, and the results are then e-mailed to the address indicated by the user.</p>
			</sec>
			<sec>
				<st>
					<p>SVM Kernel Selection</p>
				</st>
				<p>Given its geometric expression, it is not surprising that a key construct in the SVM formulation (via the choice of kernel) is the notion of "nearness" between instances or nearness to the hyperplane, where it gives a measure of confidence in the classification, i.e., instances further from the decision hyperplane are called with greater confidence (see Figure <figr fid="F4">4</figr>). Most notions of nearness explored in this context have stayed with the geometric paradigm and are known as "distance kernels." One example is the familiar Gaussian kernel, which is based on the Euclidean distance: K<sub>Gaussian</sub>(x,y) = exp(-D<sub>Eucl.</sub>(x,y)<sup>2</sup>/2&#963;<sup>2</sup>), where D<sub>Eucl.</sub>(x,y) = [&#931;<sub>k</sub>(x<sub>k</sub>-y<sub>k</sub>)<sup>2</sup>]<sup>1/2</sup> is the usual Euclidean distance. Those kernels are used in the signal pattern recognition analysis in Figure <figr fid="F3">3</figr> along with a new class of kernels, "divergence kernels," based on a notion of nearness appropriate when comparing probability distributions (or probability feature vectors). The main example of this is the Entropic Divergence Kernel: K<sub>Entropic</sub>(x,y) = exp(-D<sub>Entropic</sub>(x,y)<sup>2</sup>/2&#963;<sup>2</sup>), where D<sub>Entropic</sub>(x,y) = D(x||y)+D(y||x) and D(..||..) is the Kullback-Leibler Divergence (or relative entropy) between x and y.</p>
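				<p>The two kernels defined above translate directly into code. The following Python sketch evaluates both for a pair of feature vectors; the function names are hypothetical, and the entropic kernel assumes that both vectors have strictly positive components summing to one, so that the Kullback-Leibler terms are defined.</p>
```python
import math


def gaussian_kernel(x, y, sigma):
    """exp(-D_Eucl(x, y)^2 / (2 sigma^2)) with D_Eucl the Euclidean distance."""
    sq_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_distance / (2.0 * sigma ** 2))


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p||q); needs strictly positive entries."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))


def entropic_kernel(x, y, sigma):
    """exp(-D_Entropic(x, y)^2 / (2 sigma^2)), D_Entropic = D(x||y) + D(y||x)."""
    divergence = kl_divergence(x, y) + kl_divergence(y, x)
    return math.exp(-(divergence ** 2) / (2.0 * sigma ** 2))


# Two toy probability feature vectors (each sums to one).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
k_gauss = gaussian_kernel(p, q, sigma=1.0)
k_entropic = entropic_kernel(p, q, sigma=1.0)
```
				<p>Both kernels equal 1 for identical vectors and decay toward 0 as the vectors move apart; the symmetrized divergence makes the entropic kernel symmetric in its arguments even though the Kullback-Leibler divergence itself is not.</p>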
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>HMM kinetic feature extraction methods have been developed. Application of the channel current cheminformatics tools to a set of DNA hairpins with single CA-dinucleotide steps clearly reveals the peculiar flexibility and interactivity of the HIV DNA consensus terminus.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<sec>
				<st>
					<p>Nanopore Experiments</p>
				</st>
				<p>Each experiment is conducted using one &#945;-hemolysin channel inserted into a diphytanoyl-phosphatidylcholine/hexadecane bilayer across a 25-micron-diameter horizontal Teflon aperture, as described previously <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>. Seventy-microliter chambers on either side of the bilayer contain 1.0 M KCl buffered at pH 8.0 (10 mM HEPES/KOH), except in the case of buffer experiments where the salt concentration, pH, or identity may be varied. Voltage is applied across the bilayer between Ag-AgCl electrodes. DNA control probes are added to the <it>cis </it>chamber at 10 or 20 &#956;M final concentration. All experiments are maintained at room temperature (23 &#177; 0.1&#176;C), using a Peltier device.</p>
			</sec>
			<sec>
				<st>
					<p>Control probe design</p>
				</st>
				<p>Since the five DNA hairpins studied in the prototype experiment have been carefully characterized, they are used in further experiments as highly sensitive controls. The nine base-pair hairpin molecules examined in the prototype experiment share an eight base-pair hairpin core sequence, with addition of one of the four permutations of Watson-Crick base-pairs that may exist at the blunt end terminus, i.e., 5'-G&#8226;C-3', 5'-C&#8226;G-3', 5'-T&#8226;A-3', and 5'-A&#8226;T-3', denoted 9GC, 9CG, 9TA, and 9AT, respectively. The full sequence for the 9CG hairpin is 5' <ul>CTTCGAACG</ul>TTTT<ul>CGTTCGAAG</ul> 3', where the base-pairing region is underlined. The eight base-pair DNA hairpin is identical to the core nine base-pair subsequence, except that the terminal base-pair is 5'-G&#8226;C-3'. The prediction that each hairpin would adopt one base-paired structure was tested and confirmed using the DNA mfold server <url>http://bioinfo.math.rpi.edu/~mfold/dna/form1.cgi</url><abbrgrp><abbr bid="B47">47</abbr></abbrgrp>, which is based in part on data from <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>DNA hairpin design</p>
				</st>
				<p>Seven DNA molecules were designed to contain a CA/TG dinucleotide at different positions along the DNA stem (labeled CA_0 &#8211; CA_6). In the control molecule the stem did not contain this base-pair, ignoring the CA at the loop terminus, and based on crystallographic predictions the stem was designed to be very rigid <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. The DNA molecules used for the experiments were designed with the aid of the M-fold program <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. Single stranded DNA (ssDNA) molecules were obtained from IDTDNA as powders, resuspended in TE buffer at a 10 mM concentration and stored at 4&#176;C. The dsDNA molecules were obtained by annealing the resuspended ssDNA molecules at the required temperatures <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> and then were stored at the same temperature as the ssDNA molecules for further usage. The following ssDNA molecules were used to obtain the dsDNA hairpin structures:</p>
				<p>CA_0 5'-<ul>TTTTTTTTG</ul>TTTT<ul>CAAAAAAAA</ul> - 3'</p>
				<p>CA_1 5'-<ul>TGTTTTTTG</ul>TTTT<ul>CAAAAAACA</ul> - 3'</p>
				<p>CA_2 5'-<ul>TTGTTTTTG</ul>TTTT<ul>CAAAAACAA</ul> - 3'</p>
				<p>CA_3 5'-<ul>TTTGTTTTG</ul>TTTT<ul>CAAAACAAA</ul> - 3'</p>
				<p>CA_4 5'-<ul>TTTTGTTTG</ul>TTTT<ul>CAAACAAAA</ul> - 3'</p>
				<p>CA_5 5'-<ul>TTTTTGTTG</ul>TTTT<ul>CAACAAAAA</ul> - 3'</p>
				<p>CA_6 5'-<ul>TTTTTTGTG</ul>TTTT<ul>CACAAAAAA</ul> - 3'</p>
			</sec>
			<sec>
				<st>
					<p>Data acquisition</p>
				</st>
				<p>Data is acquired and processed in two ways depending on the experimental objectives. The first method uses commercial software from Axon Instruments (Redwood City, CA) to acquire data, where current is typically filtered at 50 kHz bandwidth using an analog low-pass Bessel filter and recorded at 20 &#956;s intervals using an Axopatch 200B amplifier (Axon Instruments, Foster City, CA) coupled to an Axon Digidata 1200 digitizer. Applied potential is 120 mV (<it>trans </it>side positive) unless otherwise noted. In some experiments, semi-automated analysis of transition level blockades, current, and duration is performed using Clampex (Axon Instruments, Foster City, CA). The second method uses LabView-based experimental automation. In this case, ionic current is also acquired using an Axopatch 200B patch clamp amplifier (Axon Instruments, Foster City, CA), but it is then recorded using an NI-MIO-16E-4 National Instruments data acquisition card (National Instruments, Austin, TX). In the LabView format, data is low-pass filtered by the amplifier unit at 50 kHz, and recorded at 20 &#956;s intervals. In both fixed duty cycle (i.e., not feedback controlled) data acquisition approaches, the solution sampling protocol uses periodic reversal of the applied potential to accomplish the capture and ejection of single biomolecules. The biomolecules captured are typically added to the <it>cis </it>chamber in 20 &#956;M concentrations. The time-domain finite state automaton (FSA, <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>) used in the prototype is used to perform the generic signal identification/acquisition for the first 100 ms of blockade signal (Acquisition Stage, Figure <figr fid="F8">8</figr>). The effective duty cycle for acquiring 100 ms blockade measurements, when found to be sufficient for classification purposes, is adjusted to approximately one reading every 0.4 seconds by choice of analyte concentration. Further details on the voltage toggling protocol and the time-domain FSA are in <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Channel Current Signal Analysis &amp; Pattern Recognition</p>
				</st>
				<sec>
					<st>
						<p>Signal Preprocessing Details</p>
					</st>
					<p>Each 100 ms signal acquired by the time-domain FSA consists of a sequence of 5000 sub-blockade levels (with the 20 &#956;s analog-to-digital sampling). Signal preprocessing is then used for adaptive low-pass filtering. For the data sets examined, the preprocessing is expected to permit an eight-fold compression of the sample sequence, from 5000 to 625 samples (later HMM processing then only requires construction of a dynamic programming table with 625 columns). The signal preprocessing makes use of an off-line wavelet stationarity analysis (Off-line Wavelet Stationarity Analysis, Figure <figr fid="F8">8</figr>, also see <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>).</p>
				</sec>
				<sec>
					<st>
						<p>HMMs and Supervised Feature Extraction Details</p>
					</st>
					<p>With completion of preprocessing, an HMM <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> is used to remove noise from the acquired signals, and to extract features from them (Feature Extraction Stage, Figure <figr fid="F8">8</figr>). The HMM is, initially, implemented with fifty states, corresponding to current blockades in 1% increments ranging from 20% residual current to 69% residual current. The HMM states, numbered 0 to 49, correspond to the 50 different current blockade levels in the sequences that are processed. The state emission parameters of the HMM are initially set so that the state j, 0 &lt;= j &lt;= 49, corresponding to level L = j+20, can emit all possible levels, with the probability distribution over emitted levels set to a discretized Gaussian with mean L and unit variance. All transitions between states are possible, and initially are equally likely. Each blockade signature is de-noised by 5 rounds of Expectation-Maximization (EM) training on the parameters of the HMM. After the EM iterations, 150 parameters are extracted from the HMM. The 150 feature vector components are extracted from parameterized emission probabilities, a compressed representation of transition probabilities, and use of <it>a posteriori </it>information deriving from the Viterbi path solution (further details in <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>). This information elucidates the blockade levels (states) characteristic of a given molecule, and the occupation probabilities for those levels, but does not directly provide kinetic information. The resulting parameter vector, normalized such that vector components sum to unity, is used to represent the acquired signal during discrimination at the Support Vector Machine stages.</p>
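					<p>The initial HMM parameters described above can be written out in a few lines of Python. This is a minimal sketch of the initialization only (no EM training or feature extraction), under the assumption that the emitted alphabet is the same set of 50 integer levels (20% to 69% residual current) as the states; the function name and return layout are hypothetical.</p>
```python
import math


def initial_hmm(first_level=20, n_states=50, variance=1.0):
    """Build the starting emission and transition tables.

    State j corresponds to level first_level + j. Each emission row is a
    Gaussian with that mean and the given variance, discretized over the
    integer levels and renormalized to sum to one (assumed alphabet).
    Every transition is initially equally likely.
    """
    levels = [first_level + j for j in range(n_states)]
    emissions = []
    for mean in levels:
        weights = [math.exp(-((lv - mean) ** 2) / (2.0 * variance))
                   for lv in levels]
        total = sum(weights)
        emissions.append([w / total for w in weights])
    transitions = [[1.0 / n_states] * n_states for _ in range(n_states)]
    return levels, emissions, transitions


levels, emissions, transitions = initial_hmm()
```
					<p>Each row of <it>emissions </it>peaks at its own level and falls off within a few percent of residual current, which is the starting point that the five rounds of EM training then adapt to each blockade signature.</p>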
				</sec>
				<sec>
					<st>
						<p>Kinetic Feature Extraction</p>
					</st>
					<p>Extraction of kinetic information was done in two ways (with equivalent feature extractions). The initial method applied begins with identification of the main blockade levels for the various blockade classes (off-line HMM analysis). This information is then used to scan through already labeled (classified) blockade data, with projection of the blockade levels onto the levels previously identified (for that class of molecule). A time-domain FSA performs the above scan, and uses the information obtained to tabulate the lifetimes of the various blockade levels. Once the lifetimes of the various levels are obtained, information about a variety of kinetic properties is accessible. The complication of this "brute force" approach is that the FSA needed to extract kinetic features from the noisy, level-projected, waveform requires careful tuning.</p>
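					<p>The lifetime tabulation step can be sketched as a run-length pass over a level-projected waveform. The Python below is a simplified stand-in for the time-domain FSA (it assumes the projection is already noise-free, which is exactly the condition the tuned FSA has to work to achieve); the function name and the choice to report durations in microseconds are assumptions.</p>
```python
from itertools import groupby


def level_lifetimes(projected, sample_interval_us=20):
    """Tabulate how long each blockade level is held, in microseconds.

    `projected` is a sequence of level labels, one per sample, taken at
    the 20 microsecond sampling interval used in the text. Consecutive
    equal labels form one visit to that level.
    """
    lifetimes = {}
    for level, run in groupby(projected):
        duration = sum(1 for _ in run) * sample_interval_us
        lifetimes.setdefault(level, []).append(duration)
    return lifetimes


# Toy projection: six samples at level 30, then four samples at level 40.
table = level_lifetimes([30] * 6 + [40] * 4)
```
					<p>For the toy projection the table holds one 120 &#956;s visit to level 30 and one 80 &#956;s visit to level 40. In real data the first and last runs of each 100 ms window are truncated by the observation window and would need separate handling before lifetime statistics are computed.</p>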
				</sec>
			</sec>
			<sec>
				<st>
					<p>Emission Variance Amplification (EVA) Projection</p>
				</st>
				<p>In the context of an HMM implementation with a stationary set of emission and transition probabilities, emission broadening via amplification of the emission state variances is a filtering heuristic that leads to a level-projection that strongly preserves transition times between major levels. In other words, emission variance amplification (EVA) highly preserves the transition macro-structure between the significant blockade levels. This provides robust kinetic feature extraction with minimal tuning at the FSA kinetic feature extraction stage.</p>
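				<p>The filtering effect of emission broadening can be seen in a toy Viterbi decoding. The Python sketch below is not the HMM/EM-EVA implementation: it uses fixed, hand-picked parameters instead of EM-trained ones, only two levels, and equal-variance Gaussian emissions whose variance is simply multiplied by an amplification factor. All names and numbers are illustrative assumptions.</p>
```python
import math


def viterbi_levels(obs, levels, variance, stay_prob):
    """Most probable level sequence under a simple fixed-parameter HMM.

    Emissions are Gaussian around each level with a shared variance, so
    the normalization constant is dropped. Transitions keep the current
    level with probability stay_prob and spread the rest evenly.
    """
    n = len(levels)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (n - 1))

    def log_emit(level, x):
        return -((x - level) ** 2) / (2.0 * variance)

    score = [log_emit(lv, obs[0]) for lv in levels]
    back = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for j, lv in enumerate(levels):
            best_i = max(range(n), key=lambda i: score[i]
                         + (log_stay if i == j else log_move))
            step = log_stay if best_i == j else log_move
            new_score.append(score[best_i] + step + log_emit(lv, x))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [levels[i] for i in path]


# One-sample noise excursion at index 3, true level change at index 6.
obs = [30, 31, 30, 39, 30, 30, 40, 40, 41, 40]
narrow = viterbi_levels(obs, [30, 40], variance=1.0, stay_prob=0.9)
broad = viterbi_levels(obs, [30, 40], variance=25.0, stay_prob=0.9)
```
				<p>With unit variance the decoded path follows the one-sample excursion to level 40 at index 3; with the variance amplified 25-fold the excursion is absorbed into level 30, while the level change at index 6 is kept at the same sample. This mirrors, in miniature, the claim that broadening the emissions filters fluctuations but preserves transition times between major levels.</p>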
			</sec>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>The paper was written by SWH and MA. The cheminformatics software was written by SWH. The idea for the HIV DNA hairpin experiment was from MA. The nanopore experiments were performed by MT, IA, AC, and EM at the labs of MA and SWH. The application of the cheminformatics software was done by ML, JM, and SS, with further refinements for the critical kinetic feature extraction by ML and JM. The LabView/LabWindows setup was done by SS and CB.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>SWH and other New Orleans researchers, ML, MT, IA, EM, CB, and SS, would like to thank MA and Prof. David Deamer at UCSC for strong collaborative support post-Katrina. SWH would like to thank Dr. Wenonah Vercoutere at NASA-Ames for the radiation-damaged DNA dataset. Funding was provided by grants from the National Institutes of Health, the National Science Foundation, the Louisiana Board of Regents, and NASA.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Bacterial transposases and retroviral integrases</p>
				</title>
				<aug>
					<au>
						<snm>Polard</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Chandler</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Mol Microbiol</source>
				<pubdate>1995</pubdate>
				<volume>15</volume>
				<issue>1</issue>
				<fpage>13</fpage>
				<lpage>23</lpage>
				<xrefbib>
					<pubid idtype="pmpid">7752887</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Mutants and pseudorevertants of Moloney murine leukemia virus with alterations at the integration site</p>
				</title>
				<aug>
					<au>
						<snm>Colicelli</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Goff</snm>
						<fnm>SP</fnm>
					</au>
				</aug>
				<source>Cell</source>
				<pubdate>1985</pubdate>
				<volume>42</volume>
				<issue>2</issue>
				<fpage>573</fpage>
				<lpage>580</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">4028161</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>3'-end processing and kinetics of 5'-end joining during retroviral integration in vivo</p>
				</title>
				<aug>
					<au>
						<snm>Roe</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Chow</snm>
						<fnm>SA</fnm>
					</au>
					<au>
						<snm>Brown</snm>
						<fnm>PO</fnm>
					</au>
				</aug>
				<source>J Virol</source>
				<pubdate>1997</pubdate>
				<volume>71</volume>
				<issue>2</issue>
				<fpage>1334</fpage>
				<lpage>1340</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">191188</pubid>
						<pubid idtype="pmpid" link="fulltext">8995657</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Activities and substrate specificity of the evolutionarily conserved central domain of retroviral integrase</p>
				</title>
				<aug>
					<au>
						<snm>Kulkosky</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Katz</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Merkel</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Skalka</snm>
						<fnm>AM</fnm>
					</au>
				</aug>
				<source>Virology</source>
				<pubdate>1995</pubdate>
				<volume>206</volume>
				<issue>1</issue>
				<fpage>448</fpage>
				<lpage>456</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">7831800</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>The IN protein of Moloney murine leukemia virus processes the viral DNA ends and accomplishes their integration <it>in vitro</it></p>
				</title>
				<aug>
					<au>
						<snm>Craigie</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Fujiwara</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Bushman</snm>
						<fnm>F</fnm>
					</au>
				</aug>
				<source>Cell</source>
				<pubdate>1990</pubdate>
				<volume>62</volume>
				<issue>4</issue>
				<fpage>829</fpage>
				<lpage>837</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">2167180</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Disruption of the terminal base pairs of retroviral DNA during integration</p>
				</title>
				<aug>
					<au>
						<snm>Scottoline</snm>
						<fnm>BP</fnm>
					</au>
					<au>
						<snm>Chow</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ellison</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Brown</snm>
						<fnm>PO</fnm>
					</au>
				</aug>
				<source>Genes Dev</source>
				<pubdate>1997</pubdate>
				<volume>11</volume>
				<issue>3</issue>
				<fpage>371</fpage>
				<lpage>382</lpage>
				<xrefbib>
					<pubid idtype="pmpid">9030689</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Sequence and spacing requirements of a retrovirus integration site</p>
				</title>
				<aug>
					<au>
						<snm>Colicelli</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Goff</snm>
						<fnm>SP</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1988</pubdate>
				<volume>199</volume>
				<issue>1</issue>
				<fpage>47</fpage>
				<lpage>59</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">3351923</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Overall structure and sugar dynamics of a DNA dodecamer from homo and heteronuclear dipolar couplings and 31P chemical shift anisotropy</p>
				</title>
				<aug>
					<au>
						<snm>Wu</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Delaglio</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Tjandra</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Zhurkin</snm>
						<fnm>VB</fnm>
					</au>
					<au>
						<snm>Bax</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Journal of Biomolecular NMR</source>
				<pubdate>2003</pubdate>
				<volume>26</volume>
				<fpage>297</fpage>
				<lpage>315</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12815257</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Structure of a B-DNA Dodecamer. II. Influence of Base Sequence on Helix Structure</p>
				</title>
				<aug>
					<au>
						<snm>Dickerson</snm>
						<fnm>RE</fnm>
					</au>
					<au>
						<snm>Drew</snm>
						<fnm>HR</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1981</pubdate>
				<volume>149</volume>
				<fpage>761</fpage>
				<lpage>786</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">6273591</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>X-ray and solution studies of DNA oligomers and implications for the structural basis of A-tract-dependent curvature</p>
				</title>
				<aug>
					<au>
						<snm>Shatzky-Schwartz</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Arbuckle</snm>
						<fnm>ND</fnm>
					</au>
					<au>
						<snm>Eisenstein</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Rabinovich</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Bareket-Samish</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Haran</snm>
						<fnm>TE</fnm>
					</au>
					<au>
						<snm>Luisi</snm>
						<fnm>BF</fnm>
					</au>
					<au>
						<snm>Shakked</snm>
						<fnm>Z</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1997</pubdate>
				<volume>267</volume>
				<fpage>595</fpage>
				<lpage>623</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9126841</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>CC/GG-contacts facilitate the B to A transition in solution</p>
				</title>
				<aug>
					<au>
						<snm>Minchenkova</snm>
						<fnm>LE</fnm>
					</au>
					<au>
						<snm>Schyokina</snm>
						<fnm>AK</fnm>
					</au>
					<au>
						<snm>Chernov</snm>
						<fnm>BK</fnm>
					</au>
					<au>
						<snm>Ivanov</snm>
						<fnm>VI</fnm>
					</au>
				</aug>
				<source>J Biomol Struct Dyn</source>
				<pubdate>1986</pubdate>
				<volume>4</volume>
				<fpage>463</fpage>
				<lpage>476</lpage>
				<xrefbib>
					<pubid idtype="pmpid">2908426</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Some rules for predicting the base-sequence dependence of DNA conformation</p>
				</title>
				<aug>
					<au>
						<snm>Peticolas</snm>
						<fnm>WL</fnm>
					</au>
					<au>
						<snm>Wang</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Thomas</snm>
						<fnm>GA</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1988</pubdate>
				<volume>85</volume>
				<issue>8</issue>
				<fpage>2579</fpage>
				<lpage>2583</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">280041</pubid>
						<pubid idtype="pmpid" link="fulltext">3357884</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>An A-DNA triplet code: Thermodynamic rules for predicting A- and B-DNA</p>
				</title>
				<aug>
					<au>
						<snm>Basham</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Ho</snm>
						<fnm>PS</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1995</pubdate>
				<volume>92</volume>
				<fpage>6464</fpage>
				<lpage>6468</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">41538</pubid>
						<pubid idtype="pmpid" link="fulltext">7604014</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>The A-form of DNA: in search of biological role (a review)</p>
				</title>
				<aug>
					<au>
						<snm>Ivanov</snm>
						<fnm>VI</fnm>
					</au>
					<au>
						<snm>Minchenkova</snm>
						<fnm>LE</fnm>
					</au>
				</aug>
				<source>Mol Biol</source>
				<pubdate>1995</pubdate>
				<volume>28</volume>
				<fpage>780</fpage>
				<lpage>788</lpage>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Sequence-Selective Metal Ion Binding to DNA Oligonucleotides</p>
				</title>
				<aug>
					<au>
						<snm>Fr&#248;ystein</snm>
						<fnm>NA</fnm>
					</au>
					<au>
						<snm>Davis</snm>
						<fnm>JT</fnm>
					</au>
					<au>
						<snm>Reid</snm>
						<fnm>BR</fnm>
					</au>
					<au>
						<snm>Sletten</snm>
						<fnm>E</fnm>
					</au>
				</aug>
				<source>Acta Chem Scand</source>
				<pubdate>1993</pubdate>
				<volume>47</volume>
				<fpage>649</fpage>
				<lpage>657</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8363924</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>NMR Studies of Oligonucleotide &#8211; Metal Ion Interactions</p>
				</title>
				<aug>
					<au>
						<snm>Sletten</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Fr&#248;ystein</snm>
						<fnm>NA</fnm>
					</au>
				</aug>
				<source>Metal Ions in Biological Systems</source>
				<editor>Sigel H, Sigel A</editor>
				<pubdate>1996</pubdate>
				<volume>32</volume>
				<fpage>397</fpage>
				<lpage>418</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8640526</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Intrusion of Counterions into the Spine of Hydration in the Minor Groove of B-DNA: Fractional Occupancy of Electronegative Pockets</p>
				</title>
				<aug>
					<au>
						<snm>Young</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Jayaram</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Beveridge</snm>
						<fnm>DL</fnm>
					</au>
				</aug>
				<source>J Am Chem Soc</source>
				<pubdate>1997</pubdate>
				<volume>119</volume>
				<fpage>59</fpage>
				<lpage>69</lpage>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Localization of divalent metal ions in the minor groove of DNA A-tracts</p>
				</title>
				<aug>
					<au>
						<snm>Hud</snm>
						<fnm>NV</fnm>
					</au>
					<au>
						<snm>Feigon</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>J Am Chem Soc</source>
				<pubdate>1997</pubdate>
				<volume>119</volume>
				<fpage>5756</fpage>
				<lpage>5757</lpage>
			</bibl>
			<bibl id="B19">
				<title>
					<p>The B-DNA dodecamer at high resolution reveals a spine of water on sodium</p>
				</title>
				<aug>
					<au>
						<snm>Shui</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>McFail-Isom</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Hu</snm>
						<fnm>GG</fnm>
					</au>
					<au>
						<snm>Williams</snm>
						<fnm>LD</fnm>
					</au>
				</aug>
				<source>Biochemistry</source>
				<pubdate>1998</pubdate>
				<volume>37</volume>
				<fpage>8341</fpage>
				<lpage>8355</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9622486</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Structure of the Potassium Form of CGCGAATTCGCG: DNA Deformation by Electrostatic Collapse around Inorganic Cations</p>
				</title>
				<aug>
					<au>
						<snm>Shui</snm>
						<fnm>XQ</fnm>
					</au>
					<au>
						<snm>Sines</snm>
						<fnm>CC</fnm>
					</au>
					<au>
						<snm>McFail-Isom</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>VanDerveer</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Williams</snm>
						<fnm>LD</fnm>
					</au>
				</aug>
				<source>Biochemistry</source>
				<pubdate>1998</pubdate>
				<volume>37</volume>
				<fpage>16877</fpage>
				<lpage>16887</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9836580</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Localization of ammonium ion in the minor groove of DNA duplexes in solution and the origin of DNA A-tract bending</p>
				</title>
				<aug>
					<au>
						<snm>Hud</snm>
						<fnm>NV</fnm>
					</au>
					<au>
						<snm>Sklenar</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Feigon</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1999</pubdate>
				<volume>286</volume>
				<fpage>651</fpage>
				<lpage>660</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10024440</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>A "Hydrat-Ion Spine" in a B-DNA minor groove</p>
				</title>
				<aug>
					<au>
						<snm>Tereshko</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Minasov</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Egli</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>J Am Chem Soc</source>
				<pubdate>1999</pubdate>
				<volume>121</volume>
				<fpage>3590</fpage>
				<lpage>3595</lpage>
			</bibl>
			<bibl id="B23">
				<title>
					<p>Sequence-specific binding of counterions to B-DNA</p>
				</title>
				<aug>
					<au>
						<snm>Denisov</snm>
						<fnm>VP</fnm>
					</au>
					<au>
						<snm>Halle</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2000</pubdate>
				<volume>97</volume>
				<fpage>629</fpage>
				<lpage>633</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">15381</pubid>
						<pubid idtype="pmpid" link="fulltext">10639130</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Locating monovalent cations in the grooves of B-DNA</p>
				</title>
				<aug>
					<au>
						<snm>Howerton</snm>
						<fnm>SB</fnm>
					</au>
					<au>
						<snm>Sines</snm>
						<fnm>CC</fnm>
					</au>
					<au>
						<snm>VanDerveer</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Williams</snm>
						<fnm>LD</fnm>
					</au>
				</aug>
				<source>Biochemistry</source>
				<pubdate>2001</pubdate>
				<volume>40</volume>
				<fpage>10023</fpage>
				<lpage>10031</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11513580</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<aug>
					<au>
						<snm>MacPherson</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Introduction to Macromolecular Crystallography</source>
				<publisher>Wiley-Liss</publisher>
				<pubdate>2002</pubdate>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Characterization of Divalent Cation Localization in the Minor Groove of the A<sub>n </sub>T<sub>n </sub>and T<sub>n </sub>A<sub>n </sub>DNA Sequence Elements by <sup>1</sup>H NMR Spectroscopy and Manganese(II)</p>
				</title>
				<aug>
					<au>
						<snm>Hud</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Feigon</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Biochemistry</source>
				<pubdate>2002</pubdate>
				<volume>41</volume>
				<fpage>9900</fpage>
				<lpage>9910</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12146955</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Integration host factor: putting a twist on protein-DNA recognition</p>
				</title>
				<aug>
					<au>
						<snm>Lynch</snm>
						<fnm>TW</fnm>
					</au>
					<au>
						<snm>Read</snm>
						<fnm>EK</fnm>
					</au>
					<au>
						<snm>Mattis</snm>
						<fnm>AN</fnm>
					</au>
					<au>
						<snm>Gardner</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Rice</snm>
						<fnm>PA</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2003</pubdate>
				<volume>330</volume>
				<issue>3</issue>
				<fpage>493</fpage>
				<lpage>502</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12842466</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Structure of a B-DNA Dodecamer. III. Geometry of Hydration</p>
				</title>
				<aug>
					<au>
						<snm>Drew</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Dickerson</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1981</pubdate>
				<volume>151</volume>
				<fpage>535</fpage>
				<lpage>556</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">7338904</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Mechanics of sequence-dependent stacking of bases in B-DNA</p>
				</title>
				<aug>
					<au>
						<snm>Calladine</snm>
						<fnm>CR</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1982</pubdate>
				<volume>161</volume>
				<fpage>343</fpage>
				<lpage>352</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">7154084</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>The structure of an oligo(dA)&#183;oligo(dT) tract and its biological implications</p>
				</title>
				<aug>
					<au>
						<snm>Nelson</snm>
						<fnm>HCM</fnm>
					</au>
					<au>
						<snm>Finch</snm>
						<fnm>JT</fnm>
					</au>
					<au>
						<snm>Luisi</snm>
						<fnm>BF</fnm>
					</au>
					<au>
						<snm>Klug</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>1987</pubdate>
				<volume>330</volume>
				<fpage>221</fpage>
				<lpage>226</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">3670410</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>DNA structure from A to Z</p>
				</title>
				<aug>
					<au>
						<snm>Dickerson</snm>
						<fnm>RE</fnm>
					</au>
				</aug>
				<source>Methods Enzymol</source>
				<pubdate>1992</pubdate>
				<volume>211</volume>
				<fpage>67</fpage>
				<lpage>111</lpage>
				<xrefbib>
					<pubid idtype="pmpid">1406328</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Propeller-twisting of base-pairs and the flexibility of dinucleotide steps</p>
				</title>
				<aug>
					<au>
						<snm>El Hassan</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Calladine</snm>
						<fnm>CR</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1996</pubdate>
				<volume>259</volume>
				<fpage>95</fpage>
				<lpage>103</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">8648652</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Use of 3D structure data for understanding sequence-dependent conformational aspects of DNA</p>
				</title>
				<aug>
					<au>
						<snm>Suzuki</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Amano</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Kakinuma</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Tateno</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1997</pubdate>
				<volume>274</volume>
				<fpage>421</fpage>
				<lpage>435</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9405150</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>DNA-cation interactions: the major and minor grooves are flexible ionophores</p>
				</title>
				<aug>
					<au>
						<snm>Hud</snm>
						<fnm>NV</fnm>
					</au>
					<au>
						<snm>Polak</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Curr Opin Struct Biol</source>
				<pubdate>2001</pubdate>
				<volume>11</volume>
				<fpage>293</fpage>
				<lpage>301</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11406377</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics</p>
				</title>
				<aug>
					<au>
						<snm>SantaLucia</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1998</pubdate>
				<volume>95</volume>
				<issue>4</issue>
				<fpage>1460</fpage>
				<lpage>1465</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">19045</pubid>
						<pubid idtype="pmpid" link="fulltext">9465037</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>Sequence-dependent DNA structure: tetranucleotide conformational maps</p>
				</title>
				<aug>
					<au>
						<snm>Packer</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Dauncey</snm>
						<fnm>MP</fnm>
					</au>
					<au>
						<snm>Hunter</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2000</pubdate>
				<volume>295</volume>
				<fpage>85</fpage>
				<lpage>103</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10623510</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B37">
				<title>
					<p>Sequence-dependent DNA structure: dinucleotide conformational maps</p>
				</title>
				<aug>
					<au>
						<snm>Packer</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Dauncey</snm>
						<fnm>MP</fnm>
					</au>
					<au>
						<snm>Hunter</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2000</pubdate>
				<volume>295</volume>
				<fpage>71</fpage>
				<lpage>83</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10623509</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B38">
				<title>
					<p>A Unified Model for the Origin of Sequence-Directed Curvature</p>
				</title>
				<aug>
					<au>
						<snm>Hud</snm>
						<fnm>NV</fnm>
					</au>
					<au>
						<snm>Plavec</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Biopolymers</source>
				<pubdate>2003</pubdate>
				<volume>69</volume>
				<fpage>144</fpage>
				<lpage>158</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12717729</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B39">
				<title>
					<p>Structure of Staphylococcal Alpha-Hemolysin, a Heptameric Transmembrane Pore</p>
				</title>
				<aug>
					<au>
						<snm>Song</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Hobaugh</snm>
						<fnm>MR</fnm>
					</au>
					<au>
						<snm>Shustak</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Cheley</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Bayley</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Gouaux</snm>
						<fnm>JE</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1996</pubdate>
				<volume>274</volume>
				<issue>5294</issue>
				<fpage>1859</fpage>
				<lpage>1866</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">8943190</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B40">
				<title>
					<p>Nanopore detection using channel current cheminformatics</p>
				</title>
				<aug>
					<au>
						<snm>Winters-Hilt</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>SPIE Second International Symposium on Fluctuations and Noise, 25&#8211;28 May, 2004</source>
			</bibl>
			<bibl id="B41">
				<title>
					<p>Nanopore cheminformatics</p>
				</title>
				<aug>
					<au>
						<snm>Winters-Hilt</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Akeson</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>DNA and Cell Biology</source>
				<pubdate>2004</pubdate>
				<volume>23</volume>
				<issue>10</issue>
				<fpage>675</fpage>
				<lpage>683</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">15585125</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B42">
				<title>
					<p>Highly Accurate Classification of Watson-Crick Basepairs on Termini of Single DNA Molecules</p>
				</title>
				<aug>
					<au>
						<snm>Winters-Hilt</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Vercoutere</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>DeGuzman</snm>
						<fnm>VS</fnm>
					</au>
					<au>
						<snm>Deamer</snm>
						<fnm>DW</fnm>
					</au>
					<au>
						<snm>Akeson</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Haussler</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Biophys J</source>
				<pubdate>2003</pubdate>
				<volume>84</volume>
				<fpage>967</fpage>
				<lpage>976</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1302674</pubid>
						<pubid idtype="pmpid" link="fulltext">12547778</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B43">
				<title>
					<p>Highly Accurate Real-Time Classification of Channel-Captured DNA Termini</p>
				</title>
				<aug>
					<au>
						<snm>Winters-Hilt</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Third International Conference on Unsolved Problems of Noise and Fluctuations in Physics, Biology, and High Technology</source>
				<pubdate>2003</pubdate>
				<fpage>355</fpage>
				<lpage>368</lpage>
			</bibl>
			<bibl id="B44">
				<title>
					<p>Discrimination Among Individual Watson-Crick Base-Pairs at the Termini of Single DNA Hairpin Molecules</p>
				</title>
				<aug>
					<au>
						<snm>Vercoutere</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Winters-Hilt</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>DeGuzman</snm>
						<fnm>VS</fnm>
					</au>
					<au>
						<snm>Deamer</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Ridino</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Rogers</snm>
						<fnm>JT</fnm>
					</au>
					<au>
						<snm>Olsen</snm>
						<fnm>HE</fnm>
					</au>
					<au>
						<snm>Marziali</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Akeson</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Nucl Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>1311</fpage>
				<lpage>1318</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">150236</pubid>
						<pubid idtype="pmpid" link="fulltext">12582251</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B45">
				<title>
					<p>Rapid discrimination among individual DNA hairpin molecules at single-nucleotide resolution using an ion channel</p>
				</title>
				<aug>
					<au>
						<snm>Vercoutere</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Winters-Hilt</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Olsen</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Deamer</snm>
						<fnm>DW</fnm>
					</au>
					<au>
						<snm>Haussler</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Akeson</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Nat Biotechnol</source>
				<pubdate>2001</pubdate>
				<volume>19</volume>
				<issue>3</issue>
				<fpage>248</fpage>
				<lpage>252</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11231558</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B46">
				<title>
					<p>Influence of loop residues on the relative stabilities of DNA hairpin structures</p>
				</title>
				<aug>
					<au>
						<snm>Senior</snm>
						<fnm>MM</fnm>
					</au>
					<au>
						<snm>Jones</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Breslauer</snm>
						<fnm>KJ</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1988</pubdate>
				<volume>85</volume>
				<fpage>6242</fpage>
				<lpage>6246</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">281945</pubid>
						<pubid idtype="pmpid" link="fulltext">3413094</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B47">
				<aug>
					<au>
						<snm>Michael</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Chem-Site 3.01</source>
				<publisher>Pyramid Learning LLC, Hudson, OH</publisher>
				<pubdate>1999</pubdate>
			</bibl>
			<bibl id="B48">
				<aug>
					<au>
						<snm>Cormen</snm>
						<fnm>TH</fnm>
					</au>
					<au>
						<snm>Leiserson</snm>
						<fnm>CE</fnm>
					</au>
					<au>
						<snm>Rivest</snm>
						<fnm>RL</fnm>
					</au>
				</aug>
				<source>Introduction to Algorithms</source>
				<publisher>MIT Press, Cambridge, USA</publisher>
				<pubdate>1989</pubdate>
			</bibl>
			<bibl id="B49">
				<title>
					<p>Characterization of single channel currents using digital signal processing techniques based on Hidden Markov models</p>
				</title>
				<aug>
					<au>
						<snm>Chung</snm>
						<fnm>S-H</fnm>
					</au>
					<au>
						<snm>Moore</snm>
						<fnm>JB</fnm>
					</au>
					<au>
						<snm>Xia</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Premkumar</snm>
						<fnm>LS</fnm>
					</au>
					<au>
						<snm>Gage</snm>
						<fnm>PW</fnm>
					</au>
				</aug>
				<source>Philos Trans R Soc Lond B Biol Sci</source>
				<pubdate>1990</pubdate>
				<volume>329</volume>
				<fpage>265</fpage>
				<lpage>285</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">1702543</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B50">
				<title>
					<p>Signal processing techniques for channel current analysis based on hidden Markov models</p>
				</title>
				<aug>
					<au>
						<snm>Chung</snm>
						<fnm>S-H</fnm>
					</au>
					<au>
						<snm>Gage</snm>
						<fnm>PW</fnm>
					</au>
				</aug>
				<source>Methods in Enzymology; Ion channels, Part B</source>
				<publisher>Academic Press, Inc., San Diego</publisher>
				<editor>Conn PM</editor>
				<pubdate>1998</pubdate>
				<fpage>420</fpage>
				<lpage>437</lpage>
			</bibl>
			<bibl id="B51">
				<aug>
					<au>
						<snm>Colquhoun</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Sigworth</snm>
						<fnm>FJ</fnm>
					</au>
				</aug>
				<source>Fitting and statistical analysis of single-channel records. Single-channel recording</source>
				<publisher>Plenum Publishing Corp, New York</publisher>
				<editor>Sakmann B, Neher E</editor>
				<edition>Second</edition>
				<pubdate>1995</pubdate>
				<fpage>483</fpage>
				<lpage>587</lpage>
			</bibl>
			<bibl id="B52">
				<aug>
					<au>
						<snm>Durbin</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Biological sequence analysis : probabilistic models of proteins and nucleic acids</source>
				<publisher>Cambridge, UK &amp; New York: Cambridge University Press</publisher>
				<pubdate>1998</pubdate>
			</bibl>
			<bibl id="B53">
				<aug>
					<au>
						<snm>Vapnik</snm>
						<fnm>VN</fnm>
					</au>
				</aug>
				<source>The Nature of Statistical Learning Theory</source>
				<publisher>Springer-Verlag, New York</publisher>
				<edition>2</edition>
				<pubdate>1998</pubdate>
			</bibl>
			<bibl id="B54">
				<title>
					<p>A tutorial on support vector machines for pattern recognition</p>
				</title>
				<aug>
					<au>
						<snm>Burges</snm>
						<fnm>CJC</fnm>
					</au>
				</aug>
				<source>Data Min Knowl Discov</source>
				<pubdate>1998</pubdate>
				<volume>2</volume>
				<fpage>121</fpage>
				<lpage>167</lpage>
			</bibl>
			<bibl id="B55">
				<title>
					<p>Support Vector Machine Implementations for Classification &amp; Clustering</p>
				</title>
				<aug>
					<au>
						<snm>Winters-Hilt</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Yelundur</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>McChesney</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Landry</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2006</pubdate>
				<volume>7</volume>
				<issue>Suppl 2</issue>
				<fpage>S4</fpage>
				<note/>
			</bibl>
			<bibl id="B56">
				<title>
					<p>Fast Training of Support Vector Machines using Sequential Minimal Optimization</p>
				</title>
				<aug>
					<au>
						<snm>Platt</snm>
						<fnm>JC</fnm>
					</au>
				</aug>
				<source>Advances in Kernel Methods &#8211; Support Vector Learning</source>
				<publisher>MIT Press, Cambridge, USA</publisher>
				<editor>Scholkopf B, Burges CJC, Smola AJ</editor>
				<pubdate>1998</pubdate>
				<volume>12</volume>
			</bibl>
			<bibl id="B57">
				<title>
					<p>An improved training algorithm for support vector machines</p>
				</title>
				<aug>
					<au>
						<snm>Osuna</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Freund</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Girosi</snm>
						<fnm>F</fnm>
					</au>
				</aug>
				<source>Neural Networks for Signal Processing VII</source>
				<publisher>IEEE, New York</publisher>
				<editor>Principe J, Giles L, Morgan N, Wilson E</editor>
				<pubdate>1997</pubdate>
				<fpage>276</fpage>
				<lpage>285</lpage>
			</bibl>
			<bibl id="B58">
				<title>
					<p>Making large-scale SVM learning practical</p>
				</title>
				<aug>
					<au>
						<snm>Joachims</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Advances in Kernel Methods &#8211; Support Vector Learning</source>
				<publisher>MIT Press, Cambridge, USA</publisher>
				<editor>Sch&#246;lkopf B, Burges CJC, Smola AJ</editor>
				<pubdate>1998</pubdate>
				<volume>11</volume>
			</bibl>
			<bibl id="B59">
				<title>
					<p>How Does Sequence Define Structure? A Crystallographic Map of DNA Structure and Conformation</p>
				</title>
				<aug>
					<au>
						<snm>Hays</snm>
						<fnm>FA</fnm>
					</au>
					<au>
						<snm>Teegarden</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Jones</snm>
						<fnm>ZJR</fnm>
					</au>
					<au>
						<snm>Harms</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Raup</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Watson</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Cavaliere</snm>
						<fnm>E</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci</source>
				<pubdate>2005</pubdate>
				<volume>102</volume>
				<fpage>7157</fpage>
				<lpage>7162</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1129101</pubid>
						<pubid idtype="pmpid" link="fulltext">15870206</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B60">
				<title>
					<p>Hidden Markov Model Variants and their Application</p>
				</title>
				<aug>
					<au>
						<snm>Winters-Hilt</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2006</pubdate>
				<volume>7</volume>
				<issue>Suppl 2</issue>
				<fpage>S14</fpage>
			</bibl>
			<bibl id="B61">
				<title>
					<p>Microsecond Time-Scale Discrimination Among Polycytidylic Acid, Polyadenylic Acid, and Polyuridylic Acid as Homopolymers or as Segments Within Single RNA Molecules</p>
				</title>
				<aug>
					<au>
						<snm>Akeson</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Branton</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Kasianowicz</snm>
						<fnm>JJ</fnm>
					</au>
					<au>
						<snm>Brandin</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Deamer</snm>
						<fnm>DW</fnm>
					</au>
				</aug>
				<source>Biophys J</source>
				<pubdate>1999</pubdate>
				<volume>77</volume>
				<issue>6</issue>
				<fpage>3227</fpage>
				<lpage>3233</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1300593</pubid>
						<pubid idtype="pmpid" link="fulltext">10585944</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B62">
				<title>
					<p>Utility of the wavelet transform to analyze the stationarity of single ionic channel recordings</p>
				</title>
				<aug>
					<au>
						<snm>Diserbo</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Masson</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Gourmelon</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Caterini</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>J Neurosci Methods</source>
				<pubdate>2000</pubdate>
				<volume>99</volume>
				<issue>1&#8211;2</issue>
				<fpage>137</fpage>
				<lpage>141</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10936653</pubid>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
