<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2164-9-S1-S7</ui>
	<ji>1471-2164</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>Investigation of transmembrane proteins using a computational approach</p>
			</title>
			<aug>
				<au id="A1">
					<snm>Yang</snm>
					<mi>Y</mi>
					<fnm>Jack</fnm>
					<insr iid="I1"/>
					<email>jyang@bwh.harvard.edu</email>
				</au>
				<au id="A2">
					<snm>Yang</snm>
					<mnm>Qu</mnm>
					<fnm>Mary</fnm>
					<insr iid="I2"/>
					<email>yangma@mail.nih.gov</email>
				</au>
				<au id="A3">
					<snm>Dunker</snm>
					<fnm>A Keith</fnm>
					<insr iid="I3"/>
					<email>kedunker@iupui.edu</email>
				</au>
				<au id="A4" ca="yes">
					<snm>Deng</snm>
					<fnm>Youping</fnm>
					<insr iid="I4"/>
					<email>youping.deng@usm.edu</email>
				</au>
				<au id="A5" ca="yes">
					<snm>Huang</snm>
					<fnm>Xudong</fnm>
					<insr iid="I1"/>
					<email>xhuang3@partners.org</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Department of Radiology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA</p>
				</ins>
				<ins id="I2">
					<p>National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA</p>
				</ins>
				<ins id="I3">
					<p>Center for Computational Biology and Bioinformatics, Indiana University Schools of Medicine and Informatics, 410 W. 10th Street, Indianapolis, IN 46202, USA</p>
				</ins>
				<ins id="I4">
					<p>Department of Biological Sciences, University of Southern Mississippi, Hattiesburg, MS 39406, USA</p>
				</ins>
			</insg>
			<source>BMC Genomics</source>
			<supplement>
				<title>
					<p>The 2007 International Conference on Bioinformatics &amp; Computational Biology (BIOCOMP'07)</p>
				</title>
				<editor>Jack Y Yang, Mary Qu Yang, Mengxia (Michelle) Zhu, Youping Deng and Hamid R Arabnia</editor>
				<note>Research</note>
			</supplement>
			<conference>
				<title>
					<p>The 2007 International Conference on Bioinformatics &amp; Computational Biology (BIOCOMP'07)</p>
				</title>
				<location>Las Vegas, NV, USA</location>
				<date-range>25-28 June 2007</date-range>
				<url>http://www.world-academy-of-science.org</url>
			</conference>
			<issn>1471-2164</issn>
			<pubdate>2008</pubdate>
			<volume>9</volume>
			<issue>Suppl 1</issue>
			<fpage>S7</fpage>
			<url>http://www.biomedcentral.com/1471-2164/9/S1/S7</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">18366620</pubid><pubid idtype="doi">10.1186/1471-2164-9-S1-S7</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<pub>
				<date>
					<day>20</day>
					<month>03</month>
					<year>2008</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2008</year>
			<collab>Yang et al.; licensee BioMed Central Ltd.</collab>
			<note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>An important subfamily of membrane proteins is the transmembrane &#945;-helical proteins, in which the membrane-spanning regions are made up of &#945;-helices. Given the biological and medical significance of these proteins, identifying the location of their transmembrane segments is of great practical importance. The difficulty of determining the secondary or tertiary structure of transmembrane proteins experimentally has led to a surge of interest in applying techniques from machine learning and bioinformatics to infer secondary structure from primary structure in these proteins. We are therefore interested in determining which physicochemical properties are most useful for two tasks in transmembrane proteins: discriminating transmembrane segments from non-transmembrane segments, and discriminating intrinsically unstructured segments from intrinsically structured segments. We then use the results of these investigations to develop classifiers that identify transmembrane segments in transmembrane proteins.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>We determined that the most useful properties for discriminating transmembrane segments from non-transmembrane segments, and for discriminating intrinsically unstructured segments from intrinsically structured segments in transmembrane proteins, were hydropathy, polarity, and flexibility. We used the results of this analysis to construct classifiers that discriminate transmembrane segments from non-transmembrane segments using four classification techniques: two variants of the Self-Organizing Global Ranking algorithm, a decision tree algorithm, and a support vector machine algorithm. All four techniques exhibited good performance, with out-of-sample accuracies of approximately 75%.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusions</p>
					</st>
					<p>Several interesting observations emerged from our study: intrinsically unstructured segments and transmembrane segments tend to have opposite properties; transmembrane proteins appear to be much richer in intrinsically unstructured segments than other proteins; and, in approximately 70% of transmembrane proteins that contain intrinsically unstructured segments, the intrinsically unstructured segments are close to transmembrane segments.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Membrane proteins account for roughly one third of all proteins, play a crucial role in processes such as cell-to-cell signaling, transport of ions across membranes, and energy metabolism <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>, and are a prime target for therapeutic drugs <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. One important subfamily of membrane proteins is the transmembrane proteins, of which there are two main types:</p>
			<p>&#8226; &#945;-helical proteins, in which the membrane-spanning regions are made up of &#945;-helices, and</p>
			<p>&#8226; &#946;-barrel proteins, in which the membrane-spanning regions are made up of &#946;-strands.</p>
			<p>&#946;-barrel proteins are found mainly in the outer membrane of gram-negative bacteria, and possibly in eukaryotic organelles such as mitochondria, whereas &#945;-helical proteins are found in eukaryotes and the inner membranes of bacteria <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
			<p>Given the biological and medical significance of transmembrane proteins, identifying the location of transmembrane segments is of great practical importance. Obtaining the three-dimensional structure of membrane proteins experimentally is difficult for two reasons:</p>
			<p>&#8226; Membrane proteins have both a hydrophilic part and a hydrophobic part, and hence are not entirely soluble in either aqueous or organic solvents; this makes them difficult to crystallize, and hence difficult to analyze using X-ray crystallography, which requires crystallization of the sample.</p>
			<p>&#8226; Membrane proteins tend to denature upon removal from the membrane, making their three-dimensional structure difficult to analyze.</p>
			<p>The difficulty of inferring the secondary or tertiary structure of transmembrane proteins using experimental techniques has led to a surge of interest in applying techniques from machine learning and bioinformatics to infer secondary structure from primary structure in these proteins. These include discriminant analysis <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, decision trees <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, neural networks <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>, support vector machines <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>, and hidden Markov models <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>.</p>
			<p>Another interesting class of proteins is the intrinsically unstructured proteins: proteins that need not be folded into a particular configuration to carry out their function, existing instead as dynamic ensembles in their native state <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. Intrinsically unstructured proteins have been associated with a wide range of functions including molecular recognition, molecular assembly/disassembly and protein modification <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B25">25</abbr></abbrgrp>.</p>
			<p>We are interested in investigating the physicochemical properties of various classes of protein segments. In particular, we are interested in determining which properties are useful for discriminating transmembrane segments from non-transmembrane segments in transmembrane proteins, and for discriminating intrinsically unstructured segments from intrinsically structured segments in transmembrane proteins. We are further interested in any similarities or differences in physicochemical properties across these four classes of segments. We will then apply the results of this analysis to construct classifiers to discriminate transmembrane from non-transmembrane segments in transmembrane proteins.</p>
		</sec>
		<sec>
			<st>
				<p>Results and discussion</p>
			</st>
			<sec>
				<st>
					<p>Physicochemical properties</p>
				</st>
				<p>We are interested in determining which physicochemical properties are most useful for discriminating transmembrane segments from non-transmembrane segments in transmembrane proteins, and for discriminating intrinsically unstructured segments from intrinsically structured segments in transmembrane proteins. We are further interested in any similarities or differences in physicochemical properties across these four classes of segments.</p>
				<p>Certain properties, such as hydropathy and polarity, can be measured in different ways; this results in different scales. We are also interested in determining which scales are the most effective in discriminating transmembrane segments from non-transmembrane segments, and in discriminating intrinsically unstructured from intrinsically structured segments in transmembrane proteins.</p>
				<p>Our interest is in properties that can be easily computed given only a sequence of amino acids; we therefore considered properties that depend only on the type of each amino acid in a sequence, including:</p>
				<p>&#8226; Hydropathy, a measure of the relative hydrophobicity of an amino acid. There are four hydropathy scales in common use: the Kyte-Doolittle <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, Eisenberg-Schwarz-Komaromy-Wall <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, Engelman-Steitz-Goldman <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, and Liu-Deber <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> scales.</p>
				<p>&#8226; Polarity, a measure of how charge is distributed over an amino acid, which affects how amino acids interact and helps to determine protein structure. There are two polarity scales in common use: the Grantham <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> and the Zimmerman-Eliezer-Simha <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> scales.</p>
				<p>&#8226; Flexibility, a measure of the extent to which an amino acid residue contributes to the flexibility of a protein.</p>
				<p>&#8226; Polarizability, a measure of the extent to which positive and negative charge can be separated in the presence of an applied electric field.</p>
				<p>&#8226; van der Waals volume, a measure of the volume occupied by an amino acid.</p>
				<p>&#8226; Bulkiness, a measure of the volume occupied by an amino acid, is correlated with hydrophobicity <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
				<p>&#8226; Electronic effects, a measure that takes into account steric factors, inductive effects, resonance effects, and field effects <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>.</p>
				<p>&#8226; Helicity, the propensity of an amino acid to contribute to the formation of helical structures in proteins <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>.</p>
				<p>Given a sequence of amino acids, the &#8220;pointwise&#8221; property value associated with a particular position in the sequence depends only on which of the 20 amino acids occurs at that position. To increase the robustness of our results, we work with average property values instead of pointwise property values. The average of a given property associated with a particular amino acid <it>A</it> in the sequence is the average of the pointwise property values of the amino acids contained in a window of length <it>L</it> centered at <it>A</it>. The effectiveness of each property at discriminating transmembrane from non-transmembrane segments and intrinsically unstructured from intrinsically structured segments was assessed using two criteria:</p>
				<p>(1) For a given property <it>X</it>, the degree to which the class-conditional distributions for the two classes overlap, that is, the degree to which <it>p<sub>X</sub></it> (<it>x</it>|class 1) and <it>p<sub>X</sub></it> (<it>x</it>|class 2) overlap. The less these two probability distributions overlap, the more easily the two classes can be separated. Knowledge of these probability distributions forms the basis for a Bayesian classifier, which classifies an instance having a value <it>x</it> for property <it>X</it> to &#8220;class 1&#8221; if and only if</p>
				<p>
					<display-formula>
						<m:math name="1471-2164-9-S1-S7-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
							<m:semantics>
								<m:mrow>
									<m:mfrac>
										<m:mrow>
											<m:msub>
												<m:mi>p</m:mi>
												<m:mi>X</m:mi>
											</m:msub>
											<m:mo stretchy="false">(</m:mo>
											<m:mi>x</m:mi>
											<m:mo>|</m:mo>
											<m:mtext>class 1</m:mtext>
											<m:mo stretchy="false">)</m:mo>
										</m:mrow>
										<m:mrow>
											<m:msub>
												<m:mi>p</m:mi>
												<m:mi>X</m:mi>
											</m:msub>
											<m:mo stretchy="false">(</m:mo>
											<m:mi>x</m:mi>
											<m:mo>|</m:mo>
											<m:mtext>class 2</m:mtext>
											<m:mo stretchy="false">)</m:mo>
										</m:mrow>
									</m:mfrac>
									<m:mo>&gt;</m:mo>
									<m:mfrac>
										<m:mrow>
											<m:mi>P</m:mi>
											<m:mo>{</m:mo>
											<m:mtext>class 2</m:mtext>
											<m:mo>}</m:mo>
										</m:mrow>
										<m:mrow>
											<m:mi>P</m:mi>
											<m:mo>{</m:mo>
											<m:mtext>class 1</m:mtext>
											<m:mo>}</m:mo>
										</m:mrow>
									</m:mfrac>
								</m:mrow>
							</m:semantics>
						</m:math>
					</display-formula>
				</p>
				<p>where <it>P</it>{class 1} is the probability of observing a class 1 instance and <it>P</it>{class 2} is the probability of observing a class 2 instance. The class-conditional probability distributions for the above properties are plotted in Figures <figr fid="F1">1</figr>,<figr fid="F2">2</figr>,<figr fid="F3">3</figr>.</p>
				<p>(2) The Overlap Ratio, defined in the Methods section, is a numerical measure of the overlap between the conditional probabilities P{class 1|<it>X</it> = <it>x</it>} and P{class 2|<it>X</it> = <it>x</it>}. The smaller the Overlap Ratio, the more easily the two classes can be discriminated.</p>
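				<p>The decision rule in criterion (1) can be illustrated with a minimal sketch. It assumes the two class-conditional densities are available as functions; the Gaussian densities, the prior values, and the names gaussian_pdf and bayes_classify are hypothetical and chosen only for illustration, not taken from this study.</p>
```python
import math


def gaussian_pdf(x, mean, std):
    """Normal density, used here only as a stand-in for an estimated p_X(x | class)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bayes_classify(x, pdf_class1, pdf_class2, prior_class1, prior_class2):
    """Assign x to "class 1" iff p(x|class 1) / p(x|class 2) > P{class 2} / P{class 1}.

    The inequality is cross-multiplied (valid because densities and priors are
    non-negative), so no division by zero can occur.
    """
    if pdf_class1(x) * prior_class1 > pdf_class2(x) * prior_class2:
        return "class 1"
    return "class 2"


# Hypothetical densities and priors for illustration only (not from the paper).
def tm_pdf(x):
    return gaussian_pdf(x, mean=2.0, std=1.0)


def non_tm_pdf(x):
    return gaussian_pdf(x, mean=-0.5, std=1.0)


print(bayes_classify(1.8, tm_pdf, non_tm_pdf, 0.3, 0.7))
print(bayes_classify(-1.0, tm_pdf, non_tm_pdf, 0.3, 0.7))
```
				<p>In practice the densities would be estimated from labelled segments, for example with histograms of the window-averaged property values, rather than assumed to be Gaussian.</p>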
				<p>The Overlap Ratios for discriminating transmembrane from non-transmembrane segments are shown in Table <tblr tid="T1">1</tblr>, while the Overlap Ratios for discriminating intrinsically unstructured from intrinsically structured segments are shown in Table <tblr tid="T2">2</tblr>. It turns out that the discriminating power of a given property depends on the length <it>L</it> of the window over which property values are averaged; Overlap Ratios are given in Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr> for all odd values of the window length <it>L</it> between 9 and 31.</p>
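				<p>The window averaging of property values described above can be sketched as follows, using the Kyte-Doolittle hydropathy scale as the example property. The function name windowed_average is hypothetical, and truncating the window at the ends of the sequence is an assumption; the text does not state how sequence ends are handled.</p>
```python
# Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_average(sequence, scale, window_length):
    """Average the pointwise scale values over a window centered at each residue."""
    if window_length % 2 == 0:
        raise ValueError("window_length must be odd so the window can be centered")
    half = window_length // 2
    values = [scale[residue] for residue in sequence]
    averages = []
    for i in range(len(values)):
        # Assumption: near the sequence ends the window is truncated to the
        # residues that actually exist, rather than padded.
        window = values[max(0, i - half): i + half + 1]
        averages.append(sum(window) / len(window))
    return averages


# Short illustrative sequence; the tables use odd window lengths from 9 to 31.
print(windowed_average("LLLKD", KYTE_DOOLITTLE, 3))
```
				<p>Other scales can be substituted by passing a different dictionary of per-residue values.</p>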
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Conditional probability distributions <it>p</it>(<it>x</it>|TM), <it>p</it>(<it>x</it>|Non-TM) (on the left), and <it>p</it>(<it>x</it>|IU), <it>p</it>(<it>x</it>|Non-IU) (on the right), where <it>x</it> is hydropathy, as determined by the Kyte-Doolittle, Eisenberg-Schwarz-Komaromy-Wall, Engelman-Steitz-Goldman, and Liu-Deber scales. TM = transmembrane, IU = intrinsically unstructured. The plots on the left were reproduced with permission from <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
					</caption>
					<text>
						<p>Conditional probability distributions <it>p</it>(<it>x</it>|TM), <it>p</it>(<it>x</it>|Non-TM) (on the left), and <it>p</it>(<it>x</it>|IU), <it>p</it>(<it>x</it>|Non-IU) (on the right), where <it>x</it> is hydropathy, as determined by the Kyte-Doolittle, Eisenberg-Schwarz-Komaromy-Wall, Engelman-Steitz-Goldman, and Liu-Deber scales. TM = transmembrane, IU = intrinsically unstructured. The plots on the left were reproduced with permission from <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
					</text>
					<graphic file="1471-2164-9-S1-S7-1"/>
				</fig>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Conditional probability distributions <it>p</it>(<it>x</it>|TM), <it>p</it>(<it>x</it>|Non-TM) (on the left), and <it>p</it>(<it>x</it>|IU), <it>p</it>(<it>x</it>|Non-IU) (on the right), where <it>x</it> is, from top to bottom, polarity, as determined by the Grantham and Zimmerman-Eliezer-Simha scales, bulkiness, and flexibility. TM = transmembrane, IU = intrinsically unstructured. The plots on the left were reproduced with permission from <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
					</caption>
					<text>
						<p>Conditional probability distributions <it>p</it>(<it>x</it>|TM), <it>p</it>(<it>x</it>|Non-TM) (on the left), and <it>p</it>(<it>x</it>|IU), <it>p</it>(<it>x</it>|Non-IU) (on the right), where <it>x</it> is, from top to bottom, polarity, as determined by the Grantham and Zimmerman-Eliezer-Simha scales, bulkiness, and flexibility. TM = transmembrane, IU = intrinsically unstructured. The plots on the left were reproduced with permission from <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
					</text>
					<graphic file="1471-2164-9-S1-S7-2"/>
				</fig>
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Conditional probability distributions <it>p</it>(<it>x</it>|TM), <it>p</it>(<it>x</it>|Non-TM) (on the left), and <it>p</it>(<it>x</it>|IU), <it>p</it>(<it>x</it>|Non-IU) (on the right), where <it>x</it> is, from top to bottom, van der Waals volume, polarizability, electronic effects, and helicity. TM = transmembrane, IU = intrinsically unstructured. The plots on the left were reproduced with permission from <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
					</caption>
					<text>
						<p>Conditional probability distributions <it>p</it>(<it>x</it>|TM), <it>p</it>(<it>x</it>|Non-TM) (on the left), and <it>p</it>(<it>x</it>|IU), <it>p</it>(<it>x</it>|Non-IU) (on the right), where <it>x</it> is, from top to bottom, van der Waals volume, polarizability, electronic effects, and helicity. TM = transmembrane, IU = intrinsically unstructured. The plots on the left were reproduced with permission from <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
					</text>
					<graphic file="1471-2164-9-S1-S7-3"/>
				</fig>
				<tbl id="T1" hint_layout="single">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Overlap Ratios for discriminating transmembrane segments from non-transmembrane segments in membrane proteins as a function of window length (W.L.).</p>
					</caption>
					<tblbdy cols="10">
						<r>
							<c>
								<p>W.L.</p>
							</c>
							<c>
								<p>
									<it>H<sub>KD</sub></it>
								</p>
							</c>
							<c>
								<p>
									<it>H<sub>Ei</sub></it>
								</p>
							</c>
							<c>
								<p>
									<it>H<sub>En</sub></it>
								</p>
							</c>
							<c>
								<p>
									<it>H<sub>LD</sub></it>
								</p>
							</c>
							<c>
								<p>
									<it>P<sub>G</sub></it>
								</p>
							</c>
							<c>
								<p>
									<it>P<sub>Z</sub></it>
								</p>
							</c>
							<c>
								<p>Bulk.</p>
							</c>
							<c>
								<p>Flex.</p>
							</c>
							<c>
								<p>Elec.</p>
							</c>
						</r>
						<r>
							<c cspan="10">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p>31</p>
							</c>
							<c>
								<p>0.249</p>
							</c>
							<c>
								<p>0.221</p>
							</c>
							<c>
								<p>0.260</p>
							</c>
							<c>
								<p>0.198</p>
							</c>
							<c>
								<p>0.249</p>
							</c>
							<c>
								<p>0.211</p>
							</c>
							<c>
								<p>0.423</p>
							</c>
							<c>
								<p>0.294</p>
							</c>
							<c>
								<p>0.504</p>
							</c>
						</r>
						<r>
							<c>
								<p>29</p>
							</c>
							<c>
								<p>0.232</p>
							</c>
							<c>
								<p>0.197</p>
							</c>
							<c>
								<p>0.241</p>
							</c>
							<c>
								<p>0.183</p>
							</c>
							<c>
								<p>0.223</p>
							</c>
							<c>
								<p>0.223</p>
							</c>
							<c>
								<p>0.397</p>
							</c>
							<c>
								<p>0.278</p>
							</c>
							<c>
								<p>0.499</p>
							</c>
						</r>
						<r>
							<c>
								<p>27</p>
							</c>
							<c>
								<p>0.231</p>
							</c>
							<c>
								<p>0.203</p>
							</c>
							<c>
								<p>0.213</p>
							</c>
							<c>
								<p>0.194</p>
							</c>
							<c>
								<p>0.232</p>
							</c>
							<c>
								<p>0.232</p>
							</c>
							<c>
								<p>0.412</p>
							</c>
							<c>
								<p>0.266</p>
							</c>
							<c>
								<p>0.462</p>
							</c>
						</r>
						<r>
							<c>
								<p>25</p>
							</c>
							<c>
								<p>0.238</p>
							</c>
							<c>
								<p>0.198</p>
							</c>
							<c>
								<p>0.227</p>
							</c>
							<c>
								<p>0.178</p>
							</c>
							<c>
								<p>0.215</p>
							</c>
							<c>
								<p>0.269</p>
							</c>
							<c>
								<p>0.393</p>
							</c>
							<c>
								<p>0.269</p>
							</c>
							<c>
								<p>0.411</p>
							</c>
						</r>
						<r>
							<c>
								<p>23</p>
							</c>
							<c>
								<p>0.217</p>
							</c>
							<c>
								<p>0.204</p>
							</c>
							<c>
								<p>0.219</p>
							</c>
							<c>
								<p>0.177</p>
							</c>
							<c>
								<p>0.208</p>
							</c>
							<c>
								<p>0.233</p>
							</c>
							<c>
								<p>0.385</p>
							</c>
							<c>
								<p>0.258</p>
							</c>
							<c>
								<p>0.434</p>
							</c>
						</r>
						<r>
							<c>
								<p>21</p>
							</c>
							<c>
								<p>0.209</p>
							</c>
							<c>
								<p>0.204</p>
							</c>
							<c>
								<p>0.215</p>
							</c>
							<c>
								<p>0.166</p>
							</c>
							<c>
								<p>0.216</p>
							</c>
							<c>
								<p>0.197</p>
							</c>
							<c>
								<p>0.370</p>
							</c>
							<c>
								<p>0.252</p>
							</c>
							<c>
								<p>0.379</p>
							</c>
						</r>
						<r>
							<c>
								<p>19</p>
							</c>
							<c>
								<p>0.214</p>
							</c>
							<c>
								<p>0.222</p>
							</c>
							<c>
								<p>0.220</p>
							</c>
							<c>
								<p>0.199</p>
							</c>
							<c>
								<p>0.224</p>
							</c>
							<c>
								<p>0.235</p>
							</c>
							<c>
								<p>0.415</p>
							</c>
							<c>
								<p>0.259</p>
							</c>
							<c>
								<p>0.389</p>
							</c>
						</r>
						<r>
							<c>
								<p>17</p>
							</c>
							<c>
								<p>0.201</p>
							</c>
							<c>
								<p>0.252</p>
							</c>
							<c>
								<p>0.218</p>
							</c>
							<c>
								<p>0.199</p>
							</c>
							<c>
								<p>0.219</p>
							</c>
							<c>
								<p>0.206</p>
							</c>
							<c>
								<p>0.393</p>
							</c>
							<c>
								<p>0.259</p>
							</c>
							<c>
								<p>0.442</p>
							</c>
						</r>
						<r>
							<c>
								<p>15</p>
							</c>
							<c>
								<p>0.191</p>
							</c>
							<c>
								<p>0.195</p>
							</c>
							<c>
								<p>0.201</p>
							</c>
							<c>
								<p>0.214</p>
							</c>
							<c>
								<p>0.224</p>
							</c>
							<c>
								<p>0.193</p>
							</c>
							<c>
								<p>0.356</p>
							</c>
							<c>
								<p>0.283</p>
							</c>
							<c>
								<p>0.456</p>
							</c>
						</r>
						<r>
							<c>
								<p>13</p>
							</c>
							<c>
								<p>0.216</p>
							</c>
							<c>
								<p>0.203</p>
							</c>
							<c>
								<p>0.217</p>
							</c>
							<c>
								<p>0.178</p>
							</c>
							<c>
								<p>0.203</p>
							</c>
							<c>
								<p>0.189</p>
							</c>
							<c>
								<p>0.325</p>
							</c>
							<c>
								<p>0.283</p>
							</c>
							<c>
								<p>0.500</p>
							</c>
						</r>
						<r>
							<c>
								<p>11</p>
							</c>
							<c>
								<p>0.210</p>
							</c>
							<c>
								<p>0.199</p>
							</c>
							<c>
								<p>0.228</p>
							</c>
							<c>
								<p>0.185</p>
							</c>
							<c>
								<p>0.204</p>
							</c>
							<c>
								<p>0.168</p>
							</c>
							<c>
								<p>0.346</p>
							</c>
							<c>
								<p>0.277</p>
							</c>
							<c>
								<p>0.493</p>
							</c>
						</r>
						<r>
							<c>
								<p>9</p>
							</c>
							<c>
								<p>0.231</p>
							</c>
							<c>
								<p>0.205</p>
							</c>
							<c>
								<p>0.222</p>
							</c>
							<c>
								<p>0.200</p>
							</c>
							<c>
								<p>0.232</p>
							</c>
							<c>
								<p>0.280</p>
							</c>
							<c>
								<p>0.396</p>
							</c>
							<c>
								<p>0.299</p>
							</c>
							<c>
								<p>0.562</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>Here <it>H<sub>KD</sub>, H<sub>Ei</sub>, H<sub>En</sub>, H<sub>LD</sub></it> indicate the Kyte-Doolittle, Eisenberg-Schwarz-Komaromy-Wall, Engelman-Steitz-Goldman, and Liu-Deber hydropathy scales, respectively; <it>P<sub>G</sub>, P<sub>Z</sub></it> indicate the Grantham and Zimmerman-Eliezer-Simha polarity scales, respectively; Bulk. = bulkiness, Flex. = flexibility, and Elec. = electronic effects.</p>
						<p>Reproduced with permission from <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
					</tblfn>
				</tbl>
				<tbl id="T2" hint_layout="single">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>Overlap Ratios for discriminating intrinsically unstructured segments from intrinsically structured segments in membrane proteins as a function of window length (W.L.).</p>
					</caption>
					<tblbdy cols="9">
						<r>
							<c>
								<p>W.L.</p>
							</c>
							<c>
								<p>
									<it>H<sub>KD</sub></it>
								</p>
							</c>
							<c>
								<p>
									<it>H<sub>Ei</sub></it>
								</p>
							</c>
							<c>
								<p>
									<it>H<sub>En</sub></it>
								</p>
							</c>
							<c>
								<p>
									<it>H<sub>LD</sub></it>
								</p>
							</c>
							<c>
								<p>
									<it>P<sub>G</sub></it>
								</p>
							</c>
							<c>
								<p>
									<it>P<sub>Z</sub></it>
								</p>
							</c>
							<c>
								<p>Bulk.</p>
							</c>
							<c>
								<p>Flex.</p>
							</c>
						</r>
						<r>
							<c cspan="9">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p>31</p>
							</c>
							<c>
								<p>0.318</p>
							</c>
							<c>
								<p>0.163</p>
							</c>
							<c>
								<p>0.170</p>
							</c>
							<c>
								<p>0.243</p>
							</c>
							<c>
								<p>0.220</p>
							</c>
							<c>
								<p>0.134</p>
							</c>
							<c>
								<p>0.349</p>
							</c>
							<c>
								<p>0.227</p>
							</c>
						</r>
						<r>
							<c>
								<p>29</p>
							</c>
							<c>
								<p>0.221</p>
							</c>
							<c>
								<p>0.229</p>
							</c>
							<c>
								<p>0.167</p>
							</c>
							<c>
								<p>0.249</p>
							</c>
							<c>
								<p>0.138</p>
							</c>
							<c>
								<p>0.161</p>
							</c>
							<c>
								<p>0.351</p>
							</c>
							<c>
								<p>0.238</p>
							</c>
						</r>
						<r>
							<c>
								<p>27</p>
							</c>
							<c>
								<p>0.222</p>
							</c>
							<c>
								<p>0.150</p>
							</c>
							<c>
								<p>0.164</p>
							</c>
							<c>
								<p>0.230</p>
							</c>
							<c>
								<p>0.170</p>
							</c>
							<c>
								<p>0.142</p>
							</c>
							<c>
								<p>0.221</p>
							</c>
							<c>
								<p>0.263</p>
							</c>
						</r>
						<r>
							<c>
								<p>25</p>
							</c>
							<c>
								<p>0.216</p>
							</c>
							<c>
								<p>0.234</p>
							</c>
							<c>
								<p>0.162</p>
							</c>
							<c>
								<p>0.241</p>
							</c>
							<c>
								<p>0.175</p>
							</c>
							<c>
								<p>0.142</p>
							</c>
							<c>
								<p>0.364</p>
							</c>
							<c>
								<p>0.272</p>
							</c>
						</r>
						<r>
							<c>
								<p>23</p>
							</c>
							<c>
								<p>0.253</p>
							</c>
							<c>
								<p>0.143</p>
							</c>
							<c>
								<p>0.160</p>
							</c>
							<c>
								<p>0.253</p>
							</c>
							<c>
								<p>0.163</p>
							</c>
							<c>
								<p>0.157</p>
							</c>
							<c>
								<p>0.238</p>
							</c>
							<c>
								<p>0.254</p>
							</c>
						</r>
						<r>
							<c>
								<p>21</p>
							</c>
							<c>
								<p>0.182</p>
							</c>
							<c>
								<p>0.139</p>
							</c>
							<c>
								<p>0.144</p>
							</c>
							<c>
								<p>0.267</p>
							</c>
							<c>
								<p>0.176</p>
							</c>
							<c>
								<p>0.159</p>
							</c>
							<c>
								<p>0.323</p>
							</c>
							<c>
								<p>0.271</p>
							</c>
						</r>
						<r>
							<c>
								<p>19</p>
							</c>
							<c>
								<p>0.285</p>
							</c>
							<c>
								<p>0.142</p>
							</c>
							<c>
								<p>0.149</p>
							</c>
							<c>
								<p>0.257</p>
							</c>
							<c>
								<p>0.172</p>
							</c>
							<c>
								<p>0.251</p>
							</c>
							<c>
								<p>0.337</p>
							</c>
							<c>
								<p>0.291</p>
							</c>
						</r>
						<r>
							<c>
								<p>17</p>
							</c>
							<c>
								<p>0.290</p>
							</c>
							<c>
								<p>0.199</p>
							</c>
							<c>
								<p>0.148</p>
							</c>
							<c>
								<p>0.266</p>
							</c>
							<c>
								<p>0.183</p>
							</c>
							<c>
								<p>0.307</p>
							</c>
							<c>
								<p>0.353</p>
							</c>
							<c>
								<p>0.279</p>
							</c>
						</r>
						<r>
							<c>
								<p>15</p>
							</c>
							<c>
								<p>0.320</p>
							</c>
							<c>
								<p>0.170</p>
							</c>
							<c>
								<p>0.155</p>
							</c>
							<c>
								<p>0.274</p>
							</c>
							<c>
								<p>0.182</p>
							</c>
							<c>
								<p>0.183</p>
							</c>
							<c>
								<p>0.338</p>
							</c>
							<c>
								<p>0.361</p>
							</c>
						</r>
						<r>
							<c>
								<p>13</p>
							</c>
							<c>
								<p>0.264</p>
							</c>
							<c>
								<p>0.180</p>
							</c>
							<c>
								<p>0.165</p>
							</c>
							<c>
								<p>0.284</p>
							</c>
							<c>
								<p>0.194</p>
							</c>
							<c>
								<p>0.254</p>
							</c>
							<c>
								<p>0.358</p>
							</c>
							<c>
								<p>0.340</p>
							</c>
						</r>
						<r>
							<c>
								<p>11</p>
							</c>
							<c>
								<p>0.310</p>
							</c>
							<c>
								<p>0.228</p>
							</c>
							<c>
								<p>0.195</p>
							</c>
							<c>
								<p>0.281</p>
							</c>
							<c>
								<p>0.220</p>
							</c>
							<c>
								<p>0.446</p>
							</c>
							<c>
								<p>0.345</p>
							</c>
							<c>
								<p>0.358</p>
							</c>
						</r>
						<r>
							<c>
								<p>9</p>
							</c>
							<c>
								<p>0.372</p>
							</c>
							<c>
								<p>0.230</p>
							</c>
							<c>
								<p>0.226</p>
							</c>
							<c>
								<p>0.325</p>
							</c>
							<c>
								<p>0.269</p>
							</c>
							<c>
								<p>0.251</p>
							</c>
							<c>
								<p>0.416</p>
							</c>
							<c>
								<p>0.401</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>Here <it>H<sub>KD</sub>, H<sub>Ei</sub>, H<sub>En</sub>, H<sub>LD</sub></it> indicate the Kyte-Doolittle, Eisenberg-Schwarz-Komaromy-Wall, Engelman-Steitz-Goldman, and Liu-Deber hydropathy scales, respectively, <it>P<sub>G</sub>, P<sub>Z</sub></it> indicate the Grantham and Zimmerman-Eliezer-Simha polarity scales, respectively, Bulk. = bulkiness, and Flex. = flexibility.</p>
						<p/>
						<p>Reproduced with permission from <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
					</tblfn>
				</tbl>
				<p>Our conclusions were as follows:</p>
				<p>&#8226; Whereas all four hydropathy scales can be used for discriminating transmembrane segments from non-transmembrane segments in transmembrane proteins, the Liu-Deber scale is the best scale for this task.</p>
				<p>&#8226; Whereas all four hydropathy scales can be used for discriminating intrinsically unstructured segments from intrinsically structured segments in transmembrane proteins, the Eisenberg-Schwarz-Komaromy-Wall scale is the best scale for this task.</p>
				<p>&#8226; Whereas both polarity scales can be used for discriminating transmembrane from non-transmembrane segments and for discriminating intrinsically unstructured from intrinsically structured segments in transmembrane proteins, the Grantham scale is slightly better for these tasks.</p>
				<p>&#8226; For both classification problems (discriminating transmembrane from non-transmembrane segments and discriminating intrinsically unstructured from intrinsically structured segments), flexibility provided some degree of discriminating power, and bulkiness provided still less; neither property was as effective as hydropathy or polarity at discriminating between the two classes.</p>
				<p>&#8226; For both classification problems, polarizability, van der Waals volume, electronic effects, and helicity did not discriminate well between the two classes.</p>
			</sec>
			<sec>
				<st>
					<p>Transmembrane segment classifiers</p>
				</st>
				<p>We tested four classification techniques on the problem of discriminating transmembrane segments from non-transmembrane segments in transmembrane proteins:</p>
				<p>&#8226; C4.5 <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, a decision tree algorithm.</p>
				<p>&#8226; SVM<sup>light</sup> version 6.01 (linear kernel function) <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, a support vector machine algorithm.</p>
				<p>&#8226; Two variants of the Self-Organizing Global Ranking (SOGR) algorithm <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, SOGR-I <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp> and SOGR-IB <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>, which are described in detail in the Methods section. These algorithms depend on several parameters: the length <it>L</it> of the window used to extract features, the number of neurons <it>m</it>, the learning rate &#951;<sub><it>t</it></sub>, and the neighborhood size <it>R</it>. Their performance varies with the choice of these parameters; for example, the performance of the SOGR-I algorithm as a function of the feature window length is shown in Figure <figr fid="F4">4</figr>. Based on a series of experiments, we settled on a feature window length <it>L</it> of 10, a network size <it>m</it> of 16 neurons, a fixed learning rate &#951;<sub><it>t</it></sub> of 0.05, and a neighborhood size <it>R</it> of 2. Because the feature window length was chosen to maximize the performance of the SOGR-I algorithm, the results will be slightly biased in favor of the SOGR-I and SOGR-IB algorithms.</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Performance of the SOGR-I classifier as a function of the length of the window used to extract features, based on threefold cross-validation (fixed learning rate &#951;<sub><it>t</it></sub> = 0.05, neighborhood size <it>R</it> = 2, number of neurons = 16). Reproduced with permission from <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
					</caption>
					<text>
						<p>Performance of the SOGR-I classifier as a function of the length of the window used to extract features, based on threefold cross-validation (fixed learning rate &#951;<sub><it>t</it></sub> = 0.05, neighborhood size <it>R</it> = 2, number of neurons = 16). Reproduced with permission from <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
					</text>
					<graphic file="1471-2164-9-S1-S7-4"/>
				</fig>
				<p>Designing a classifier also involves selecting the features that are most useful for the problem of interest. Based on our investigations of physicochemical properties, we based the classification on three features:</p>
				<p>&#8226; Hydropathy (Liu-Deber scale)</p>
				<p>&#8226; Polarity (Grantham scale)</p>
				<p>&#8226; Flexibility</p>
				<p>The performance of the above four classification techniques under ten-fold cross-validation when hydropathy (Liu-Deber scale), polarity (Grantham scale), and flexibility are used as features is shown in Table <tblr tid="T3">3</tblr>, while the performance when only polarity (Grantham scale) and flexibility are used as features is shown in Table <tblr tid="T4">4</tblr>. Performance drops only slightly when two features are used instead of three: mean accuracies range from 74.7% to 75.1% with three features and from 74.3% to 75.0% with two. All four classifiers performed comparably, with out-of-sample accuracies of approximately 75%. While this may seem low, the substantial overlap of the transmembrane and non-transmembrane classes seen in Figures <figr fid="F1">1</figr>,<figr fid="F2">2</figr>,<figr fid="F3">3</figr> makes this a nontrivial classification problem. Filtering strategies can be used to improve the performance of these classifiers <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>.</p>
				<tbl id="T3" hint_layout="single">
					<title>
						<p>Table 3</p>
					</title>
					<caption>
						<p>Accuracy of discriminating transmembrane segments from non-transmembrane segments in transmembrane proteins using the SOGR-I and SOGR-IB classifiers, a decision tree classifier (C4.5), and a support vector machine classifier (SVM<sup>light</sup> version 6.01), based on ten-fold cross-validation. Three features were used, namely hydropathy (Liu-Deber scale), polarity (Grantham scale), and flexibility.</p>
					</caption>
					<tblbdy cols="6">
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c cspan="2">
								<p>C4.5</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p>Fold</p>
							</c>
							<c>
								<p>SOGR-I</p>
							</c>
							<c>
								<p>SOGR-IB</p>
							</c>
							<c>
								<p>Before Pruning</p>
							</c>
							<c>
								<p>After Pruning</p>
							</c>
							<c>
								<p>SVM</p>
							</c>
						</r>
						<r>
							<c cspan="6">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p>1</p>
							</c>
							<c>
								<p>72.2311</p>
							</c>
							<c>
								<p>72.2311</p>
							</c>
							<c>
								<p>72.4960</p>
							</c>
							<c>
								<p>72.5490</p>
							</c>
							<c>
								<p>72.9730</p>
							</c>
						</r>
						<r>
							<c>
								<p>2</p>
							</c>
							<c>
								<p>69.0476</p>
							</c>
							<c>
								<p>67.1733</p>
							</c>
							<c>
								<p>67.8318</p>
							</c>
							<c>
								<p>67.6798</p>
							</c>
							<c>
								<p>67.3252</p>
							</c>
						</r>
						<r>
							<c>
								<p>3</p>
							</c>
							<c>
								<p>77.1277</p>
							</c>
							<c>
								<p>76.9149</p>
							</c>
							<c>
								<p>77.5532</p>
							</c>
							<c>
								<p>77.6596</p>
							</c>
							<c>
								<p>77.7660</p>
							</c>
						</r>
						<r>
							<c>
								<p>4</p>
							</c>
							<c>
								<p>81.8913</p>
							</c>
							<c>
								<p>84.5875</p>
							</c>
							<c>
								<p>83.7827</p>
							</c>
							<c>
								<p>83.7827</p>
							</c>
							<c>
								<p>83.4608</p>
							</c>
						</r>
						<r>
							<c>
								<p>5</p>
							</c>
							<c>
								<p>79.3146</p>
							</c>
							<c>
								<p>78.4889</p>
							</c>
							<c>
								<p>78.3237</p>
							</c>
							<c>
								<p>78.4476</p>
							</c>
							<c>
								<p>78.1586</p>
							</c>
						</r>
						<r>
							<c>
								<p>6</p>
							</c>
							<c>
								<p>81.4600</p>
							</c>
							<c>
								<p>83.6230</p>
							</c>
							<c>
								<p>82.8119</p>
							</c>
							<c>
								<p>83.1595</p>
							</c>
							<c>
								<p>82.0780</p>
							</c>
						</r>
						<r>
							<c>
								<p>7</p>
							</c>
							<c>
								<p>75.9410</p>
							</c>
							<c>
								<p>76.8266</p>
							</c>
							<c>
								<p>75.6458</p>
							</c>
							<c>
								<p>75.9410</p>
							</c>
							<c>
								<p>76.3100</p>
							</c>
						</r>
						<r>
							<c>
								<p>8</p>
							</c>
							<c>
								<p>78.3488</p>
							</c>
							<c>
								<p>79.2783</p>
							</c>
							<c>
								<p>79.8797</p>
							</c>
							<c>
								<p>79.9891</p>
							</c>
							<c>
								<p>79.2783</p>
							</c>
						</r>
						<r>
							<c>
								<p>9</p>
							</c>
							<c>
								<p>64.1365</p>
							</c>
							<c>
								<p>65.0418</p>
							</c>
							<c>
								<p>64.5543</p>
							</c>
							<c>
								<p>64.5543</p>
							</c>
							<c>
								<p>64.7632</p>
							</c>
						</r>
						<r>
							<c>
								<p>10</p>
							</c>
							<c>
								<p>67.2325</p>
							</c>
							<c>
								<p>65.6089</p>
							</c>
							<c>
								<p>66.8635</p>
							</c>
							<c>
								<p>67.0111</p>
							</c>
							<c>
								<p>67.6753</p>
							</c>
						</r>
						<r>
							<c>
								<p>Mean</p>
							</c>
							<c>
								<p>74.7</p>
							</c>
							<c>
								<p>75.0</p>
							</c>
							<c>
								<p>75.0</p>
							</c>
							<c>
								<p>75.1</p>
							</c>
							<c>
								<p>75.0</p>
							</c>
						</r>
						<r>
							<c>
								<p>Std. dev.</p>
							</c>
							<c>
								<p>6.2</p>
							</c>
							<c>
								<p>7.2</p>
							</c>
							<c>
								<p>6.8</p>
							</c>
							<c>
								<p>6.8</p>
							</c>
							<c>
								<p>6.5</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>Reproduced with permission from <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
					</tblfn>
				</tbl>
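				<p>As a quick arithmetic check on the summary rows of Table 3, the sketch below recomputes the mean and standard deviation of the SOGR-I column from its ten per-fold accuracies. The variable names are illustrative, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which formula was used; it is chosen here because it agrees with the tabulated value for this column.</p>
				<p>
```python
from statistics import mean, stdev

# SOGR-I per-fold accuracies (%), copied from the SOGR-I column of Table 3.
sogr_i_folds = [72.2311, 69.0476, 77.1277, 81.8913, 79.3146,
                81.4600, 75.9410, 78.3488, 64.1365, 67.2325]

fold_mean = mean(sogr_i_folds)
# Assumption: sample standard deviation (n - 1 denominator).
fold_std = stdev(sogr_i_folds)

print(f"mean = {fold_mean:.1f}, std. dev. = {fold_std:.1f}")
```
				</p>
				<p>The spread across folds (roughly 64% to 82%) is much larger than the differences between the classifier means, which is consistent with the observation that the four classifiers perform comparably.</p>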
				<tbl id="T4" hint_layout="single">
					<title>
						<p>Table 4</p>
					</title>
					<caption>
						<p>Accuracy of discriminating transmembrane segments from non-transmembrane segments in transmembrane proteins using the SOGR-I and SOGR-IB classifiers, a decision tree classifier (C4.5), and a support vector machine classifier (SVM<sup>light</sup> version 6.01), based on ten-fold cross-validation. Two features were used, namely polarity (Grantham scale) and flexibility.</p>
					</caption>
					<tblbdy cols="6">
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c cspan="2">
								<p>C4.5</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p>Fold</p>
							</c>
							<c>
								<p>SOGR-I</p>
							</c>
							<c>
								<p>SOGR-IB</p>
							</c>
							<c>
								<p>Before Pruning</p>
							</c>
							<c>
								<p>After Pruning</p>
							</c>
							<c>
								<p>SVM</p>
							</c>
						</r>
						<r>
							<c cspan="6">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p>1</p>
							</c>
							<c>
								<p>71.7541</p>
							</c>
							<c>
								<p>72.0721</p>
							</c>
							<c>
								<p>72.3900</p>
							</c>
							<c>
								<p>72.6020</p>
							</c>
							<c>
								<p>72.6550</p>
							</c>
						</r>
						<r>
							<c>
								<p>2</p>
							</c>
							<c>
								<p>65.1469</p>
							</c>
							<c>
								<p>65.8561</p>
							</c>
							<c>
								<p>66.1601</p>
							</c>
							<c>
								<p>66.1601</p>
							</c>
							<c>
								<p>67.0719</p>
							</c>
						</r>
						<r>
							<c>
								<p>3</p>
							</c>
							<c>
								<p>77.1277</p>
							</c>
							<c>
								<p>78.4043</p>
							</c>
							<c>
								<p>76.3830</p>
							</c>
							<c>
								<p>77.5532</p>
							</c>
							<c>
								<p>77.4468</p>
							</c>
						</r>
						<r>
							<c>
								<p>4</p>
							</c>
							<c>
								<p>83.0986</p>
							</c>
							<c>
								<p>85.0302</p>
							</c>
							<c>
								<p>83.7827</p>
							</c>
							<c>
								<p>83.7827</p>
							</c>
							<c>
								<p>83.0181</p>
							</c>
						</r>
						<r>
							<c>
								<p>5</p>
							</c>
							<c>
								<p>77.2502</p>
							</c>
							<c>
								<p>77.6631</p>
							</c>
							<c>
								<p>76.4244</p>
							</c>
							<c>
								<p>76.4244</p>
							</c>
							<c>
								<p>79.1082</p>
							</c>
						</r>
						<r>
							<c>
								<p>6</p>
							</c>
							<c>
								<p>81.9235</p>
							</c>
							<c>
								<p>83.2368</p>
							</c>
							<c>
								<p>82.8505</p>
							</c>
							<c>
								<p>82.8119</p>
							</c>
							<c>
								<p>82.1166</p>
							</c>
						</r>
						<r>
							<c>
								<p>7</p>
							</c>
							<c>
								<p>75.5720</p>
							</c>
							<c>
								<p>76.6052</p>
							</c>
							<c>
								<p>75.7934</p>
							</c>
							<c>
								<p>75.8672</p>
							</c>
							<c>
								<p>75.9410</p>
							</c>
						</r>
						<r>
							<c>
								<p>8</p>
							</c>
							<c>
								<p>79.4423</p>
							</c>
							<c>
								<p>79.4423</p>
							</c>
							<c>
								<p>79.7704</p>
							</c>
							<c>
								<p>79.4970</p>
							</c>
							<c>
								<p>79.4423</p>
							</c>
						</r>
						<r>
							<c>
								<p>9</p>
							</c>
							<c>
								<p>64.1365</p>
							</c>
							<c>
								<p>64.3454</p>
							</c>
							<c>
								<p>64.2061</p>
							</c>
							<c>
								<p>64.2061</p>
							</c>
							<c>
								<p>64.4150</p>
							</c>
						</r>
						<r>
							<c>
								<p>10</p>
							</c>
							<c>
								<p>67.4539</p>
							</c>
							<c>
								<p>67.5277</p>
							</c>
							<c>
								<p>67.0849</p>
							</c>
							<c>
								<p>67.0849</p>
							</c>
							<c>
								<p>67.0849</p>
							</c>
						</r>
						<r>
							<c>
								<p>Mean</p>
							</c>
							<c>
								<p>74.3</p>
							</c>
							<c>
								<p>75.0</p>
							</c>
							<c>
								<p>74.5</p>
							</c>
							<c>
								<p>74.6</p>
							</c>
							<c>
								<p>74.8</p>
							</c>
						</r>
						<r>
							<c>
								<p>Std. dev.</p>
							</c>
							<c>
								<p>6.8</p>
							</c>
							<c>
								<p>7.2</p>
							</c>
							<c>
								<p>6.9</p>
							</c>
							<c>
								<p>6.9</p>
							</c>
							<c>
								<p>6.7</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>Reproduced with permission from <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
					</tblfn>
				</tbl>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusions</p>
			</st>
			<p>We determined that the most useful properties for discriminating transmembrane segments from non-transmembrane segments and for discriminating intrinsically unstructured segments from intrinsically structured segments in transmembrane proteins were hydropathy, polarity, and flexibility. Based on these properties, we constructed several classifiers that identify transmembrane segments with an out-of-sample accuracy of approximately 75%. Several interesting observations emerged from our study:</p>
			<p>&#8226; Intrinsically unstructured segments and transmembrane segments tend to have opposite properties, as summarized in Table <tblr tid="T5">5</tblr>. For example, unstructured segments tended to have a low hydropathy value, whereas transmembrane segments tended to have a high hydropathy value. These results agree with previous work that found that transmembrane segments tend to be more hydrophobic than non-transmembrane segments, because transmembrane &#945;-helices require a stretch of 12-35 hydrophobic amino acids to span the hydrophobic region inside the membrane <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
			<tbl id="T5" hint_layout="single">
				<title>
					<p>Table 5</p>
				</title>
				<caption>
					<p>Tendencies of various properties for transmembrane (TM) and intrinsically unstructured (IU) segments.</p>
				</caption>
				<tblbdy cols="3">
					<r>
						<c>
							<p/>
						</c>
						<c cspan="2">
							<p>Segment type</p>
						</c>
					</r>
					<r>
						<c>
							<p>Property</p>
						</c>
						<c>
							<p>TM</p>
						</c>
						<c>
							<p>IU</p>
						</c>
					</r>
					<r>
						<c cspan="3">
							<hr/>
						</c>
					</r>
					<r>
						<c>
							<p>Hydropathy</p>
						</c>
						<c>
							<p>High</p>
						</c>
						<c>
							<p>Low</p>
						</c>
					</r>
					<r>
						<c>
							<p>Polarity</p>
						</c>
						<c>
							<p>Low</p>
						</c>
						<c>
							<p>High</p>
						</c>
					</r>
					<r>
						<c>
							<p>Bulkiness</p>
						</c>
						<c>
							<p>High</p>
						</c>
						<c>
							<p>Low</p>
						</c>
					</r>
					<r>
						<c>
							<p>Flexibility</p>
						</c>
						<c>
							<p>Low</p>
						</c>
						<c>
							<p>High</p>
						</c>
					</r>
					<r>
						<c>
							<p>Electronic effects</p>
						</c>
						<c>
							<p>High</p>
						</c>
						<c>
							<p>Low</p>
						</c>
					</r>
				</tblbdy>
				<tblfn>
					<p>Reproduced with permission from <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
				</tblfn>
			</tbl>
			<p>&#8226; Transmembrane proteins appear to be much richer in intrinsically unstructured segments than other proteins; about 70% of transmembrane proteins contain intrinsically unstructured regions, as compared to about 35% of other proteins.</p>
			<p>&#8226; In approximately 70% of transmembrane proteins that contain intrinsically unstructured segments, the intrinsically unstructured segments are close to transmembrane segments.</p>
			<p>These observations may provide insight into the structural and functional roles that intrinsically unstructured segments play in membrane proteins, and may also aid in the identification of intrinsically unstructured and transmembrane segments from primary protein structure.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<sec>
				<st>
					<p>Physicochemical properties</p>
				</st>
				<p>The Overlap Ratio, a quantitative measure of how well two classes (referred to generically as &#8220;class 1&#8221; and &#8220;class 2&#8221;) can be discriminated based on a property <it>X</it>, was calculated as follows.</p>
				<p>1. We construct a graph such that:</p>
				<p>(a) The horizontal axis corresponds to the property <it>X</it>. We divide this axis into bins.</p>
				<p>(b) The y-value associated with the bin corresponding to <it>X</it> values between <it>x</it> and <it>x</it> + &#949; is the fraction of all instances in the training set that belong to class 1 and have a value for the feature <it>X</it> in the range [<it>x</it>, <it>x</it> + &#949;), where &#949; &gt; 0 is small.</p>
				<p>The graph represents an approximation to the function P{class 1|<it>X</it> = <it>x</it>}. We define the complementary function P{class 2|<it>X</it> = <it>x</it>} using</p>
				<p>
					<display-formula>
						<m:math name="1471-2164-9-S1-S7-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
							<m:semantics>
								<m:mrow>
									<m:mi>P</m:mi>
									<m:mo>{</m:mo>
									<m:mtext>class&#160;</m:mtext>
									<m:mn>2</m:mn>
									<m:mo>|</m:mo>
									<m:mi>X</m:mi>
									<m:mo>=</m:mo>
									<m:mi>x</m:mi>
									<m:mo>}</m:mo>
									<m:mo>=</m:mo>
									<m:mn>1</m:mn>
									<m:mo>&#8722;</m:mo>
									<m:mi>P</m:mi>
									<m:mo>{</m:mo>
									<m:mtext>class&#160;</m:mtext>
									<m:mn>1</m:mn>
									<m:mo>|</m:mo>
									<m:mi>X</m:mi>
									<m:mo>=</m:mo>
									<m:mi>x</m:mi>
									<m:mo>}</m:mo>
								</m:mrow>
								<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiabdcfaqjabcUha7jabbogaJjabbYgaSjabbggaHjabbohaZjabbohaZjabbccaGiabikdaYiabcYha8jabdIfayjabg2da9iabdIha4jabc2ha9jabg2da9iabigdaXiabgkHiTiabdcfaqjabcUha7jabbogaJjabbYgaSjabbggaHjabbohaZjabbohaZjabbccaGiabigdaXiabcYha8jabdIfayjabg2da9iabdIha4jabc2ha9baa@649E@</m:annotation>
							</m:semantics>
						</m:math>
					</display-formula>
				</p>
				<p>2. Let</p>
				<p>
					<display-formula>
						<m:math name="1471-2164-9-S1-S7-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
							<m:semantics>
								<m:mtable columnalign="left">
									<m:mtr>
										<m:mtd>
											<m:msub>
												<m:mi>f</m:mi>
												<m:mn>1</m:mn>
											</m:msub>
											<m:mo stretchy="false">(</m:mo>
											<m:mi>x</m:mi>
											<m:mo stretchy="false">)</m:mo>
											<m:mo>&#8801;</m:mo>
											<m:mtext>P</m:mtext>
											<m:mo>{</m:mo>
											<m:mtext>class&#160;</m:mtext>
											<m:mn>1</m:mn>
											<m:mo>|</m:mo>
											<m:mi>X</m:mi>
											<m:mo>=</m:mo>
											<m:mi>x</m:mi>
											<m:mo>}</m:mo>
										</m:mtd>
									</m:mtr>
									<m:mtr>
										<m:mtd>
											<m:msub>
												<m:mi>f</m:mi>
												<m:mn>2</m:mn>
											</m:msub>
											<m:mo stretchy="false">(</m:mo>
											<m:mi>x</m:mi>
											<m:mo stretchy="false">)</m:mo>
											<m:mo>&#8801;</m:mo>
											<m:mtext>P</m:mtext>
											<m:mo>{</m:mo>
											<m:mtext>class&#160;2</m:mtext>
											<m:mo>|</m:mo>
											<m:mi>X</m:mi>
											<m:mo>=</m:mo>
											<m:mi>x</m:mi>
											<m:mo>}</m:mo>
										</m:mtd>
									</m:mtr>
								</m:mtable>
								<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakqaabeqaaKqzafGaemOzaywcfa4aaSbaaSqaaKqzafGaeGymaedaleqaaKqzafGaeiikaGIaemiEaGNaeiykaKIaeyyyIORaeeiuaaLaei4EaSNaee4yamMaeeiBaWMaeeyyaeMaee4CamNaee4CamNaeeiiaaIaeGymaeJaeiiFaWNaemiwaGLaeyypa0JaemiEaGNaeiyFa0hakeaajugqbiabdAgaMLqbaoaaBaaaleaajugqbiabikdaYaWcbeaajugqbiabcIcaOiabdIha4jabcMcaPiabggMi6kabbcfaqjabcUha7jabbogaJjabbYgaSjabbggaHjabbohaZjabbohaZjabbccaGiabbkdaYiabcYha8jabdIfayjabg2da9iabdIha4jabc2ha9baaaa@752A@</m:annotation>
							</m:semantics>
						</m:math>
					</display-formula>
				</p>
				<p>The Overlap Ratio is then defined as:</p>
				<p>
					<display-formula>
						<m:math name="1471-2164-9-S1-S7-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
							<m:semantics>
								<m:mrow>
									<m:mtext>Overlap&#160;Ratio</m:mtext>
									<m:mo>=</m:mo>
									<m:mfrac>
										<m:mrow>
											<m:mtext>Area&#160;under&#160;both&#160;</m:mtext>
											<m:msub>
												<m:mi>f</m:mi>
												<m:mn>1</m:mn>
											</m:msub>
											<m:mo stretchy="false">(</m:mo>
											<m:mi>x</m:mi>
											<m:mo stretchy="false">)</m:mo>
											<m:mtext>&#160;and&#160;</m:mtext>
											<m:msub>
												<m:mi>f</m:mi>
												<m:mn>2</m:mn>
											</m:msub>
											<m:mo stretchy="false">(</m:mo>
											<m:mi>x</m:mi>
											<m:mo stretchy="false">)</m:mo>
										</m:mrow>
										<m:mrow>
											<m:mtext>Area&#160;under&#160;&#160;</m:mtext>
											<m:msub>
												<m:mi>f</m:mi>
												<m:mn>1</m:mn>
											</m:msub>
											<m:mo stretchy="false">(</m:mo>
											<m:mi>x</m:mi>
											<m:mo stretchy="false">)</m:mo>
											<m:mo>+</m:mo>
											<m:mtext>&#160;Area&#160;under&#160;</m:mtext>
											<m:msub>
												<m:mi>f</m:mi>
												<m:mn>2</m:mn>
											</m:msub>
											<m:mo stretchy="false">(</m:mo>
											<m:mi>x</m:mi>
											<m:mo stretchy="false">)</m:mo>
										</m:mrow>
									</m:mfrac>
								</m:mrow>
								<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiabb+gaVjabbAha2jabbwgaLjabbkhaYjabbYgaSjabbggaHjabbchaWjabbccaGiabbkfasjabbggaHjabbsha0jabbMgaPjabb+gaVjabg2da9KqbaoaalaaakeaajugqbiabbgeabjabbkhaYjabbwgaLjabbggaHjabbccaGiabbwha1jabb6gaUjabbsgaKjabbwgaLjabbkhaYjabbccaGiabbkgaIjabb+gaVjabbsha0jabbIgaOjabbccaGiabdAgaMLqbaoaaBaaaleaajugqbiabigdaXaWcbeaajugqbiabcIcaOiabdIha4jabcMcaPiabbccaGiabbggaHjabb6gaUjabbsgaKjabbccaGiabdAgaMLqbaoaaBaaaleaajugqbiabikdaYaWcbeaajugqbiabcIcaOiabdIha4jabcMcaPaGcbaqcLbuacWaJagyqaeKamWiGbkhaYjadmcyGLbqzcWaJagyyaeMamWiGbccaGiadmcyG1bqDcWaJagOBa4MamWiGbsgaKjadmcyGLbqzcWaJagOCaiNaeeiiaaIaeeiiaaIaemOzaywcfa4aaSbaaSqaaKqzafGaeGymaedaleqaaKqzafGaeiikaGIaemiEaGNaeiykaKIaey4kaSIaeeiiaaIaeeyqaeKaeeOCaiNaeeyzauMaeeyyaeMaeeiiaaIaeeyDauNaeeOBa4MaeeizaqMaeeyzauMaeeOCaiNaeeiiaaIaemOzaywcfa4aaSbaaSqaaKqzafGaeGOmaidaleqaaKqzafGaeiikaGIaemiEaGNaeiykaKcaaaaa@B1CB@</m:annotation>
							</m:semantics>
						</m:math>
					</display-formula>
				</p>
				<p>The smaller the Overlap Ratio, the more easily the two classes can be discriminated.</p>
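				<p>The calculation above can be sketched in a few lines of Python. This is an illustrative reading of the definition rather than the code used in the study, and it rests on three assumptions: the per-bin value is taken as the fraction of instances in that bin that belong to class 1 (so that the two curves sum to one, as the complementary definition requires); the "area under both" curves is taken as the area under the pointwise minimum of the two curves; and empty bins contribute no area. The function name <it>overlap_ratio</it> and the parameter <it>n_bins</it> are hypothetical.</p>
				<p>
```python
def overlap_ratio(values, labels, n_bins=20):
    """Estimate the Overlap Ratio of two classes for one property X.

    values: property values, one per instance
    labels: class labels (1 or 2), one per instance
    """
    lo, hi = min(values), max(values)
    # Equal-width bins; fall back to width 1.0 if the property is constant.
    width = (hi - lo) / n_bins or 1.0
    count_all = [0] * n_bins
    count_c1 = [0] * n_bins
    for v, lab in zip(values, labels):
        # The maximum value is placed in the last bin.
        b = min(int((v - lo) / width), n_bins - 1)
        count_all[b] += 1
        if lab == 1:
            count_c1[b] += 1

    shared = 0.0  # area under both curves (assumed: pointwise minimum)
    total = 0.0   # area under f1 plus area under f2
    for n, n1 in zip(count_all, count_c1):
        if n == 0:
            continue  # assumption: empty bins contribute no area
        f1 = n1 / n       # estimate of P{class 1 | X in bin}
        f2 = 1.0 - f1     # complementary estimate for class 2
        shared += min(f1, f2) * width
        total += (f1 + f2) * width
    return shared / total if total else 0.0


# Two toy cases: perfectly separated classes and fully mixed classes.
separated = overlap_ratio([0.0, 0.1, 0.2, 0.8, 0.9, 1.0],
                          [1, 1, 1, 2, 2, 2], n_bins=2)
mixed = overlap_ratio([0.0, 0.0, 1.0, 1.0], [1, 2, 1, 2], n_bins=2)
print(separated, mixed)
```
				</p>
				<p>Under this reading the ratio lies between 0 (the classes never share a bin) and 0.5 (the classes are equally mixed in every bin), so smaller values indicate better separation, in line with the interpretation given above.</p>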
			</sec>
			<sec>
				<st>
					<p>The SOGR-I and SOGR-IB classification algorithms</p>
				</st>
				<sec>
					<st>
						<p>Overview</p>
					</st>
					<p>The Self-Organizing Global Ranking (SOGR) algorithm <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> was inspired by Kohonen's Self-Organizing Map (SOM) algorithm <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. In the SOM algorithm, each neuron has associated with it a topological neighborhood, and the algorithm is such that neighboring neurons in the topological space tend to arrange themselves over time into a grid in feature space that mimics the neighborhood structure in the topological space. The SOGR algorithm differs from the SOM algorithm by dropping the topological neighborhood and replacing it with the concept of a global neighborhood generated by ranking. We consider two variants of the SOGR algorithm:</p>
					<p>&#8226; The first variant, SOGR-I <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>, modifies the initialization scheme of SOGR.</p>
					<p>&#8226; The second variant, SOGR-IB <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp> (&#8220;B&#8221; stands for &#8220;Batch update&#8221;), removes the dependence on the order in which instances are presented by only updating the weights after each cycle, where a cycle involves presenting the entire training set to the network, one instance at a time. This variant also uses the modified initialization procedure described above.</p>
					<p>Before describing these modifications in detail, we present the SOGR algorithm itself.</p>
				</sec>
				<sec>
					<st>
						<p>The SOGR classification algorithm</p>
					</st>
					<p>We assume that <it>m</it> neurons are used; each neuron <it>j</it> has a weight vector <inline-formula><m:math name="1471-2164-9-S1-S7-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>W</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>j</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdEfaxzaalaqcfa4aaSbaaSqaaKqzafGaemOAaOgaleqaaaaa@4205@</m:annotation></m:semantics></m:math></inline-formula> (<it>t</it>), where <it>t</it> represents time. Let the initial position of neuron <it>j</it> at time <it>t</it> = 0 be <inline-formula><m:math name="1471-2164-9-S1-S7-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>W</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>j</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdEfaxzaalaqcfa4aaSbaaSqaaKqzafGaemOAaOgaleqaaaaa@4205@</m:annotation></m:semantics></m:math></inline-formula> (0), and assume that the training set consists of instances (<inline-formula><m:math name="1471-2164-9-S1-S7-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdIha4zaalaqcfa4aaSbaaSqaaKqzafGaemyAaKgaleqaaaaa@4245@</m:annotation></m:semantics></m:math></inline-formula>, <it>y<sub>i</sub></it>), <it>i</it> = 1, &#8230; , <it>n</it>, where the <inline-formula><m:math name="1471-2164-9-S1-S7-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdIha4zaalaqcfa4aaSbaaSqaaKqzafGaemyAaKgaleqaaaaa@4245@</m:annotation></m:semantics></m:math></inline-formula> are feature vectors, and <it>y<sub>i</sub></it> denotes the class of an instance.</p>
					<p>1. <b>Initialization:</b> Choose initial positions <inline-formula><m:math name="1471-2164-9-S1-S7-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>W</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>j</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdEfaxzaalaqcfa4aaSbaaSqaaKqzafGaemOAaOgaleqaaaaa@4205@</m:annotation></m:semantics></m:math></inline-formula> (0) in feature space for the <it>m</it> neurons by assigning the neurons random positions in feature space.</p>
					<p>2. Present the instances in the training set to the network, one at a time. As each instance is presented to the network, the time index <it>t</it> is increased by 1. For each instance (<inline-formula><m:math name="1471-2164-9-S1-S7-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdIha4zaalaqcfa4aaSbaaSqaaKqzafGaemyAaKgaleqaaaaa@4245@</m:annotation></m:semantics></m:math></inline-formula>, <it>y<sub>i</sub></it>) in the training set, the positions of one or more neurons are adjusted as follows:</p>
					<p>&#8226; <b>Identifying Winning Neurons:</b> Find the <it>R</it> closest neurons to the feature vector <inline-formula><m:math name="1471-2164-9-S1-S7-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdIha4zaalaqcfa4aaSbaaSqaaKqzafGaemyAaKgaleqaaaaa@4245@</m:annotation></m:semantics></m:math></inline-formula>, that is, find the <it>R</it> neurons with the smallest value of <inline-formula><m:math name="1471-2164-9-S1-S7-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mo>&#8741;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub><m:mtext>&#8201;</m:mtext><m:mo>&#8722;</m:mo><m:mtext>&#8201;</m:mtext><m:msub><m:mover accent="true"><m:mi>W</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>j</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mi>t</m:mi><m:mo stretchy="false">)</m:mo><m:mo>&#8741;</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXafv3ySLgzGmvETj2BSbqeeuuDJXwAKbsr4rNCHbGeaGqipu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaaeaabaWaaaGcbaqcLbuacqWILicucuWG4baEgaWcaKqbaoaaBaaaleaajugqbiabdMgaPbWcbeaajugqbiaaysW7cqGHsislcaaMe8Uafm4vaCLbaSaajuaGdaWgaaWcbaqcLbuacqWGQbGAaSqabaqcLbuacqGGOaakcqWG0baDcqGGPaqkcqWILicuaaa@43FA@</m:annotation></m:semantics></m:math></inline-formula>. These <it>R</it> neurons constitute the &#8220;neighborhood&#8221; of the input vector. Let &#915; be the set of indices of the <it>R</it> winning neurons.</p>
					<p>&#8226; <b>Updating Weights:</b> Adjust the positions of each of the <it>R</it> winning neurons using the update rule</p>
					<p>
						<display-formula>
							<m:math name="1471-2164-9-S1-S7-i13" xmlns:m="http://www.w3.org/1998/Math/MathML">
								<m:semantics>
									<m:mrow>
										<m:msub>
											<m:mover accent="true">
												<m:mi>W</m:mi>
												<m:mo>&#8594;</m:mo>
											</m:mover>
											<m:mi>j</m:mi>
										</m:msub>
										<m:mo stretchy="false">(</m:mo>
										<m:mi>t</m:mi>
										<m:mo>+</m:mo>
										<m:mn>1</m:mn>
										<m:mo stretchy="false">)</m:mo>
										<m:mo>=</m:mo>
										<m:msub>
											<m:mover accent="true">
												<m:mi>W</m:mi>
												<m:mo>&#8594;</m:mo>
											</m:mover>
											<m:mi>j</m:mi>
										</m:msub>
										<m:mo stretchy="false">(</m:mo>
										<m:mi>t</m:mi>
										<m:mo stretchy="false">)</m:mo>
										<m:mo>+</m:mo>
										<m:msub>
											<m:mi>&#951;</m:mi>
											<m:mi>t</m:mi>
										</m:msub>
										<m:mo stretchy="false">(</m:mo>
										<m:msub>
											<m:mover accent="true">
												<m:mi>x</m:mi>
												<m:mo>&#8594;</m:mo>
											</m:mover>
											<m:mi>i</m:mi>
										</m:msub>
										<m:mo>&#8722;</m:mo>
										<m:msub>
											<m:mover accent="true">
												<m:mi>W</m:mi>
												<m:mo>&#8594;</m:mo>
											</m:mover>
											<m:mi>j</m:mi>
										</m:msub>
										<m:mo stretchy="false">(</m:mo>
										<m:mi>t</m:mi>
										<m:mo stretchy="false">)</m:mo>
										<m:mo stretchy="false">)</m:mo>
									</m:mrow>
									<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdEfaxzaalaqcfa4aaSbaaSqaaKqzafGaemOAaOgaleqaaKqzafGaeiikaGIaemiDaqNaey4kaSIaeGymaeJaeiykaKIaeyypa0Jafm4vaCLbaSaajuaGdaWgaaWcbaqcLbuacqWGQbGAaSqabaqcLbuacqGGOaakcqWG0baDcqGGPaqkcqGHRaWkcqaH3oaAjuaGdaWgaaWcbaqcLbuacqWG0baDaSqabaqcLbuacqGGOaakcuWG4baEgaWcaKqbaoaaBaaaleaajugqbiabdMgaPbWcbeaajugqbiabgkHiTiqbdEfaxzaalaqcfa4aaSbaaSqaaKqzafGaemOAaOgaleqaaKqzafGaeiikaGIaemiDaqNaeiykaKIaeiykaKcaaa@6651@</m:annotation>
								</m:semantics>
							</m:math>
						</display-formula>
					</p>
					<p>where <it>j</it> &#8712; &#915; and &#951;<sub><it>t</it></sub> is the learning rate. The learning rate is chosen to decrease with time in order to force the algorithm to converge. Reference <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> suggests decreasing the learning rate exponentially and using a smaller learning rate for larger neighborhood sizes <it>R</it>.</p>
					<p>3. <b>Assigning Classes to Neurons:</b> For each neuron <it>j</it>, we maintain a count, for each class, of the instances of that class that are closer to neuron <it>j</it> than to any other neuron. These counts are calculated as follows:</p>
					<p>&#8226; For each neuron, initialize the counts to zero.</p>
					<p>&#8226; For each instance (<inline-formula><m:math name="1471-2164-9-S1-S7-i14" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdIha4zaalaqcfa4aaSbaaSqaaKqzafGaemyAaKgaleqaaaaa@4245@</m:annotation></m:semantics></m:math></inline-formula>, <it>y<sub>i</sub></it>) in the training set, find the closest neuron to the feature vector <inline-formula><m:math name="1471-2164-9-S1-S7-i15" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdIha4zaalaqcfa4aaSbaaSqaaKqzafGaemyAaKgaleqaaaaa@4245@</m:annotation></m:semantics></m:math></inline-formula>, that is, find the neuron with the index <it>j</it><sup>*</sup>, where</p>
					<p>
						<display-formula>
							<m:math name="1471-2164-9-S1-S7-i16" xmlns:m="http://www.w3.org/1998/Math/MathML">
								<m:semantics>
									<m:mrow>
										<m:msup>
											<m:mi>j</m:mi>
											<m:mo>*</m:mo>
										</m:msup>
										<m:mo>=</m:mo>
										<m:mtext>arg&#160;</m:mtext>
										<m:munder>
											<m:mrow>
												<m:mi>min</m:mi>
												<m:mo>&#8289;</m:mo>
											</m:mrow>
											<m:mi>j</m:mi>
										</m:munder>
										<m:mo>&#8741;</m:mo>
										<m:msub>
											<m:mover accent="true">
												<m:mi>x</m:mi>
												<m:mo>&#8594;</m:mo>
											</m:mover>
											<m:mi>i</m:mi>
										</m:msub>
										<m:mo>&#8722;</m:mo>
										<m:msub>
											<m:mover accent="true">
												<m:mi>W</m:mi>
												<m:mo>&#8594;</m:mo>
											</m:mover>
											<m:mi>j</m:mi>
										</m:msub>
										<m:mo stretchy="false">(</m:mo>
										<m:mi>t</m:mi>
										<m:mo stretchy="false">)</m:mo>
										<m:mo>&#8741;</m:mo>
									</m:mrow>
									<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiabdQgaQjabcQcaQiabg2da9iabbggaHjabbkhaYjabbEgaNjabbccaGKqbaoaaxabakeaajugqbiGbc2gaTjabcMgaPjabc6gaUbWcbaqcLbuacqWGQbGAaSqabaqcLbuacqGG8baFcqGG8baFcuWG4baEgaWcaKqbaoaaBaaaleaajugqbiabdMgaPbWcbeaajugqbiabgkHiTiqbdEfaxzaalaqcfa4aaSbaaSqaaKqzafGaemOAaOgaleqaaKqzafGaeiikaGIaemiDaqNaeiykaKIaeiiFaWNaeiiFaWhaaa@6244@</m:annotation>
								</m:semantics>
							</m:math>
						</display-formula>
					</p>
					<p>and increment the count in neuron <it>j</it><sup>*</sup> corresponding to class <it>y<sub>i</sub></it> by 1.</p>
					<p>&#8226; After all instances in the training set have been considered, each neuron is assigned to the class corresponding to the largest count for that neuron.</p>
					<p>After the training process has been completed, a test instance can be classified by assigning it the class label of the nearest neuron.</p>
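					<p>The three steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the original SOGR authors: the names train_sogr, classify, eta0, decay and epochs are hypothetical; the schedule &#951;<sub><it>t</it></sub> = eta0 &#183; exp(&#8722;decay &#183; <it>t</it>) is only one possible decreasing learning rate; initial positions are drawn uniformly within the range of the training data; the training set is presented for several passes; and neurons that attract no training instance are left unlabelled and ignored at classification time.</p>
					<p><![CDATA[
```python
import math
import random


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def train_sogr(X, y, m, R, eta0=0.5, decay=0.01, epochs=20, seed=0):
    """Sketch of SOGR training; returns (weights, neuron_labels)."""
    rng = random.Random(seed)
    d = len(X[0])
    lo = [min(x[k] for x in X) for k in range(d)]
    hi = [max(x[k] for x in X) for k in range(d)]
    # Step 1: random initial positions (assumed uniform in the data range).
    W = [[rng.uniform(lo[k], hi[k]) for k in range(d)] for _ in range(m)]
    t = 0
    for _ in range(epochs):  # number of passes is an assumption
        for x in X:
            eta = eta0 * math.exp(-decay * t)  # assumed exponential decay
            # Step 2a: the R closest neurons form the neighborhood Gamma.
            gamma = sorted(range(m), key=lambda j: distance(x, W[j]))[:R]
            # Step 2b: W_j(t+1) = W_j(t) + eta_t * (x_i - W_j(t)), j in Gamma.
            for j in gamma:
                W[j] = [w + eta * (xk - w) for w, xk in zip(W[j], x)]
            t += 1
    # Step 3: per neuron, count the classes of the instances nearest to it.
    counts = [{} for _ in range(m)]
    for x, label in zip(X, y):
        j_star = min(range(m), key=lambda j: distance(x, W[j]))
        counts[j_star][label] = counts[j_star].get(label, 0) + 1
    # Each neuron takes the class with the largest count (None if no votes).
    labels = [max(c, key=c.get) if c else None for c in counts]
    return W, labels


def classify(x, W, labels):
    """Return the class of the nearest neuron that received a label."""
    labelled = [j for j in range(len(W)) if labels[j] is not None]
    j_star = min(labelled, key=lambda j: distance(x, W[j]))
    return labels[j_star]
```
]]></p>
					<p>For example, train_sogr(X, y, m=4, R=2) returns the neuron positions and their class labels, which classify then uses to label a new feature vector.</p>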
				</sec>
				<sec>
					<st>
						<p>The SOGR-I classification algorithm</p>
					</st>
					<p>The first variant, SOGR-I <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>, modifies the initialization scheme of SOGR. Specifically, assume that the feature space is <it>d</it>-dimensional, so that the feature vectors <inline-formula><m:math name="1471-2164-9-S1-S7-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdIha4zaalaqcfa4aaSbaaeaajugqbiabdMgaPbqcfayabaaaaa@42BD@</m:annotation></m:semantics></m:math></inline-formula> belong to <inline-formula><m:math name="1471-2164-9-S1-S7-i18" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msup><m:mi>&#8477;</m:mi><m:mi>d</m:mi></m:msup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiabl2riHMqbaoaaCaaabeqaaKqzafGaemizaqgaaaaa@420B@</m:annotation></m:semantics></m:math></inline-formula>. For each feature <it>k</it>, we find the smallest and largest values of that feature over the entire training set, denoted <it>L<sub>k</sub></it> and <it>U<sub>k</sub></it>, respectively:</p>
					<p>
						<display-formula>
							<m:math name="1471-2164-9-S1-S7-i19" xmlns:m="http://www.w3.org/1998/Math/MathML">
								<m:semantics>
									<m:mtable columnalign="left">
										<m:mtr>
											<m:mtd>
												<m:msub>
													<m:mi>L</m:mi>
													<m:mi>k</m:mi>
												</m:msub>
												<m:mo>=</m:mo>
												<m:munder>
													<m:mrow>
														<m:mi>min</m:mi>
														<m:mo>&#8289;</m:mo>
													</m:mrow>
													<m:mi>i</m:mi>
												</m:munder>
												<m:msub>
													<m:mi>x</m:mi>
													<m:mrow>
														<m:mi>i</m:mi>
														<m:mi>k</m:mi>
													</m:mrow>
												</m:msub>
											</m:mtd>
										</m:mtr>
										<m:mtr>
											<m:mtd>
												<m:msub>
													<m:mi>U</m:mi>
													<m:mi>k</m:mi>
												</m:msub>
												<m:mo>=</m:mo>
												<m:munder>
													<m:mrow>
														<m:mi>max</m:mi>
														<m:mo>&#8289;</m:mo>
													</m:mrow>
													<m:mi>i</m:mi>
												</m:munder>
												<m:msub>
													<m:mi>x</m:mi>
													<m:mrow>
														<m:mi>i</m:mi>
														<m:mi>k</m:mi>
													</m:mrow>
												</m:msub>
											</m:mtd>
										</m:mtr>
									</m:mtable>
									<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakqaabeqaaKqzafGaemitaWucfa4aaSbaaSqaaKqzafGaem4AaSgaleqaaKqzafGaeyypa0tcfa4aaCbeaOqaaKqzafGagiyBa0MaeiyAaKMaeiOBa4galeaajugqbiabdMgaPbWcbeaajugqbiabdIha4LqbaoaaBaaaleaajugqbiabdMgaPjabdUgaRbWcbeaaaOqaaKqzafGaemyvauvcfa4aaSbaaSqaaKqzafGaem4AaSgaleqaaKqzafGaeyypa0tcfa4aaCbeaOqaaKqzafGagiyBa0MaeiyAaKMaeiOBa4galeaajugqbiabdMgaPbWcbeaajugqbiabdIha4LqbaoaaBaaaleaajugqbiabdMgaPjabdUgaRbWcbeaaaaaa@6629@</m:annotation>
								</m:semantics>
							</m:math>
						</display-formula>
					</p>
					<p>where <it>x<sub>ik</sub></it> is the <it>k<sup>th</sup></it> element of the feature vector <inline-formula><m:math name="1471-2164-9-S1-S7-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdIha4zaalaqcfa4aaSbaaSqaaKqzafGaemyAaKgaleqaaaaa@4245@</m:annotation></m:semantics></m:math></inline-formula>. Then the initial positions of the <it>m</it> neurons are chosen as:</p>
					<p>
						<display-formula>
							<m:math name="1471-2164-9-S1-S7-i21" xmlns:m="http://www.w3.org/1998/Math/MathML">
								<m:semantics>
									<m:mrow>
										<m:msub>
											<m:mi>W</m:mi>
											<m:mrow>
												<m:mi>j</m:mi>
												<m:mi>k</m:mi>
											</m:mrow>
										</m:msub>
										<m:mo stretchy="false">(</m:mo>
										<m:mn>0</m:mn>
										<m:mo stretchy="false">)</m:mo>
										<m:mo>=</m:mo>
										<m:msub>
											<m:mi>L</m:mi>
											<m:mi>k</m:mi>
										</m:msub>
										<m:mo>+</m:mo>
										<m:mfrac>
											<m:mrow>
												<m:mi>j</m:mi>
												<m:mo>&#8722;</m:mo>
												<m:mn>1</m:mn>
											</m:mrow>
											<m:mrow>
												<m:mi>m</m:mi>
												<m:mo>&#8722;</m:mo>
												<m:mn>1</m:mn>
											</m:mrow>
										</m:mfrac>
										<m:mo stretchy="false">(</m:mo>
										<m:msub>
											<m:mi>U</m:mi>
											<m:mi>k</m:mi>
										</m:msub>
										<m:mo>&#8722;</m:mo>
										<m:msub>
											<m:mi>L</m:mi>
											<m:mi>k</m:mi>
										</m:msub>
										<m:mo stretchy="false">)</m:mo>
										<m:mtext>&#8195;</m:mtext>
										<m:mtable>
											<m:mtr>
												<m:mtd>
													<m:mrow>
														<m:mi>j</m:mi>
														<m:mo>=</m:mo>
													</m:mrow>
												</m:mtd>
												<m:mtd>
													<m:mrow>
														<m:mn>1</m:mn>
														<m:mo>,</m:mo>
														<m:mo>&#8230;</m:mo>
														<m:mo>,</m:mo>
														<m:mi>m</m:mi>
													</m:mrow>
												</m:mtd>
											</m:mtr>
											<m:mtr>
												<m:mtd>
													<m:mrow>
														<m:mi>k</m:mi>
														<m:mo>=</m:mo>
													</m:mrow>
												</m:mtd>
												<m:mtd>
													<m:mrow>
														<m:mn>1</m:mn>
														<m:mo>,</m:mo>
														<m:mo>&#8230;</m:mo>
														<m:mo>,</m:mo>
														<m:mi>d</m:mi>
													</m:mrow>
												</m:mtd>
											</m:mtr>
										</m:mtable>
									</m:mrow>
									<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiabdEfaxLqbaoaaBaaaleaajugqbiabdQgaQjabdUgaRbWcbeaajugqbiabcIcaOiabicdaWiabcMcaPiabg2da9iabdYeamLqbaoaaBaaaleaajugqbiabdUgaRbWcbeaajugqbiabgUcaRKqbaoaalaaakeaajugqbiabdQgaQjabgkHiTiabigdaXaGcbaqcLbuacqWGTbqBcqGHsislcqaIXaqmaaGaeiikaGIaemyvauvcfa4aaSbaaSqaaKqzafGaem4AaSgaleqaaKqzafGaeyOeI0IaemitaWucfa4aaSbaaSqaaKqzafGaem4AaSgaleqaaKqzafGaeiykaKIaaGzbVxaabeqaciaaaOqaaKqzafGaemOAaOMaeyypa0dakeaajugqbiabigdaXiabcYcaSiablAciljabcYcaSiabd2gaTbGcbaqcLbuacqWGRbWAcqGH9aqpaOqaaKqzafGaeGymaeJaeiilaWIaeSOjGSKaeiilaWIaemizaqgaaaaa@754F@</m:annotation>
								</m:semantics>
							</m:math>
						</display-formula>
					</p>
					<p>Thus the <it>m</it> neurons are evenly distributed along the line segment connecting (<it>L</it><sub>1</sub>, <it>L</it><sub>2</sub>, &#8230;, <it>L<sub>d</sub></it>) to (<it>U</it><sub>1</sub>, <it>U</it><sub>2</sub>, &#8230;, <it>U<sub>d</sub></it>). This approach has several advantages over other initialization methods:</p>
					<p>&#8226; It guarantees that the neurons will be in some sense evenly distributed throughout the feature space. Random initialization, on the other hand, does not guarantee this. If the feature space is large, say 60 dimensions, and there are comparatively few neurons, say 50, then with random initialization the neurons will, with high probability, not be evenly distributed throughout the feature space.</p>
					<p>&#8226; Even a small number of neurons can be used to populate the feature space. By contrast, consider an alternative initialization procedure in which the feature space is populated with a <it>d</it>-dimensional grid of neurons. If there are <it>q</it> grid points along each feature-space axis, then the total number of neurons required to populate this grid is <it>q<sup>d</sup></it>. For example, if <it>q</it> = 3 and the feature space has 60 dimensions, then the number of neurons required is</p>
					<p>
						<display-formula>
							<m:math name="1471-2164-9-S1-S7-i22" xmlns:m="http://www.w3.org/1998/Math/MathML">
								<m:semantics>
									<m:mrow>
										<m:msup>
											<m:mi>q</m:mi>
											<m:mi>d</m:mi>
										</m:msup>
										<m:mo>=</m:mo>
										<m:msup>
											<m:mn>3</m:mn>
											<m:mrow>
												<m:mn>60</m:mn>
											</m:mrow>
										</m:msup>
										<m:mo>&#8776;</m:mo>
										<m:mn>4.239</m:mn>
										<m:mo>&#215;</m:mo>
										<m:msup>
											<m:mrow>
												<m:mn>10</m:mn>
											</m:mrow>
											<m:mrow>
												<m:mn>28</m:mn>
											</m:mrow>
										</m:msup>
									</m:mrow>
									<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiabdghaXLqbaoaaCaaaleqabaqcLbuacqWGKbazaaGaeyypa0JaeG4mamtcfa4aaWbaaSqabeaajugqbiabiAda2iabicdaWaaacqGHijYUcqaI0aancqGGUaGlcqaIYaGmcqaIZaWmcqaI5aqocqGHxdaTcqaIXaqmcqaIWaamjuaGdaahaaWcbeqaaKqzafGaeGOmaiJaeGioaGdaaaaa@551D@</m:annotation>
								</m:semantics>
							</m:math>
						</display-formula>
					</p>
					<p>which is clearly infeasible.</p>
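					<p>The initialization formula above can be sketched in Python as follows, taking <it>L<sub>k</sub></it> as the per-feature minimum and <it>U<sub>k</sub></it> as the per-feature maximum of the training set. The function name sogr_i_init and the toy data are hypothetical; the last line simply evaluates the size of the full grid (<it>q</it> = 3, <it>d</it> = 60) for comparison.</p>
					<p><![CDATA[
```python
def sogr_i_init(X, m):
    """Place m neurons evenly on the line from (L_1..L_d) to (U_1..U_d)."""
    if m < 2:
        raise ValueError("the formula divides by m - 1, so m must be >= 2")
    d = len(X[0])
    L = [min(x[k] for x in X) for k in range(d)]  # smallest value of feature k
    U = [max(x[k] for x in X) for k in range(d)]  # largest value of feature k
    # W_jk(0) = L_k + (j - 1) / (m - 1) * (U_k - L_k), with j = 1..m
    return [[L[k] + (j - 1) / (m - 1) * (U[k] - L[k]) for k in range(d)]
            for j in range(1, m + 1)]


# Hypothetical toy training set with d = 2 features.
W0 = sogr_i_init([[0.0, 10.0], [4.0, 30.0], [2.0, 20.0]], m=3)

# Neurons needed by a full grid with q = 3 points per axis and d = 60.
grid_neurons = 3 ** 60
```
]]></p>
					<p>For the three toy instances, the neurons are placed at (0, 10), (2, 20) and (4, 30), that is, at the two extreme corners and the midpoint of the diagonal, whereas the full grid would need roughly 4.239 &#215; 10<sup>28</sup> neurons.</p>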
				</sec>
				<sec>
					<st>
						<p>The SOGR-IB classification algorithm</p>
					</st>
					<p>The second variant, SOGR-IB <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>, addresses two problems with the original SOGR algorithm:</p>
					<p>&#8226; The SOGR algorithm updates the weights after each new instance is presented to the network; as a result, the neuron trajectories can oscillate wildly.</p>
					<p>&#8226; The SOGR algorithm specifies that the learning rate should be decreased during the course of training, for example at an exponential rate. The problem is that if the learning rate is decreased too rapidly, then the neurons may get stuck before they have reached their optimal positions.</p>
					<p>SOGR-IB (&#8220;B&#8221; stands for &#8220;Batch update&#8221;) addresses these problems in two ways:</p>
					<p>&#8226; It uses a &#8220;batch update&#8221; strategy for updating the positions of the neurons in feature space. This eliminates the dependence of the results on the order in which instances are presented to the network, and also stabilizes the trajectories of the neurons.</p>
					<p>&#8226; The batch update strategy allows the use of a fixed, but small, learning rate &#951;<sub><it>t</it></sub>, which eliminates the problem of the weights getting stuck because the learning rate &#951;<sub><it>t</it></sub> was decreased too quickly.</p>
					<p>The SOGR-IB algorithm is described below:</p>
					<p>1. <b>Initialization:</b> Choose initial positions <inline-formula><m:math name="1471-2164-9-S1-S7-i23" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>W</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>j</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXafv3ySLgzGmvETj2BSbqeeuuDJXwAKbsr4rNCHbGeaGqipu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaaeaabaWaaaGcbaqcLbuacuWGxbWvgaWcaKqbaoaaBaaaleaajugqbiabdQgaQbWcbeaaaaa@34C8@</m:annotation></m:semantics></m:math></inline-formula>(0) in feature space for the <it>m</it> neurons using the SOGR-I initialization strategy. Set <it>t</it> = 0.</p>
					<p>2. Repeat the following until the &#8220;energy&#8221; defined by</p>
					<p>
						<display-formula>
							<m:math name="1471-2164-9-S1-S7-i24" xmlns:m="http://www.w3.org/1998/Math/MathML">
								<m:semantics>
									<m:mrow>
										<m:mi>Q</m:mi>
										<m:mo stretchy="false">(</m:mo>
										<m:mi>t</m:mi>
										<m:mo stretchy="false">)</m:mo>
										<m:mtext>&#8201;</m:mtext>
										<m:mo>=</m:mo>
										<m:mtext>&#8201;</m:mtext>
										<m:mfrac>
											<m:mn>1</m:mn>
											<m:mrow>
												<m:mn>2</m:mn>
												<m:mi>n</m:mi>
												<m:mi>R</m:mi>
											</m:mrow>
										</m:mfrac>
										<m:mtext>&#8201;</m:mtext>
										<m:mstyle displaystyle="true">
											<m:munderover>
												<m:mo>&#8721;</m:mo>
												<m:mrow>
													<m:mtext>instances&#160;</m:mtext>
													<m:mi>i</m:mi>
												</m:mrow>
												<m:mrow/>
											</m:munderover>
											<m:mrow/>
										</m:mstyle>
										<m:mtext>&#8201;</m:mtext>
										<m:mstyle displaystyle="true">
											<m:munderover>
												<m:mo>&#8721;</m:mo>
												<m:mrow>
													<m:mtext>neurons&#160;</m:mtext>
													<m:mi>j</m:mi>
												</m:mrow>
												<m:mrow/>
											</m:munderover>
											<m:mrow/>
										</m:mstyle>
										<m:mtext>&#8201;</m:mtext>
										<m:msub>
											<m:mi>m</m:mi>
											<m:mrow>
												<m:mi>i</m:mi>
												<m:mi>j</m:mi>
											</m:mrow>
										</m:msub>
										<m:mo>&#8741;</m:mo>
										<m:msub>
											<m:mover accent="true">
												<m:mi>x</m:mi>
												<m:mo>&#8594;</m:mo>
											</m:mover>
											<m:mi>i</m:mi>
										</m:msub>
										<m:mtext>&#8201;</m:mtext>
										<m:mo>&#8722;</m:mo>
										<m:mtext>&#8201;</m:mtext>
										<m:msub>
											<m:mover accent="true">
												<m:mi>W</m:mi>
												<m:mo>&#8594;</m:mo>
											</m:mover>
											<m:mi>j</m:mi>
										</m:msub>
										<m:mo stretchy="false">(</m:mo>
										<m:mi>t</m:mi>
										<m:mo stretchy="false">)</m:mo>
										<m:msup>
											<m:mo>&#8741;</m:mo>
											<m:mn>2</m:mn>
										</m:msup>
									</m:mrow>
									<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiabdgfarjabcIcaOiabdsha0jabcMcaPiaaysW7cqGH9aqpcaaMe8Ecfa4aaSaaaOqaaKqzafGaeGymaedakeaajugqbiabikdaYiabd6gaUjabdkfasbaacaaMe8Ecfa4aaabCaOqaaaWcbaqcLbuacqqGPbqAcqqGUbGBcqqGZbWCcqqG0baDcqqGHbqycqqGUbGBcqqGJbWycqqGLbqzcqqGZbWCcqqGGaaicqWGPbqAaSqaaaqcLbuacqGHris5aiaaysW7juaGdaaeWbGcbaaaleaajugqbiabb6gaUjabbwgaLjabbwha1jabbkhaYjabb+gaVjabb6gaUjabbohaZjabbccaGiabdQgaQbWcbaaajugqbiabggHiLdGaaGjbVlabd2gaTLqbaoaaBaaaleaajugqbiabdMgaPjabdQgaQbWcbeaajugqbiablwIiqjqbdIha4zaalaqcfa4aaSbaaSqaaKqzafGaemyAaKgaleqaaKqzafGaaGjbVlabgkHiTiaaysW7cuWGxbWvgaWcaKqbaoaaBaaaleaajugqbiabdQgaQbWcbeaajugqbiabcIcaOiabdsha0jabcMcaPiablwIiqLqbaoaaCaaaleqabaqcLbuacqaIYaGmaaaaaa@902F@</m:annotation>
								</m:semantics>
							</m:math>
						</display-formula>
					</p>
					<p>no longer reaches a new minimum over a number of iterations through the training set, where <it>n</it> is the number of training instances, <it>R</it> is the number of neurons in the neighborhood of each training instance (the neurons that will be updated), and for each instance (<inline-formula><m:math name="1471-2164-9-S1-S7-i25" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdIha4zaalaqcfa4aaSbaaSqaaKqzafGaemyAaKgaleqaaaaa@4245@</m:annotation></m:semantics></m:math></inline-formula>, <it>y<sub>i</sub></it>) in the training set, <it>m<sub>ij</sub></it> = 1 for neurons <it>j</it> that are among the <it>R</it> closest neurons to the feature vector <inline-formula><m:math name="1471-2164-9-S1-S7-i26" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdIha4zaalaqcfa4aaSbaaSqaaKqzafGaemyAaKgaleqaaaaa@4245@</m:annotation></m:semantics></m:math></inline-formula>, and <it>m<sub>ij</sub></it> = 0 for all other neurons <it>j</it>. After each pass through the training set, the time index <it>t</it> is incremented by 1.</p>
					<p>(a) Let <inline-formula><m:math name="1471-2164-9-S1-S7-i27" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>Z</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>j</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdQfaAzaalaqcfa4aaSbaaSqaaKqzafGaemOAaOgaleqaaaaa@420B@</m:annotation></m:semantics></m:math></inline-formula> be the &#8220;accumulator&#8221; corresponding to neuron <it>j</it>. Initialize <inline-formula><m:math name="1471-2164-9-S1-S7-i28" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>Z</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>j</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdQfaAzaalaqcfa4aaSbaaSqaaKqzafGaemOAaOgaleqaaaaa@420B@</m:annotation></m:semantics></m:math></inline-formula> to 0 for all neurons <it>j</it>.</p>
					<p>(b) Present the instances (<inline-formula><m:math name="1471-2164-9-S1-S7-i29" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdIha4zaalaqcfa4aaSbaaSqaaKqzafGaemyAaKgaleqaaaaa@4245@</m:annotation></m:semantics></m:math></inline-formula>, <it>y<sub>i</sub></it>) in the training set to the network, one at a time. After each instance is presented, the &#8220;accumulators&#8221; are updated as follows:</p>
					<p>&#8226; <b>Identifying Winning Neurons:</b> Find the <it>R</it> closest neurons to the feature vector <inline-formula><m:math name="1471-2164-9-S1-S7-i30" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdIha4zaalaqcfa4aaSbaaSqaaKqzafGaemyAaKgaleqaaaaa@4245@</m:annotation></m:semantics></m:math></inline-formula>, that is, find the <it>R</it> neurons with the smallest value of <inline-formula><m:math name="1471-2164-9-S1-S7-i31" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mo>&#8741;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>i</m:mi></m:msub><m:mtext>&#8201;</m:mtext><m:mo>&#8722;</m:mo><m:mtext>&#8201;</m:mtext><m:msub><m:mover accent="true"><m:mi>W</m:mi><m:mo>&#8594;</m:mo></m:mover><m:mi>j</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mi>t</m:mi><m:mo stretchy="false">)</m:mo><m:mo>&#8741;</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXafv3ySLgzGmvETj2BSbqeeuuDJXwAKbsr4rNCHbGeaGqipu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaaeaabaWaaaGcbaqcLbuacqWILicucuWG4baEgaWcaKqbaoaaBaaaleaajugqbiabdMgaPbWcbeaajugqbiaaysW7cqGHsislcaaMe8Uafm4vaCLbaSaajuaGdaWgaaWcbaqcLbuacqWGQbGAaSqabaqcLbuacqGGOaakcqWG0baDcqGGPaqkcqWILicuaaa@43FA@</m:annotation></m:semantics></m:math></inline-formula>. These <it>R</it> neurons constitute the &#8220;neighborhood&#8221; of the input vector. Let &#915; be the set of indices of the <it>R</it> winning neurons.</p>
					<p>&#8226; <b>Updating Accumulators:</b> Adjust the accumulators corresponding to each of the <it>R</it> closest neurons using the update rule</p>
					<p>
						<display-formula>
							<m:math name="1471-2164-9-S1-S7-i33" xmlns:m="http://www.w3.org/1998/Math/MathML">
								<m:semantics>
									<m:mrow>
										<m:msub>
											<m:mover accent="true">
												<m:mi>Z</m:mi>
												<m:mo>&#8594;</m:mo>
											</m:mover>
											<m:mi>j</m:mi>
										</m:msub>
										<m:mtext>&#8201;</m:mtext>
										<m:mo>=</m:mo>
										<m:mtext>&#8201;</m:mtext>
										<m:msub>
											<m:mover accent="true">
												<m:mi>Z</m:mi>
												<m:mo>&#8594;</m:mo>
											</m:mover>
											<m:mi>j</m:mi>
										</m:msub>
										<m:mtext>&#8201;</m:mtext>
										<m:mo>+</m:mo>
										<m:mtext>&#8201;</m:mtext>
										<m:mfrac>
											<m:mn>1</m:mn>
											<m:mrow>
												<m:mi>n</m:mi>
												<m:mi>R</m:mi>
											</m:mrow>
										</m:mfrac>
										<m:msub>
											<m:mi>&#951;</m:mi>
											<m:mi>t</m:mi>
										</m:msub>
										<m:mo stretchy="false">(</m:mo>
										<m:msub>
											<m:mover accent="true">
												<m:mi>x</m:mi>
												<m:mo>&#8594;</m:mo>
											</m:mover>
											<m:mi>i</m:mi>
										</m:msub>
										<m:mo>&#8722;</m:mo>
										<m:mtext>&#8201;</m:mtext>
										<m:msub>
											<m:mover accent="true">
												<m:mi>W</m:mi>
												<m:mo>&#8594;</m:mo>
											</m:mover>
											<m:mi>j</m:mi>
										</m:msub>
										<m:mo stretchy="false">(</m:mo>
										<m:mi>t</m:mi>
										<m:mo stretchy="false">)</m:mo>
										<m:mo stretchy="false">)</m:mo>
									</m:mrow>
									<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaajugqbiqbdQfaAzaalaqcfa4aaSbaaSqaaKqzafGaemOAaOgaleqaaKqzafGaaGjbVlabg2da9iaaysW7cuWGAbGwgaWcaKqbaoaaBaaaleaajugqbiabdQgaQbWcbeaajugqbiaaysW7cqGHRaWkcaaMe8Ecfa4aaSaaaOqaaKqzafGaeGymaedakeaajugqbiabd6gaUjabdkfasbaacqaH3oaAjuaGdaWgaaWcbaqcLbuacqWG0baDaSqabaqcLbuacqGGOaakcuWG4baEgaWcaKqbaoaaBaaaleaajugqbiabdMgaPbWcbeaajugqbiabgkHiTiaaysW7cuWGxbWvgaWcaiabcIcaOiabdsha0jabcMcaPiabcMcaPaaa@6818@</m:annotation>
								</m:semantics>
							</m:math>
						</display-formula>
					</p>
					<p>where <it>j</it> &#8712; &#915; and &#951;<sub><it>t</it></sub> is the learning rate.</p>
					<p>(c) <b>Updating Neurons:</b> After all instances in the training set have been presented to the network, update the weights for each neuron <it>j</it> using the rule</p>
					<p>
						<display-formula>
							<m:math name="1471-2164-9-S1-S7-i32" xmlns:m="http://www.w3.org/1998/Math/MathML">
								<m:semantics>
									<m:mrow>
										<m:msub>
											<m:mrow>
												<m:mover>
													<m:mi>W</m:mi>
													<m:mo>&#8594;</m:mo>
												</m:mover>
											</m:mrow>
											<m:mi>j</m:mi>
										</m:msub>
										<m:mrow>
											<m:mo>(</m:mo>
											<m:mrow>
												<m:mi>t</m:mi>
												<m:mo>+</m:mo>
												<m:mn>1</m:mn>
											</m:mrow>
											<m:mo>)</m:mo>
										</m:mrow>
										<m:mo>=</m:mo>
										<m:msub>
											<m:mrow>
												<m:mover>
													<m:mi>W</m:mi>
													<m:mo>&#8594;</m:mo>
												</m:mover>
											</m:mrow>
											<m:mi>j</m:mi>
										</m:msub>
										<m:mrow>
											<m:mo>(</m:mo>
											<m:mi>t</m:mi>
											<m:mo>)</m:mo>
										</m:mrow>
										<m:mo>+</m:mo>
										<m:msub>
											<m:mrow>
												<m:mover>
													<m:mi>Z</m:mi>
													<m:mo>&#8594;</m:mo>
												</m:mover>
											</m:mrow>
											<m:mi>j</m:mi>
										</m:msub>
									</m:mrow>
									<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aqatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegm0B1jxALjhiov2DaeHbuLwBLnhiov2DGi1BTfMBaebbnrfifHhDYfgasaacH8qrps0lbbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0RYxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaqaafaaakeaadaWfGaqaaiabdEfaxbWcbeqaaiabgkziUcaakmaaBaaaleaacqWGQbGAaeqaaOWaaeWaaeaacqWG0baDcqGHRaWkcqaIXaqmaiaawIcacaGLPaaacqGH9aqpdaWfGaqaaiabdEfaxbWcbeqaaiabgkziUcaakmaaBaaaleaacqWGQbGAaeqaaOWaaeWaaeaacqWG0baDaiaawIcacaGLPaaacqGHRaWkdaWfGaqaaiabdQfaAbWcbeqaaiabgkziUcaakmaaBaaaleaacqWGQbGAaeqaaaaa@5604@</m:annotation>
								</m:semantics>
							</m:math>
						</display-formula>
					</p>
					<p>Here <it>n</it>, which enters through the accumulator update rule, is the number of instances in the training set.</p>
					<p>3. <b>Assigning Classes to Neurons:</b> Same as Step 3 in the SOGR algorithm above.</p>
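					<p>As an illustration only, the batch procedure in steps (a) to (c), together with the stopping criterion based on <it>Q</it>(<it>t</it>), might be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: the function names, the <it>patience</it> and <it>max_epochs</it> parameters, the learning-rate schedule <it>eta_schedule</it>, and the choice to return the weights with the lowest <it>Q</it>(<it>t</it>) are all hypothetical, and weight initialization and class assignment (Step 3) are omitted.</p>
					<p>
```python
import numpy as np


def quantization_error(X, W, R):
    """Q(t) = 1/(2nR) * sum_i sum_j m_ij * ||x_i - W_j(t)||^2.

    m_ij is 1 for the R neurons closest to x_i and 0 otherwise, so only
    the R smallest squared distances per instance contribute.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    n = X.shape[0]
    # Squared distance between every instance (rows) and neuron (columns).
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    nearest = np.sort(d2, axis=1)[:, :R]
    return nearest.sum() / (2.0 * n * R)


def batch_epoch(X, W, R, eta):
    """One pass through the training set, following steps (a) to (c)."""
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    n = X.shape[0]
    Z = np.zeros_like(W)                  # (a) accumulators start at 0
    for x in X:                           # (b) present instances one at a time
        dist = np.linalg.norm(x - W, axis=1)
        winners = np.argsort(dist)[:R]    # indices of the R closest neurons
        Z[winners] += eta / (n * R) * (x - W[winners])
    return W + Z                          # (c) W_j(t+1) = W_j(t) + Z_j


def train(X, W, R, eta_schedule, patience=5, max_epochs=1000):
    """Repeat batch epochs until Q(t) sets no new minimum for `patience`
    consecutive passes (patience and max_epochs are assumed parameters)."""
    W = np.asarray(W, dtype=float)
    best_q, best_W, stale = np.inf, W.copy(), 0
    for t in range(max_epochs):
        W = batch_epoch(X, W, R, eta_schedule(t))
        q = quantization_error(X, W, R)
        if best_q > q:
            best_q, best_W, stale = q, W.copy(), 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_W, best_q
```
					</p>
					<p>For example, with a single instance (1, 0), neurons at (0, 0) and (4, 0), <it>R</it> = 1 and a learning rate of 0.5, one call to <it>batch_epoch</it> moves the first neuron to (0.5, 0) and leaves the second unchanged. Because the weights are changed only after the full pass, the result does not depend on the order in which instances are presented.</p>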
				</sec>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Competing interests</p>
			</st>
			<p>The authors declare that they have no competing interests.</p>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>JYY conceived of the project; MQY and JYY contributed ideas to the project; MQY designed the project; MQY performed the experiments and analyses, and wrote the manuscript; AKD, YPD and XH contributed suggestions.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>We are indebted to Dr. Okan K. Ersoy and Dr. Albert Overhauser of Purdue University for helpful discussions. Dr. Craig W. Codrington contributed ideas to the project and helped MQY perform the experiments and analyses and write the manuscript.</p>
				<p>This article has been published as part of <it>BMC Genomics</it> Volume 9 Supplement 1, 2008: The 2007 International Conference on Bioinformatics &amp; Computational Biology (BIOCOMP'07). The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1471-2164/9?issue=S1</url>.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Intracellular signaling from the endoplasmic reticulum to the nucleus</p>
				</title>
				<aug>
					<au>
						<snm>Chapman</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Sidrauski</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Walter</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Annu Rev Cell Dev Biol</source>
				<pubdate>1998</pubdate>
				<volume>14</volume>
				<fpage>459</fpage>
				<lpage>485</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.cellbio.14.1.459</pubid>
						<pubid idtype="pmpid" link="fulltext">9891790</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>State-of-the-art in membrane protein prediction</p>
				</title>
				<aug>
					<au>
						<snm>Chen</snm>
						<fnm>CP</fnm>
					</au>
					<au>
						<snm>Rost</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Appl Bioinformatics</source>
				<pubdate>2002</pubdate>
				<volume>1</volume>
				<fpage>21</fpage>
				<lpage>35</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid">15130854</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Mitochondrial proteins at unexpected cellular locations: export of proteins from mitochondria from an evolutionary perspective</p>
				</title>
				<aug>
					<au>
						<snm>Soltys</snm>
						<fnm>BJ</fnm>
					</au>
					<au>
						<snm>Gupta</snm>
						<fnm>RS</fnm>
					</au>
				</aug>
				<source>Int Rev Cytol</source>
				<pubdate>2000</pubdate>
				<volume>194</volume>
				<fpage>133</fpage>
				<lpage>196</lpage>
				<xrefbib>
					<pubid idtype="pmpid">10494626</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Receptors and G proteins as primary components of transmembrane signal transduction. Part 1. G-protein-coupled receptors: structure and function</p>
				</title>
				<aug>
					<au>
						<snm>Gudermann</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Nurnberg</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Schultz</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>J Mol Med</source>
				<pubdate>1995</pubdate>
				<volume>73</volume>
				<fpage>51</fpage>
				<lpage>63</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1007/BF00270578</pubid>
						<pubid idtype="pmpid">7627630</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Therapeutic potential of anti-IgE antibodies</p>
				</title>
				<aug>
					<au>
						<snm>Heusser</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Jardieu</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Curr Opin Immunol</source>
				<pubdate>1997</pubdate>
				<volume>9</volume>
				<fpage>805</fpage>
				<lpage>813</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0952-7915(97)80182-3</pubid>
						<pubid idtype="pmpid" link="fulltext">9492982</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Development of pharmacological agents for targeting neurotrophins and their receptors</p>
				</title>
				<aug>
					<au>
						<snm>Saragovi</snm>
						<fnm>HU</fnm>
					</au>
					<au>
						<snm>Gehring</snm>
						<fnm>K</fnm>
					</au>
				</aug>
				<source>Trends Pharmacol Sci</source>
				<pubdate>2000</pubdate>
				<volume>21</volume>
				<fpage>93</fpage>
				<lpage>98</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0165-6147(99)01444-3</pubid>
						<pubid idtype="pmpid" link="fulltext">10689362</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Evaluation of methods for predicting the topology of &#946;-barrel outer membrane proteins and a consensus prediction method</p>
				</title>
				<aug>
					<au>
						<snm>Bagos</snm>
						<fnm>PG</fnm>
					</au>
					<au>
						<snm>Liakopoulos</snm>
						<fnm>TD</fnm>
					</au>
					<au>
						<snm>Hamodrakas</snm>
						<fnm>SJ</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<issue/>
				<fpage>7</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">545999</pubid>
						<pubid idtype="pmpid" link="fulltext">15647112</pubid>
						<pubid idtype="doi">10.1186/1471-2105-6-7</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Protein family classification with discriminant function analysis</p>
				</title>
				<aug>
					<au>
						<snm>Moriyama</snm>
						<fnm>EN</fnm>
					</au>
					<au>
						<snm>Kim</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Genome Exploitation: Data Mining the Genome</source>
				<publisher>New York: Springer</publisher>
				<editor>Edited by Gustafson JP, Shoemaker R, Snape JW</editor>
				<pubdate>2005</pubdate>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Knowledge Acquisition from Amino Acid Sequences by Machine Learning System BONSAI</p>
				</title>
				<aug>
					<au>
						<snm>Shimozono</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Shinohara</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Shinohara</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Miyano</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Kuhara</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Arikawa</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Trans. Information Processing Society of Japan</source>
				<pubdate>1994</pubdate>
				<volume>35</volume>
				<fpage>2009</fpage>
				<lpage>2018</lpage>
				<note>[<url>http://citeseer.ist.psu.edu/108119.html</url>]</note>
			</bibl>
			<bibl id="B10">
				<title>
					<p>A predictor of transmembrane &#945;-helix domains of proteins based on neural networks</p>
				</title>
				<aug>
					<au>
						<snm>Casadio</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Fariselli</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Taroni</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Compiani</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>European Biophysics Journal</source>
				<pubdate>1996</pubdate>
				<volume>24</volume>
				<issue>3</issue>
				<fpage>165</fpage>
				<lpage>178</lpage>
				<note>[<url>http://dx.doi.org/10.1007/BF00180274</url>]</note>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1007/BF00180274</pubid>
						<pubid idtype="pmpid" link="fulltext">8852561</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Analysis of protein transmembrane helical regions by a neural network</p>
				</title>
				<aug>
					<au>
						<snm>Dombi</snm>
						<fnm>GW</fnm>
					</au>
					<au>
						<snm>Lawrence</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Protein Science</source>
				<pubdate>1994</pubdate>
				<volume>3</volume>
				<issue>4</issue>
				<fpage>557</fpage>
				<lpage>566</lpage>
				<note>[<url>http://www.proteinscience.org/cgi/content/abstract/3/4/557</url>]</note>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">2142860</pubid>
						<pubid idtype="pmpid" link="fulltext">8003974</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>A neural network model for the prediction of membrane-spanning amino acid sequences</p>
				</title>
				<aug>
					<au>
						<snm>Lohmann</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Schneider</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Behrens</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Wrede</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Protein Science</source>
				<pubdate>1994</pubdate>
				<volume>3</volume>
				<issue>9</issue>
				<fpage>1597</fpage>
				<lpage>1601</lpage>
				<note>[<url>http://www.proteinscience.org/cgi/content/abstract/3/9/1597</url>]</note>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">2142934</pubid>
						<pubid idtype="pmpid" link="fulltext">7833818</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Transmembrane helices predicted at 95% accuracy</p>
				</title>
				<aug>
					<au>
						<snm>Rost</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Casadio</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Fariselli</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Sander</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Protein Science</source>
				<pubdate>1995</pubdate>
				<volume>4</volume>
				<issue>3</issue>
				<fpage>521</fpage>
				<lpage>533</lpage>
				<note>[<url>http://www.proteinscience.org/cgi/content/abstract/4/3/521</url>]</note>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">2143072</pubid>
						<pubid idtype="pmpid" link="fulltext">7795533</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Support Vector Machines for Predicting Membrane Protein Types by Using Functional Domain Composition</p>
				</title>
				<aug>
					<au>
						<snm>Cai</snm>
						<fnm>YD</fnm>
					</au>
					<au>
						<snm>Zhou</snm>
						<fnm>GP</fnm>
					</au>
					<au>
						<snm>Chou</snm>
						<fnm>KC</fnm>
					</au>
				</aug>
				<source>Biophysical Journal</source>
				<pubdate>2003</pubdate>
				<volume>84</volume>
				<issue>5</issue>
				<fpage>3257</fpage>
				<lpage>3263</lpage>
				<note>[<url>http://www.biophysj.org/cgi/content/abstract/84/5/3257</url>]</note>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1302886</pubid>
						<pubid idtype="pmpid" link="fulltext">12719255</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Prediction of transporter family from protein sequence by support vector machine approach</p>
				</title>
				<aug>
					<au>
						<snm>Lin</snm>
						<fnm>HH</fnm>
					</au>
					<au>
						<snm>Han</snm>
						<fnm>LY</fnm>
					</au>
					<au>
						<snm>Cai</snm>
						<fnm>CZ</fnm>
					</au>
					<au>
						<snm>Ji</snm>
						<fnm>ZL</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>YZ</fnm>
					</au>
				</aug>
				<source>Proteins</source>
				<pubdate>2006</pubdate>
				<volume>62</volume>
				<fpage>218</fpage>
				<lpage>231</lpage>
				<note>[<url>http://dx.doi.org/10.1002/prot.20605</url>]</note>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/prot.20605</pubid>
						<pubid idtype="pmpid" link="fulltext">16287089</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Prediction of transmembrane regions of &#946;-barrel proteins using ANN- and SVM-based methods</p>
				</title>
				<aug>
					<au>
						<snm>Natt</snm>
						<fnm>NK</fnm>
					</au>
					<au>
						<snm>Kaur</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Raghava</snm>
						<fnm>GPS</fnm>
					</au>
				</aug>
				<source>Proteins</source>
				<pubdate>2004</pubdate>
				<volume>56</volume>
				<fpage>11</fpage>
				<lpage>18</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/prot.20092</pubid>
						<pubid idtype="pmpid" link="fulltext">15162482</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Discrimination of outer membrane proteins using support vector machines</p>
				</title>
				<aug>
					<au>
						<snm>Park</snm>
						<fnm>KJ</fnm>
					</au>
					<au>
						<snm>Gromiha</snm>
						<fnm>MM</fnm>
					</au>
					<au>
						<snm>Horton</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Suwa</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>21</volume>
				<issue>23</issue>
				<fpage>4223</fpage>
				<lpage>4229</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/bti697</pubid>
						<pubid idtype="pmpid" link="fulltext">16204348</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>SVMtm: Support vector machines to predict transmembrane segments</p>
				</title>
				<aug>
					<au>
						<snm>Yuan</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Mattick</snm>
						<fnm>JS</fnm>
					</au>
					<au>
						<snm>Teasdale</snm>
						<fnm>RD</fnm>
					</au>
				</aug>
				<source>Journal of Computational Chemistry</source>
				<pubdate>2004</pubdate>
				<volume>25</volume>
				<issue>5</issue>
				<fpage>632</fpage>
				<lpage>636</lpage>
				<note>[<url>http://dx.doi.org/10.1002/jcc.10411</url>]</note>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/jcc.10411</pubid>
						<pubid idtype="pmpid" link="fulltext">14978706</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>A hidden Markov model for predicting transmembrane helices in protein sequences</p>
				</title>
				<aug>
					<au>
						<snm>Sonnhammer</snm>
						<fnm>ELL</fnm>
					</au>
					<au>
						<snm>von Heijne</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Krogh</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Proceedings of the 6th International Conference on Intelligent Systems for Molecular Biology (ISMB)</source>
				<publisher>Menlo Park, CA: AAAI Press</publisher>
				<pubdate>1998</pubdate>
				<fpage>175</fpage>
				<lpage>182</lpage>
				<note>[<url>http://citeseer.ist.psu.edu/sonnhammer98hidden.html</url>]</note>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Best &#945;-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information</p>
				</title>
				<aug>
					<au>
						<snm>Viklund</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Elofsson</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Protein Science</source>
				<pubdate>2004</pubdate>
				<volume>13</volume>
				<issue>7</issue>
				<fpage>1908</fpage>
				<lpage>1917</lpage>
				<note>[<url>http://www.proteinscience.org/cgi/content/abstract/13/7/1908</url>]</note>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1110/ps.04625404</pubid>
						<pubid idtype="pmpid" link="fulltext">15215532</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Protein flexibility and intrinsic disorder</p>
				</title>
				<aug>
					<au>
						<snm>Radivojac</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Obradovic</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Smith</snm>
						<fnm>DK</fnm>
					</au>
					<au>
						<snm>Zhu</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Vucetic</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Brown</snm>
						<fnm>CJ</fnm>
					</au>
					<au>
						<snm>Lawson</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Dunker</snm>
						<fnm>AK</fnm>
					</au>
				</aug>
				<source>Protein Science</source>
				<pubdate>2004</pubdate>
				<volume>13</volume>
				<fpage>71</fpage>
				<lpage>80</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1110/ps.03128904</pubid>
						<pubid idtype="pmpid" link="fulltext">14691223</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Optimizing Long Intrinsic Disorder Predictors with Protein Evolutionary Information</p>
				</title>
				<aug>
					<au>
						<snm>Peng</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Vucetic</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Radivojac</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Brown</snm>
						<fnm>CJ</fnm>
					</au>
					<au>
						<snm>Dunker</snm>
						<fnm>AK</fnm>
					</au>
					<au>
						<snm>Obradovic</snm>
						<fnm>Z</fnm>
					</au>
				</aug>
				<source>J Bioinform Comput Biol</source>
				<pubdate>2005</pubdate>
				<volume>3</volume>
				<fpage>35</fpage>
				<lpage>60</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1142/S0219720005000886</pubid>
						<pubid idtype="pmpid" link="fulltext">15751111</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>The importance of intrinsic disorder for protein phosphorylation</p>
				</title>
				<aug>
					<au>
						<snm>Iakoucheva</snm>
						<fnm>LM</fnm>
					</au>
					<au>
						<snm>Radivojac</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Brown</snm>
						<fnm>CJ</fnm>
					</au>
					<au>
						<snm>O'Connor</snm>
						<fnm>TR</fnm>
					</au>
					<au>
						<snm>Sikes</snm>
						<fnm>JG</fnm>
					</au>
					<au>
						<snm>Obradovic</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Dunker</snm>
						<fnm>AK</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2004</pubdate>
				<volume>32</volume>
				<issue>3</issue>
				<fpage>1037</fpage>
				<lpage>1049</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">373391</pubid>
						<pubid idtype="pmpid" link="fulltext">14960716</pubid>
						<pubid idtype="doi">10.1093/nar/gkh253</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Intelligent Data Analysis for Protein Disorder Prediction</p>
				</title>
				<aug>
					<au>
						<snm>Romero</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Dunker</snm>
						<fnm>AK</fnm>
					</au>
				</aug>
				<source>Artificial Intelligence Review</source>
				<pubdate>2000</pubdate>
				<fpage>14</fpage>
			</bibl>
			<bibl id="B25">
				<title>
					<p>The protein trinity&#8211;linking function and disorder</p>
				</title>
				<aug>
					<au>
						<snm>Dunker</snm>
						<fnm>AK</fnm>
					</au>
					<au>
						<snm>Obradovic</snm>
						<fnm>Z</fnm>
					</au>
				</aug>
				<source>Nature Biotechnology</source>
				<pubdate>2001</pubdate>
				<volume>19</volume>
				<issue>9</issue>
				<fpage>805</fpage>
				<lpage>806</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nbt0901-805</pubid>
						<pubid idtype="pmpid" link="fulltext">11533628</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>A simple method for displaying the hydropathic character of a protein</p>
				</title>
				<aug>
					<au>
						<snm>Kyte</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Doolittle</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1982</pubdate>
				<volume>157</volume>
				<fpage>105</fpage>
				<lpage>132</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0022-2836(82)90515-0</pubid>
						<pubid idtype="pmpid" link="fulltext">7108955</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Analysis of membrane and surface protein sequences with the hydrophobic moment plot</p>
				</title>
				<aug>
					<au>
						<snm>Eisenberg</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Schwarz</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Komaromy</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Wall</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Journal of Molecular Biology</source>
				<pubdate>1984</pubdate>
				<volume>179</volume>
				<fpage>125</fpage>
				<lpage>142</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0022-2836(84)90309-7</pubid>
						<pubid idtype="pmpid" link="fulltext">6502707</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins</p>
				</title>
				<aug>
					<au>
						<snm>Engelman</snm>
						<fnm>DM</fnm>
					</au>
					<au>
						<snm>Steitz</snm>
						<fnm>TA</fnm>
					</au>
					<au>
						<snm>Goldman</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Annu. Rev. Biophys. Biophys. Chem</source>
				<pubdate>1986</pubdate>
				<volume>15</volume>
				<fpage>321</fpage>
				<lpage>353</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.bb.15.060186.001541</pubid>
						<pubid idtype="pmpid">3521657</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Guidelines for membrane protein engineering derived from de novo designed model peptides</p>
				</title>
				<aug>
					<au>
						<snm>Liu</snm>
						<fnm>LP</fnm>
					</au>
					<au>
						<snm>Deber</snm>
						<fnm>CM</fnm>
					</au>
				</aug>
				<source>Biopolymers (Peptide Science)</source>
				<pubdate>1998</pubdate>
				<volume>47</volume>
				<issue>1</issue>
				<fpage>41</fpage>
				<lpage>62</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1002/(SICI)1097-0282(1998)47:1&lt;41::AID-BIP6&gt;3.0.CO;2-X</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Amino acid difference formula to help explain protein evolution</p>
				</title>
				<aug>
					<au>
						<snm>Grantham</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1974</pubdate>
				<volume>185</volume>
				<issue>4154</issue>
				<fpage>862</fpage>
				<lpage>864</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.185.4154.862</pubid>
						<pubid idtype="pmpid" link="fulltext">4843792</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>The characterization of amino acid sequences in proteins by statistical methods</p>
				</title>
				<aug>
					<au>
						<snm>Zimmerman</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Eliezer</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Simha</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>J. Theor. Biol</source>
				<pubdate>1968</pubdate>
				<volume>21</volume>
				<issue>2</issue>
				<fpage>170</fpage>
				<lpage>201</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0022-5193(68)90069-6</pubid>
						<pubid idtype="pmpid" link="fulltext">5700434</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Correlations between amino acid hydrophobicity scales and stain exclusion capacity of type 1 collagen fibrils</p>
				</title>
				<aug>
					<au>
						<snm>Ortolani</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Raspanti</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Marchini</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>J. Electron Microscopy</source>
				<pubdate>1994</pubdate>
				<volume>43</volume>
				<fpage>32</fpage>
				<lpage>38</lpage>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Electronic properties of amino acid side chains: quantum mechanics calculation of substituent effects</p>
				</title>
				<aug>
					<au>
						<snm>Dwyer</snm>
						<fnm>DS</fnm>
					</au>
				</aug>
				<source>BMC Chemical Biology</source>
				<pubdate>2005</pubdate>
				<volume>5</volume>
				<issue>2</issue>
				<fpage>1</fpage>
				<lpage>11</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1180429</pubid>
						<pubid idtype="pmpid" link="fulltext">15998468</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>Uncoupling Hydrophobicity and Helicity in Transmembrane Segments</p>
				</title>
				<aug>
					<au>
						<snm>Liu</snm>
						<fnm>LP</fnm>
					</au>
					<au>
						<snm>Deber</snm>
						<fnm>CM</fnm>
					</au>
				</aug>
				<source>J. Biol. Chem</source>
				<pubdate>1998</pubdate>
				<volume>273</volume>
				<issue>37</issue>
				<fpage>23645</fpage>
				<lpage>23648</lpage>
				<note>[<url>http://www.jbc.org/cgi/content/abstract/273/37/23645</url>]</note>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/jbc.273.37.23645</pubid>
						<pubid idtype="pmpid" link="fulltext">9726967</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>C4.5: Programs for Machine Learning</p>
				</title>
				<aug>
					<au>
						<snm>Quinlan</snm>
						<fnm>JR</fnm>
					</au>
				</aug>
				<publisher>San Francisco: Morgan Kaufmann</publisher>
				<pubdate>1993</pubdate>
			</bibl>
			<bibl id="B36">
				<title>
					<p>Making Large-Scale SVM Learning Practical</p>
				</title>
				<aug>
					<au>
						<snm>Joachims</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Advances in Kernel Methods &#8211; Support Vector Learning</source>
				<publisher>MIT Press</publisher>
				<editor>Edited by Sch&#246;lkopf B, Burges C, Smola A</editor>
				<pubdate>1999</pubdate>
			</bibl>
			<bibl id="B37">
				<title>
					<p>Self-Organizing Global Ranking Algorithm and its Applications</p>
				</title>
				<aug>
					<au>
						<snm>Saglam</snm>
						<fnm>MI</fnm>
					</au>
					<au>
						<snm>Ersoy</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Erer</snm>
						<fnm>I</fnm>
					</au>
				</aug>
				<source>In Intelligent Engineering Systems Through Artificial Neural Networks, Volume 14</source>
				<pubdate>2004</pubdate>
				<fpage>893</fpage>
				<lpage>898</lpage>
			</bibl>
			<bibl id="B38">
				<title>
					<p>Identification of Transmembrane Proteins Using Variants of the Self-Organizing Feature Map Algorithm</p>
				</title>
				<aug>
					<au>
						<snm>Yang</snm>
						<fnm>MQX</fnm>
					</au>
					<au>
						<snm>Yang</snm>
						<fnm>JY</fnm>
					</au>
					<au>
						<snm>Codrington</snm>
						<fnm>CW</fnm>
					</au>
				</aug>
				<source>Knowledge Discovery in Bioinformatics: Techniques, Methods and Applications</source>
				<publisher>John Wiley &amp; Sons</publisher>
				<editor>Edited by Pan Y, Hu X</editor>
				<pubdate>2006</pubdate>
			</bibl>
			<bibl id="B39">
				<title>
					<p>Predicting Protein Structure and Function Using Machine Learning Methods</p>
				</title>
				<aug>
					<au>
						<snm>Yang</snm>
						<fnm>MQX</fnm>
					</au>
				</aug>
				<source>PhD thesis</source>
				<publisher>Purdue University, West Lafayette, Indiana</publisher>
				<pubdate>2005</pubdate>
			</bibl>
			<bibl id="B40">
				<title>
					<p>Self-organizing formation of topologically correct feature maps</p>
				</title>
				<aug>
					<au>
						<snm>Kohonen</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Biological Cybernetics</source>
				<pubdate>1982</pubdate>
				<volume>43</volume>
				<fpage>59</fpage>
				<lpage>69</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1007/BF00337288</pubid>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
