<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-7-405</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Predicting the effect of missense mutations on protein function: analysis with Bayesian networks</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Needham</snm>
               <mi>J</mi>
               <fnm>Chris</fnm>
               <insr iid="I1"/>
               <email>chrisn@comp.leeds.ac.uk</email>
            </au>
            <au id="A2">
               <snm>Bradford</snm>
               <mi>R</mi>
               <fnm>James</fnm>
               <insr iid="I2"/>
               <email>J.R.Bradford@leeds.ac.uk</email>
            </au>
            <au id="A3">
               <snm>Bulpitt</snm>
               <mi>J</mi>
               <fnm>Andrew</fnm>
               <insr iid="I1"/>
               <email>andyb@comp.leeds.ac.uk</email>
            </au>
            <au id="A4">
               <snm>Care</snm>
               <mi>A</mi>
               <fnm>Matthew</fnm>
               <insr iid="I2"/>
               <email>M.A.Care98@leeds.ac.uk</email>
            </au>
            <au id="A5">
               <snm>Westhead</snm>
               <mi>R</mi>
               <fnm>David</fnm>
               <insr iid="I2"/>
               <email>D.R.Westhead@leeds.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>School of Computing, University of Leeds, Leeds, LS2 9JT, UK</p>
            </ins>
            <ins id="I2">
               <p>Institute of Molecular and Cellular Biology, University of Leeds, Leeds, LS2 9JT, UK</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>405</fpage>
         <url>http://www.biomedcentral.com/1471-2105/7/405</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16956412</pubid>
               <pubid idtype="doi">10.1186/1471-2105-7-405</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>24</day>
               <month>5</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>06</day>
               <month>9</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>06</day>
               <month>9</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Needham et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>A number of methods that use both protein structural and evolutionary information are available to predict the functional consequences of missense mutations. However, many of these methods break down if either of the two types of data is missing. Furthermore, there is a lack of rigorous assessment of how important the different factors are to prediction.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Here we use Bayesian networks to predict whether or not a missense mutation will affect the function of the protein. Bayesian networks provide a concise representation for inferring models from data, and are known to generalise well to new data. More importantly, they can handle the noisy, incomplete and uncertain nature of biological data. Our Bayesian network achieved performance comparable to that of previous machine learning methods. The predictive performance of learned model structures was no better than that of a na&#239;ve Bayes classifier. However, analysis of the posterior distribution of model structures allows biologically meaningful interpretation of relationships between the input variables.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The ability of the Bayesian network to make predictions when only structural or evolutionary data was observed allowed us to conclude that structural information is a significantly better predictor of the functional consequences of a missense mutation than evolutionary information, for the dataset used. Analysis of the posterior distribution of model structures revealed that the top three strongest connections with the class node all involved structural nodes. With this in mind, we derived a simplified Bayesian network that used just these three structural descriptors, with comparable performance to that of an all node network.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>An important aspect of the post-genomic era is to understand the biological effects of inherited variations between individuals. For instance, a key problem for the pharmaceutical industry is to understand variations in drug treatment responses among individuals at the molecular level. A single nucleotide polymorphism (SNP) is a mutation, such as an insertion, deletion or substitution, observed in the genomic DNA of individuals of the same species. When the SNP results in an amino acid substitution in the protein product of the gene, it is called a missense mutation. A missense mutation can have various phenotypic effects although we restrict ourselves here to the simplified task of predicting whether a missense mutation has an effect or no effect on protein function.</p>
         <p>The wealth of SNP data now available <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp> has prompted a number of studies on the functional consequences of SNPs. For example, Wang and Moult <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and Ramensky <it>et al</it>. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> showed that most of the detrimental missense mutations affect protein function indirectly through effects on protein structural stability, particularly through disruption of the protein hydrophobic core. The evolutionary properties of the mutated residue may also be important determinants of its effect on protein function <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>, since conserved amino acids tend to be functionally important or critical in maintaining structural integrity. A number of groups have developed strategies to predict the effects of missense mutations by using structural or evolutionary information, or a combination of both. Most of these methods claim prediction accuracies of between 70% and 80%, although comparison is extremely difficult because they use different data sets and different criteria for deciding whether a mutation has an effect. Chasman and Adams <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> proposed a probabilistic method, and Krishnan and Westhead <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> evaluated decision trees and support vector machines. Herrgard <it>et al</it>. <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> used structural motifs called Fuzzy Functional Forms to predict the effects of amino acid mutations on enzyme catalytic activity. Deleterious human alleles were predicted by Sunyaev <it>et al</it>. <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> using mostly structural information.
By contrast, Ng and Henikoff <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> used purely sequence homology data in their SIFT (Sorting Intolerant From Tolerant) algorithm, although adding structural information resulted in significant improvements <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Subsequent work has compared SIFT to SVMs and random forests <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Cai <it>et al</it>. <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> used a Bayesian framework to predict pathogenic SNPs. Verzilli <it>et al</it>. <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> applied a hierarchical Bayesian multivariate adaptive regression spline (hierarchical BMARS) model for binary classification of the functional consequences of SNPs. Within this model, samples from the posterior distribution were used to highlight properties of the mutated residue that are most important in predicting its effect on protein function.</p>
         <p>All these methods require either structural or evolutionary data to be available for predictions to be possible. However, there are many proteins that lack any detectable sequence homology to known proteins or a solved 3D structure. In these cases, many prediction methods break down. A method is therefore needed that can combine both structural and evolutionary information but at the same time tolerate the absence of either without manual intervention. With this in mind we have applied Bayesian networks to the problem of predicting the consequences of a missense mutation on protein function. Bayesian networks are probabilistic graphical models which provide a compact representation for expressing joint probability distributions and for performing inference. The representation and use of probability theory makes Bayesian networks suitable for learning from incomplete datasets, expressing causal relationships, combining domain knowledge and data, and avoiding over-fitting a model to training data. As such, a host of applications in computational biology (for example, see <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>) have used Bayesian networks and Bayesian learning methodologies <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. Our detailed evaluation of Bayesian network performance in this work is likely to be valuable to many groups working with Bayesian networks and biological data.</p>
         <sec>
            <st>
               <p>Bayesian networks</p>
            </st>
            <p>Our recent primer <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> introduces Bayesian networks to the computational biologist. Briefly, given a set of variables <b>x </b>= {<it>x</it><sub>1</sub>,..., <it>x</it><sub><it>N</it></sub>}, which are represented as nodes in the Bayesian network, a set of directed edges representing relationships between nodes can be defined in a graph structure. To allow efficient inference and learning, a directed acyclic graph (DAG) must be formed, which exploits the conditional independence relations between variables. Using this model structure, model parameters <it>&#952; </it>in the form of conditional probability distributions (CPDs) between the connected variables may be learned. With discrete data, these model parameters take the form of conditional probability tables (CPTs). Throughout this work, we have used the Bayes Net Toolbox for MATLAB (BNT) <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. The code used to produce the results presented in this paper is available on request from the authors.</p>
            <sec>
               <st>
                  <p>Learning from complete data</p>
               </st>
               <p>The Bayesian learning paradigm can be summarised as:</p>
               <p><it>p</it>(<b>x</b>|<it>D</it>) = &#8747;<it>p</it>(<b>x</b>|<it>&#952;</it>)<it>p</it>(<it>&#952;</it>|<it>D</it>)<it>d&#952;</it></p>
               <p>That is, the predictive distribution for a new observation <b>x</b>, given a set of training examples <it>D</it>, is calculated by averaging over all possible models <it>&#952; </it>the likelihood of <b>x </b>given the model, weighted by the posterior probability of the model given the training data. For a given model structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> the model <it>&#952; </it>can be thought of as the model parameters that encode the conditional probability distributions between variables and their parents in <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>.</p>
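               <p>As a minimal illustration of this averaging (a sketch only, not the BNT code used in this work), consider a single binary variable with a Beta prior on its parameter <it>&#952;</it>. The function names, the prior and the counts below are assumptions chosen for the example. The closed-form predictive probability is compared with a direct numerical evaluation of the integral.</p>

```python
def posterior_predictive(n_ones, n_zeros, alpha=1.0, beta=1.0):
    """Closed-form p(x=1 | D) for a Bernoulli variable with a Beta(alpha, beta) prior."""
    return (alpha + n_ones) / (alpha + beta + n_ones + n_zeros)


def posterior_predictive_numeric(n_ones, n_zeros, alpha=1.0, beta=1.0, steps=20000):
    """Approximate the integral of p(x=1 | theta) p(theta | D) on a grid over theta."""
    width = 1.0 / steps
    thetas = [(i + 0.5) * width for i in range(steps)]  # midpoint rule
    # Unnormalised posterior p(theta | D): likelihood times prior.
    weights = [t ** (alpha - 1 + n_ones) * (1 - t) ** (beta - 1 + n_zeros)
               for t in thetas]
    total = sum(weights)
    # p(x=1 | theta) is theta itself, so average theta under the posterior.
    return sum(t * w for t, w in zip(thetas, weights)) / total


# Hypothetical training set: 3 observations of state 1 and 7 of state 0.
exact = posterior_predictive(3, 7)
approx = posterior_predictive_numeric(3, 7)
print(exact, approx)
```

               <p>With a uniform prior the two values should agree closely, which shows that the predictive distribution depends on the whole posterior over <it>&#952; </it>rather than on a single point estimate.</p>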
            </sec>
            <sec>
               <st>
                  <p>Learning from incomplete data</p>
               </st>
               <p>One advantage of using Bayesian networks is that it is possible to learn model parameters from incomplete training data, i.e. in cases where the values of some variables are missing. To learn from incomplete data, we used the Expectation-Maximisation (EM) algorithm, which estimates missing values by computing their expected values under the current parameters and then updates the parameters using these expected values as if they were observed values.</p>
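               <p>The idea can be sketched for a hypothetical two-node binary network A -> B in which the value of A is missing from some records. This is an illustrative Python sketch under those assumptions, not the BNT implementation; the function name, starting values and data are invented for the example.</p>

```python
def em_two_node(records, iterations=50):
    """EM for a binary network A -> B where A may be missing (None) in a record."""
    p_a = 0.5                 # initial guess for p(A=1)
    p_b = {0: 0.4, 1: 0.6}    # initial guess for p(B=1 | A=a)
    for _ in range(iterations):
        # E-step: accumulate expected counts; a missing A contributes
        # fractionally to both states according to p(A | b).
        n_a = {0: 0.0, 1: 0.0}     # expected number of records with A=a
        n_ab = {0: 0.0, 1: 0.0}    # expected number with A=a and B=1
        for a, b in records:
            if a is None:
                like = {k: (p_b[k] if b == 1 else 1.0 - p_b[k]) for k in (0, 1)}
                joint1 = p_a * like[1]
                joint0 = (1.0 - p_a) * like[0]
                w1 = joint1 / (joint1 + joint0)
                weights = {0: 1.0 - w1, 1: w1}
            else:
                weights = {0: 1.0 - a, 1: float(a)}
            for k in (0, 1):
                n_a[k] += weights[k]
                n_ab[k] += weights[k] * b
        # M-step: re-estimate parameters from expected counts as if observed.
        p_a = n_a[1] / len(records)
        p_b = {k: n_ab[k] / n_a[k] for k in (0, 1)}
    return p_a, p_b


# Hypothetical data: (a, b) pairs, with A unobserved in the last 20 records.
records = ([(1, 1)] * 30 + [(1, 0)] * 10 + [(0, 1)] * 5 + [(0, 0)] * 25
           + [(None, 1)] * 12 + [(None, 0)] * 8)
print(em_two_node(records))
```

               <p>When no values are missing, the expected counts are ordinary counts and the procedure reduces to maximum likelihood estimation from complete data.</p>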
            </sec>
            <sec>
               <st>
                  <p>Structure learning</p>
               </st>
               <p>A fully connected network structure captures relationships (dependencies) between all of the variables. A simpler, more compact model may be produced if conditional independencies between variables are learned. To do this, we used the greedy search algorithm from the MATLAB-based structure learning package (SLP) <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> with tabular CPDs and uninformative Dirichlet priors (BDeu). The greedy search algorithm starts with a graph with no edges between the nodes, and aims to maximise a score function: either the full Bayesian posterior or the Bayesian Information Criterion (BIC). At each stage, the neighbourhood of the current graph (the set of graphs that differ from it by the addition, reversal or deletion of a single edge) is considered, and the graph with the highest score is chosen; this is repeated until convergence. We use the notation of Heckerman, where <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math><sup><it>h </it></sup>is a model structure hypothesis. From Bayes' theorem the posterior distribution for network structures <it>p</it>(<m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math><sup><it>h</it></sup>|<it>D</it>) is proportional to the marginal likelihood of the data <it>p</it>(<it>D</it>|<m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math><sup><it>h</it></sup>). The full Bayesian posterior can be calculated [<abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, equation 35], or the BIC approximation can be used, which contains a term to describe how well the maximum likelihood model <m:math name="1471-2105-7-405-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mover accent="true"><m:mi>&#952;</m:mi><m:mo>^</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacuWF4oqCgaqcaaaa@2E79@</m:annotation></m:semantics></m:math><sub><it>s </it></sub>for structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math><sup><it>h </it></sup>predicts the data <it>D</it>, and a term that punishes model complexity. For a model with <it>d </it>parameters, built from <it>N </it>samples, the BIC score is:</p>
               <p>
                  <m:math name="1471-2105-7-405-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>ln</m:mi>
                           <m:mo>&#8289;</m:mo>
                           <m:mi>p</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>D</m:mi>
                           <m:mo>|</m:mo>
                           <m:msup>
                              <m:mi mathvariant="script">S</m:mi>
                              <m:mi>h</m:mi>
                           </m:msup>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>&#8776;</m:mo>
                           <m:mi>ln</m:mi>
                           <m:mo>&#8289;</m:mo>
                           <m:mi>p</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>D</m:mi>
                           <m:mo>|</m:mo>
                           <m:msub>
                              <m:mover accent="true">
                                 <m:mi>&#952;</m:mi>
                                 <m:mo>^</m:mo>
                              </m:mover>
                              <m:mi>s</m:mi>
                           </m:msub>
                           <m:mo>,</m:mo>
                           <m:msup>
                              <m:mi mathvariant="script">S</m:mi>
                              <m:mi>h</m:mi>
                           </m:msup>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>&#8722;</m:mo>
                           <m:mfrac>
                              <m:mi>d</m:mi>
                              <m:mn>2</m:mn>
                           </m:mfrac>
                           <m:mi>ln</m:mi>
                           <m:mo>&#8289;</m:mo>
                           <m:mi>N</m:mi>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaacyGGSbaBcqGGUbGBcqWGWbaCcqGGOaakcqWGebarcqGG8baFimaacqWFse=udaahaaWcbeqaaiabdIgaObaakiabcMcaPiabgIKi7kGbcYgaSjabc6gaUjabdchaWjabcIcaOiabdseaejabcYha8HGaciqb+H7aXzaajaWaaSbaaSqaaiabdohaZbqabaGccqGGSaalcqWFse=udaahaaWcbeqaaiabdIgaObaakiabcMcaPiabgkHiTmaalaaabaGaemizaqgabaGaeGOmaidaaiGbcYgaSjabc6gaUjabd6eaobaa@5B51@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
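               <p>The BIC score above can be computed directly from counts when all variables are discrete and fully observed. The following Python sketch assumes that setting; the function name, the data layout and the toy data are assumptions made for illustration and do not reproduce the SLP code. Here <it>d </it>is taken to be the number of free CPT entries.</p>

```python
import math
from collections import Counter


def bic_score(data, parents, cardinality):
    """BIC = ln p(D | theta_hat, S) - (d / 2) ln N for complete discrete data.

    data: list of dicts mapping variable name to an observed state
    parents: dict mapping each variable to a tuple of its parent names
    cardinality: dict mapping each variable to its number of states
    """
    n = len(data)
    log_lik = 0.0
    d = 0
    for node, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        config = Counter(tuple(row[p] for p in pa) for row in data)
        # Maximum-likelihood CPT entries are count(parents, state) / count(parents).
        for (pa_state, _), count in joint.items():
            log_lik += count * math.log(count / config[pa_state])
        # Free parameters: (states - 1) for every parent configuration.
        d += (cardinality[node] - 1) * math.prod(cardinality[p] for p in pa)
    return log_lik - 0.5 * d * math.log(n)


# Hypothetical data in which B tends to copy A.
data = ([{"A": 0, "B": 0}] * 40 + [{"A": 0, "B": 1}] * 10
        + [{"A": 1, "B": 0}] * 10 + [{"A": 1, "B": 1}] * 40)
card = {"A": 2, "B": 2}
no_edge = bic_score(data, {"A": (), "B": ()}, card)
a_to_b = bic_score(data, {"A": (), "B": ("A",)}, card)
print(no_edge, a_to_b)
```

               <p>For these data the structure with the edge from A to B scores higher, because the gain in log-likelihood outweighs the penalty for the extra parameter. A greedy search would evaluate such a score for every graph in the neighbourhood of the current graph.</p>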
            </sec>
            <sec>
               <st>
                  <p>Inference with missing data</p>
               </st>
               <p>Knowledge of the conditional probability distributions between variables allows us to make predictions about the expected states of variables even if some variables are missing from the test data. For example, if structural information about a test missense mutation is not available, we can still infer whether the mutation has a functional effect on the protein or not by marginalising over the unknown variables. This is illustrated in a very simple Bayesian network with three nodes, A, B, C, which can take the values {<it>a</it><sub>1</sub>,..., <m:math name="1471-2105-7-405-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>a</m:mi><m:mrow><m:msub><m:mi>N</m:mi><m:mi>A</m:mi></m:msub></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGHbqydaWgaaWcbaGaemOta40aaSbaaWqaaiabdgeabbqabaaaleqaaaaa@308B@</m:annotation></m:semantics></m:math>}, {<it>b</it><sub>1</sub>,..., <m:math name="1471-2105-7-405-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>b</m:mi><m:mrow><m:msub><m:mi>N</m:mi><m:mi>B</m:mi></m:msub></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGIbGydaWgaaWcbaGaemOta40aaSbaaWqaaiabdkeacbqabaaaleqaaaaa@308F@</m:annotation></m:semantics></m:math>}, and {<it>c</it><sub>1</sub>,..., <m:math name="1471-2105-7-405-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>c</m:mi><m:mrow><m:msub><m:mi>N</m:mi><m:mi>C</m:mi></m:msub></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGJbWydaWgaaWcbaGaemOta40aaSbaaWqaaiabdoeadbqabaaaleqaaaaa@3093@</m:annotation></m:semantics></m:math>} respectively and a structure given by Figure <figr fid="F1">1</figr>. The joint probability over all the variables is:</p>
               <p><it>p</it>(<it>A</it>, <it>B</it>, <it>C</it>) = <it>p</it>(<it>A</it>)<it>p</it>(<it>B</it>|<it>A</it>)<it>p</it>(<it>C</it>|<it>A</it>, <it>B</it>)</p>
               <p>Each of the probabilities can be expressed as a conditional probability table in this discrete case. If we wish to infer the value of <it>C </it>given <it>A </it>= <it>a</it><sub><it>i </it></sub>and <it>B </it>= <it>b</it><sub><it>j </it></sub>then we can calculate the probability of <it>C </it>taking each of the possible values, <it>C </it>= <it>c</it><sub><it>k </it></sub>for <it>k </it>= 1,..., <it>N</it><sub><it>C </it></sub>by <it>p</it>(<it>c</it><sub><it>k</it></sub>|<it>a</it><sub><it>i</it></sub>, <it>b</it><sub><it>j</it></sub>) read from CPTs. If we wish to infer the value of C given only the value of A, we can marginalise over the unknown variables (in this case, B). Thus:</p>
               <p>
                  <m:math name="1471-2105-7-405-i7" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>p</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>c</m:mi>
                              <m:mi>k</m:mi>
                           </m:msub>
                           <m:mo>|</m:mo>
                           <m:msub>
                              <m:mi>a</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>b</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:msub>
                                    <m:mo>&#8712;</m:mo>
                                    <m:mi>B</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:mi>p</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>b</m:mi>
                                    <m:mi>j</m:mi>
                                 </m:msub>
                                 <m:mo>|</m:mo>
                                 <m:msub>
                                    <m:mi>a</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mi>p</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>c</m:mi>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                                 <m:mo>|</m:mo>
                                 <m:msub>
                                    <m:mi>a</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                                 <m:mo>,</m:mo>
                                 <m:msub>
                                    <m:mi>b</m:mi>
                                    <m:mi>j</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGWbaCcqGGOaakcqWGJbWydaWgaaWcbaGaem4AaSgabeaakiabcYha8jabdggaHnaaBaaaleaacqWGPbqAaeqaaOGaeiykaKIaeyypa0ZaaabuaeaacqWGWbaCcqGGOaakcqWGIbGydaWgaaWcbaGaemOAaOgabeaakiabcYha8jabdggaHnaaBaaaleaacqWGPbqAaeqaaOGaeiykaKIaemiCaaNaeiikaGIaem4yam2aaSbaaSqaaiabdUgaRbqabaGccqGG8baFcqWGHbqydaWgaaWcbaGaemyAaKgabeaakiabcYcaSiabdkgaInaaBaaaleaacqWGQbGAaeqaaOGaeiykaKcaleaacqWGIbGydaWgaaadbaGaemOAaOgabeaaliabgIGiolabdkeacbqab0GaeyyeIuoaaaa@5815@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
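               <p>This marginalisation can be sketched in a few lines of Python. The CPT values below are hypothetical numbers for binary nodes, chosen only to illustrate the sum; the variable and function names are likewise assumptions.</p>

```python
# Hypothetical CPTs for binary A, B, C, following p(A) p(B|A) p(C|A,B).
p_a = [0.7, 0.3]                                  # p(a_i)
p_b_given_a = [[0.9, 0.1], [0.4, 0.6]]            # p(b_j | a_i), indexed [i][j]
p_c_given_ab = [[[0.8, 0.2], [0.5, 0.5]],
                [[0.3, 0.7], [0.1, 0.9]]]         # p(c_k | a_i, b_j), indexed [i][j][k]


def p_c_given_a(i):
    """p(c_k | a_i) = sum over j of p(b_j | a_i) p(c_k | a_i, b_j), with B unobserved."""
    n_b = len(p_b_given_a[i])
    n_c = len(p_c_given_ab[i][0])
    return [sum(p_b_given_a[i][j] * p_c_given_ab[i][j][k] for j in range(n_b))
            for k in range(n_c)]


def p_c_given_a_and_b(i, j):
    """When both parents are observed the answer is read straight from the CPT."""
    return p_c_given_ab[i][j]


print(p_c_given_a(0))            # distribution over C when only A is observed
print(p_c_given_a_and_b(0, 1))   # distribution over C when A and B are observed
```

               <p>The result of the marginalisation is still a proper distribution over the states of C, so a prediction can be made whether or not B is observed.</p>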
               <fig id="F1">
                  <title>
                     <p>Figure 1</p>
                  </title>
                  <caption>
                     <p>Example 3 node Bayesian network</p>
                  </caption>
                  <text>
                     <p><b>Example 3 node Bayesian network</b>.</p>
                  </text>
                  <graphic file="1471-2105-7-405-1"/>
               </fig>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>The systematic unbiased mutagenesis datasets of lac repressor <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp> and T4 lysozyme <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp> were used to train and validate the Bayesian networks. Classification of 'effect' and 'no effect' mutations followed that of Verzilli <it>et al</it>. <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, in which only those mutations resulting in a significant loss of function were considered 'effect' mutations. As a result, our lac repressor dataset consisted of 823 effect and 2422 no effect mutations, and our T4 lysozyme dataset contained 312 effect and 1320 no effect mutations.</p>
         <p>A total of fourteen variables were used to predict whether or not a missense mutation affects protein function (Table <tblr tid="T1">1</tblr>; the abbreviations introduced there are taken from the dataset of Verzilli <it>et al</it>. <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>). All these variables have been implicated in previous studies as useful in discriminating 'effect' from 'no effect' mutations. Six of the variables are continuous (<it>ac</it>, <it>rac</it>, <it>rent</it>, <it>nrent</it>, <it>bf</it>, and <it>nbf</it>); the rest are discrete binary. The variables (excluding the class node) can also be sorted into three groups based on the type of biological information they give: structural, evolutionary, or, in the case of <it>nrent</it>, both structural and evolutionary.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Attributes used for predicting functional effects of missense mutations</p>
            </caption>
            <tblbdy cols="4">
               <r>
                  <c ca="left">
                     <p>
                        <b>Abbreviation</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Type</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Description</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>
                        <b>Information</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>effect</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Discrete</p>
                  </c>
                  <c ca="left">
                     <p>Effect of mutation on functionality</p>
                  </c>
                  <c ca="left">
                     <p>Class</p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>ac</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Continuous</p>
                  </c>
                  <c ca="left">
                     <p>Solvent accessible area of native AA</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>rac</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Continuous</p>
                  </c>
                  <c ca="left">
                     <p>Accessibility relative to maximum accessibility in training set</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>bf</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Continuous</p>
                  </c>
                  <c ca="left">
                     <p>Normalised B-factor of native AA</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>nbf</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Continuous</p>
                  </c>
                  <c ca="left">
                     <p>Normalised B-factor of structural neighbourhood of native AA</p>
                  </c>
                  <c ca="left">
                     <p>Structural</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>bur</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Discrete</p>
                  </c>
                  <c ca="left">
                     <p>Mutant AA is charged AA at buried site</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>trn</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Discrete</p>
                  </c>
                  <c ca="left">
                     <p>Mutant AA occurs at glycine or proline in a turn</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>hlx</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Discrete</p>
                  </c>
                  <c ca="left">
                     <p>Mutant AA occurs in helical region and involves glycine or proline</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>ifc</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Discrete</p>
                  </c>
                  <c ca="left">
                     <p>Native AA is near subunit interface</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>nrent</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Continuous</p>
                  </c>
                  <c ca="left">
                     <p>Phylogenetic entropy of structural neighbourhood of native AA</p>
                  </c>
                  <c ca="left">
                     <p>Structural + Evolutionary</p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>rent</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Continuous</p>
                  </c>
                  <c ca="left">
                     <p>Normalised phylogenetic entropy of native AA</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>cnsd</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Discrete</p>
                  </c>
                  <c ca="left">
                     <p>Native AA is at conserved position in phylogenetic profile</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>ncnsd</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Discrete</p>
                  </c>
                  <c ca="left">
                     <p>Native AA is near conserved position in phylogenetic profile</p>
                  </c>
                  <c ca="left">
                     <p>Evolutionary</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>uslaa</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Discrete</p>
                  </c>
                  <c ca="left">
                     <p>Mutant AA is not in phylogenetic profile</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>uslby</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Discrete</p>
                  </c>
                  <c ca="left">
                     <p>Mutant AA is not in the smallest AA class that includes the phylogenetic profile</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
            </tblbdy>
         </tbl>
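         <p>As an illustration, the attributes of Table <tblr tid="T1">1</tblr> can be written down as a small lookup table from which the groupings used in the experiments are derived. This is only a sketch: the dictionary layout and the helper name <it>names_where</it> are assumptions made for this example and are not part of the original analysis.</p>

```python
# Attributes transcribed from Table 1: name -> (type, information group).
# The data layout and helper below are illustrative assumptions.
ATTRIBUTES = {
    "ac": ("continuous", "structural"),
    "rac": ("continuous", "structural"),
    "bf": ("continuous", "structural"),
    "nbf": ("continuous", "structural"),
    "bur": ("discrete", "structural"),
    "trn": ("discrete", "structural"),
    "hlx": ("discrete", "structural"),
    "ifc": ("discrete", "structural"),
    "nrent": ("continuous", "structural+evolutionary"),
    "rent": ("continuous", "evolutionary"),
    "cnsd": ("discrete", "evolutionary"),
    "ncnsd": ("discrete", "evolutionary"),
    "uslaa": ("discrete", "evolutionary"),
    "uslby": ("discrete", "evolutionary"),
}
CLASS_NODE = "effect"  # discrete class variable, not counted among the 14


def names_where(kind=None, group=None):
    """Return attribute names matching the given type and/or group."""
    return [
        name
        for name, (k, g) in ATTRIBUTES.items()
        if (kind is None or k == kind) and (group is None or g == group)
    ]


continuous = names_where(kind="continuous")
structural = names_where(group="structural")
evolutionary = names_where(group="evolutionary")
both = names_where(group="structural+evolutionary")

print(len(ATTRIBUTES), len(continuous), len(structural), len(evolutionary), both)
```

         <p>Adding the class node to the eight purely structural or the five purely evolutionary attributes gives networks of nine and six nodes respectively.</p>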
         <p>We used two basic types of Bayesian network structure in this study: na&#239;ve and learned. In the na&#239;ve structure, the <it>effect</it> node is a parent to all the other nodes in the network structure. Details of the learned structure are provided later. On each of these structures we performed seven experiments:</p>
         <p>&#8226; <it>all</it>:<it>all</it>: 15 node network trained and tested using all 14 variables listed in Table <tblr tid="T1">1</tblr>.</p>
         <p>&#8226; <it>all</it>:<it>noS</it>: 15 node network trained on all variables, tested with evolutionary information only (<it>ac</it>, <it>rac</it>, <it>bf</it>, <it>nbf</it>, <it>bur</it>, <it>trn</it>, <it>hlx</it>, <it>ifc</it>, <it>nrent</it> nodes hidden).</p>
         <p>&#8226; <it>noS</it>:<it>noS</it>: 6 node network (structural nodes missing) trained and tested with evolutionary information only.</p>
         <p>&#8226; <it>all</it>:<it>noE</it>: 15 node network trained on all variables, tested with structural information only (<it>nrent</it>, <it>rent</it>, <it>cnsd</it>, <it>ncnsd</it>, <it>uslaa</it>, <it>uslby</it> nodes hidden).</p>
         <p>&#8226; <it>noE</it>:<it>noE</it>: 9 node network (evolutionary nodes missing) trained and tested with structural information only.</p>
         <p>&#8226; <it>all</it>:<it>key</it>: 15 node network trained on all variables, tested using three key variables (<it>ac</it>, <it>bur</it>, <it>bf</it>). These key variables were identified by analysing a number of learned structures.</p>
         <p>&#8226; <it>key</it>:<it>key</it>: 4 node network trained and tested using key variables only.</p>
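         <p>For the na&#239;ve structure, hiding a node at test time has a simple consequence that a small example makes concrete: because every variable is a child of <it>effect</it> only, summing out a hidden child leaves the posterior over <it>effect</it> unchanged, so a larger network with hidden nodes and a smaller network without those nodes give the same prediction when their shared parameters agree. The sketch below uses three binary variables with made-up probabilities (not values from this study); the function names are hypothetical, and the continuous variables are not modelled here. The same argument does not carry over automatically to the learned structure.</p>

```python
from itertools import product

# Toy parameters, invented for illustration only.
P_EFFECT = {1: 0.25, 0: 0.75}  # prior P(effect)
# P(variable = 1 | effect) for three binary children of the class node.
P_CHILD = {
    "bur": {1: 0.30, 0: 0.05},
    "cnsd": {1: 0.60, 0: 0.20},
    "uslaa": {1: 0.70, 0: 0.40},
}


def likelihood(var, value, effect):
    """P(var = value | effect) for a binary child variable."""
    p_one = P_CHILD[var][effect]
    return p_one if value == 1 else 1.0 - p_one


def posterior_effect(observed, variables):
    """P(effect = 1 | observed) in a naive network over `variables`.

    Variables missing from `observed` are treated as hidden and are
    summed out explicitly.
    """
    hidden = [v for v in variables if v not in observed]
    joint = {}
    for e in (0, 1):
        total = 0.0
        for values in product((0, 1), repeat=len(hidden)):
            full = dict(observed, **dict(zip(hidden, values)))
            term = P_EFFECT[e]
            for v in variables:
                term *= likelihood(v, full[v], e)
            total += term
        joint[e] = total
    return joint[1] / (joint[0] + joint[1])


evidence = {"cnsd": 1, "uslaa": 0}
full_net = posterior_effect(evidence, ["bur", "cnsd", "uslaa"])  # bur hidden
small_net = posterior_effect(evidence, ["cnsd", "uslaa"])        # bur absent
print(full_net, small_net)
```

         <p>The two printed posteriors agree up to floating-point rounding, since the factor for the hidden node sums to one for each value of <it>effect</it>.</p>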
         <p>Results of these experiments are presented in Tables <tblr tid="T2">2</tblr> and <tblr tid="T3">3</tblr>. We carried out both homogeneous and heterogeneous cross-validation tests. Homogeneous cross-validation was performed on the lysozyme and lac repressor datasets separately, and on a mixed set in which the two datasets were pooled. In each case, data were randomised and divided into 10 equal parts. One part was used as the test set and the remainder as the training set. This procedure was repeated 10 times so that each example (here, each mutation) was used exactly once for testing. The mean and standard deviation of the ten results were then calculated. In heterogeneous cross-validation, the dataset of one protein (e.g. lac repressor) was used as the training set and that of the other protein (e.g. lysozyme) was used as the test set.</p>
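         <p>The homogeneous procedure can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names, the fixed random seed, the callback <it>train_and_score</it> and the use of the sample standard deviation are all assumptions, and parts differ in size by at most one example when the dataset size is not a multiple of ten.</p>

```python
import random
from statistics import mean, stdev


def ten_fold_indices(n_examples, n_folds=10, seed=0):
    """Shuffle example indices and deal them into n_folds near-equal parts."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate(examples, train_and_score, n_folds=10, seed=0):
    """Return (mean, standard deviation) of the per-fold scores.

    `train_and_score(train, test)` is a hypothetical callback that fits a
    model on `train` and returns one number (for example AUC) on `test`.
    """
    folds = ten_fold_indices(len(examples), n_folds, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [examples[i] for i in range(len(examples)) if i not in held_out]
        test = [examples[i] for i in test_idx]
        scores.append(train_and_score(train, test))
    return mean(scores), stdev(scores)


# Usage with a dummy scorer that only reports the test-set fraction.
dummy_data = list(range(100))
avg, sd = cross_validate(dummy_data, lambda tr, te: len(te) / (len(tr) + len(te)))
print(avg, sd)
```

         <p>Heterogeneous cross-validation needs no splitting: the same callback would be called once, with one protein's mutations as the training set and the other's as the test set, which is why those rows of the results tables carry no standard deviation.</p>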
         <tbl id="T2">
            <title>
               <p>Table 2</p>
            </title>
            <caption>
               <p>Results with a na&#239;ve Bayes classifier.</p>
            </caption>
            <tblbdy cols="9">
               <r>
                  <c ca="left">
                     <p>Cross-validation</p>
                  </c>
                  <c ca="left">
                     <p>Trained on:</p>
                  </c>
                  <c ca="center">
                     <p>All</p>
                  </c>
                  <c ca="center">
                     <p>All</p>
                  </c>
                  <c ca="center">
                     <p>NoS</p>
                  </c>
                  <c ca="center">
                     <p>All</p>
                  </c>
                  <c ca="center">
                     <p>NoE</p>
                  </c>
                  <c ca="center">
                     <p>All</p>
                  </c>
                  <c ca="center">
                     <p>key</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Tested on:</p>
                  </c>
                  <c ca="center">
                     <p>All</p>
                  </c>
                  <c ca="center">
                     <p>NoS</p>
                  </c>
                  <c ca="center">
                     <p>NoS</p>
                  </c>
                  <c ca="center">
                     <p>NoE</p>
                  </c>
                  <c ca="center">
                     <p>NoE</p>
                  </c>
                  <c ca="center">
                     <p>key</p>
                  </c>
                  <c ca="center">
                     <p>key</p>
                  </c>
               </r>
               <r>
                  <c cspan="9">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>mixed</p>
                  </c>
                  <c ca="left">
                     <p>AUC</p>
                  </c>
                  <c ca="center">
                     <p>0.83 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.70 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.70 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.81 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.81 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.80 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.79 &#177; 0.01</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>MCC</p>
                  </c>
                  <c ca="center">
                     <p>0.44 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.27 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.27 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.43 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.43 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.41 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.35 &#177; 0.06</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Overall error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.19 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.24 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.24 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.21 &#177; 0.00</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.35 &#177; 0.05</p>
                  </c>
                  <c ca="center">
                     <p>0.52 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.52 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.26 &#177; 0.07</p>
                  </c>
                  <c ca="center">
                     <p>0.26 &#177; 0.07</p>
                  </c>
                  <c ca="center">
                     <p>0.24 &#177; 0.07</p>
                  </c>
                  <c ca="center">
                     <p>0.41 &#177; 0.04</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>No effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.15 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.17 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.17 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.17 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.17 &#177; 0.03</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sensitivity</p>
                  </c>
                  <c ca="center">
                     <p>0.47 &#177; 0.12</p>
                  </c>
                  <c ca="center">
                     <p>0.37 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.37 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.37 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.37 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.36 &#177; 0.09</p>
                  </c>
                  <c ca="center">
                     <p>0.38 &#177; 0.16</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Specificity</p>
                  </c>
                  <c ca="center">
                     <p>0.92 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.88 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.88 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.96 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.96 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.96 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.92 &#177; 0.05</p>
                  </c>
               </r>
               <r>
                  <c cspan="9">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>lac rep</p>
                  </c>
                  <c ca="left">
                     <p>AUC</p>
                  </c>
                  <c ca="center">
                     <p>0.84 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.74 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.74 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.82 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.82 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.80 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.80 &#177; 0.02</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>MCC</p>
                  </c>
                  <c ca="center">
                     <p>0.47 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.33 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.33 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.46 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.46 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.44 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.39 &#177; 0.05</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Overall error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.23 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.23 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.19 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.21 &#177; 0.00</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.27 &#177; 0.05</p>
                  </c>
                  <c ca="center">
                     <p>0.40 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.40 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.20 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.20 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.09</p>
                  </c>
                  <c ca="center">
                     <p>0.36 &#177; 0.05</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>No effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.16 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.19 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.19 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.19 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.03</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sensitivity</p>
                  </c>
                  <c ca="center">
                     <p>0.47 &#177; 0.10</p>
                  </c>
                  <c ca="center">
                     <p>0.36 &#177; 0.12</p>
                  </c>
                  <c ca="center">
                     <p>0.36 &#177; 0.12</p>
                  </c>
                  <c ca="center">
                     <p>0.37 &#177; 0.08</p>
                  </c>
                  <c ca="center">
                     <p>0.38 &#177; 0.08</p>
                  </c>
                  <c ca="center">
                     <p>0.34 &#177; 0.12</p>
                  </c>
                  <c ca="center">
                     <p>0.41 &#177; 0.13</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Specificity</p>
                  </c>
                  <c ca="center">
                     <p>0.93 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.92 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.92 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.96 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.96 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.97 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.92 &#177; 0.04</p>
                  </c>
               </r>
               <r>
                  <c cspan="9">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>lysozyme</p>
                  </c>
                  <c ca="left">
                     <p>AUC</p>
                  </c>
                  <c ca="center">
                     <p>0.83 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.68 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.68 &#177; 0.05</p>
                  </c>
                  <c ca="center">
                     <p>0.81 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.81 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.78 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.77 &#177; 0.04</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>MCC</p>
                  </c>
                  <c ca="center">
                     <p>0.40 &#177; 0.05</p>
                  </c>
                  <c ca="center">
                     <p>0.23 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.23 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.38 &#177; 0.08</p>
                  </c>
                  <c ca="center">
                     <p>0.38 &#177; 0.08</p>
                  </c>
                  <c ca="center">
                     <p>0.36 &#177; 0.11</p>
                  </c>
                  <c ca="center">
                     <p>0.28 &#177; 0.09</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Overall error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.17 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.24 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.24 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.17 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.17 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.16 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.21 &#177; 0.03</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.40 &#177; 0.05</p>
                  </c>
                  <c ca="center">
                     <p>0.63 &#177; 0.05</p>
                  </c>
                  <c ca="center">
                     <p>0.63 &#177; 0.05</p>
                  </c>
                  <c ca="center">
                     <p>0.39 &#177; 0.12</p>
                  </c>
                  <c ca="center">
                     <p>0.39 &#177; 0.12</p>
                  </c>
                  <c ca="center">
                     <p>0.33 &#177; 0.13</p>
                  </c>
                  <c ca="center">
                     <p>0.54 &#177; 0.09</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>No effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.13 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.15 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.15 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.13 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.13 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.15 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.14 &#177; 0.02</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sensitivity</p>
                  </c>
                  <c ca="center">
                     <p>0.43 &#177; 0.11</p>
                  </c>
                  <c ca="center">
                     <p>0.39 &#177; 0.07</p>
                  </c>
                  <c ca="center">
                     <p>0.39 &#177; 0.07</p>
                  </c>
                  <c ca="center">
                     <p>0.38 &#177; 0.17</p>
                  </c>
                  <c ca="center">
                     <p>0.38 &#177; 0.17</p>
                  </c>
                  <c ca="center">
                     <p>0.28 &#177; 0.09</p>
                  </c>
                  <c ca="center">
                     <p>0.36 &#177; 0.11</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Specificity</p>
                  </c>
                  <c ca="center">
                     <p>0.93 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.84 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.84 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.93 &#177; 0.07</p>
                  </c>
                  <c ca="center">
                     <p>0.93 &#177; 0.07</p>
                  </c>
                  <c ca="center">
                     <p>0.97 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.89 &#177; 0.04</p>
                  </c>
               </r>
               <r>
                  <c cspan="9">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Train: lac rep</p>
                  </c>
                  <c ca="left">
                     <p>AUC</p>
                  </c>
                  <c ca="center">
                     <p>0.80</p>
                  </c>
                  <c ca="center">
                     <p>0.66</p>
                  </c>
                  <c ca="center">
                     <p>0.67</p>
                  </c>
                  <c ca="center">
                     <p>0.78</p>
                  </c>
                  <c ca="center">
                     <p>0.78</p>
                  </c>
                  <c ca="center">
                     <p>0.77</p>
                  </c>
                  <c ca="center">
                     <p>0.77</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>MCC</p>
                  </c>
                  <c ca="center">
                     <p>0.40</p>
                  </c>
                  <c ca="center">
                     <p>0.23</p>
                  </c>
                  <c ca="center">
                     <p>0.23</p>
                  </c>
                  <c ca="center">
                     <p>0.35</p>
                  </c>
                  <c ca="center">
                     <p>0.35</p>
                  </c>
                  <c ca="center">
                     <p>0.35</p>
                  </c>
                  <c ca="center">
                     <p>0.35</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Overall error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.20</p>
                  </c>
                  <c ca="center">
                     <p>0.27</p>
                  </c>
                  <c ca="center">
                     <p>0.24</p>
                  </c>
                  <c ca="center">
                     <p>0.17</p>
                  </c>
                  <c ca="center">
                     <p>0.17</p>
                  </c>
                  <c ca="center">
                     <p>0.16</p>
                  </c>
                  <c ca="center">
                     <p>0.16</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Test: lysozyme</p>
                  </c>
                  <c ca="left">
                     <p>Effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.52</p>
                  </c>
                  <c ca="center">
                     <p>0.65</p>
                  </c>
                  <c ca="center">
                     <p>0.63</p>
                  </c>
                  <c ca="center">
                     <p>0.41</p>
                  </c>
                  <c ca="center">
                     <p>0.41</p>
                  </c>
                  <c ca="center">
                     <p>0.32</p>
                  </c>
                  <c ca="center">
                     <p>0.32</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>No effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.10</p>
                  </c>
                  <c ca="center">
                     <p>0.14</p>
                  </c>
                  <c ca="center">
                     <p>0.15</p>
                  </c>
                  <c ca="center">
                     <p>0.14</p>
                  </c>
                  <c ca="center">
                     <p>0.14</p>
                  </c>
                  <c ca="center">
                     <p>0.15</p>
                  </c>
                  <c ca="center">
                     <p>0.16</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sensitivity</p>
                  </c>
                  <c ca="center">
                     <p>0.58</p>
                  </c>
                  <c ca="center">
                     <p>0.46</p>
                  </c>
                  <c ca="center">
                     <p>0.39</p>
                  </c>
                  <c ca="center">
                     <p>0.33</p>
                  </c>
                  <c ca="center">
                     <p>0.33</p>
                  </c>
                  <c ca="center">
                     <p>0.26</p>
                  </c>
                  <c ca="center">
                     <p>0.26</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Specificity</p>
                  </c>
                  <c ca="center">
                     <p>0.85</p>
                  </c>
                  <c ca="center">
                     <p>0.80</p>
                  </c>
                  <c ca="center">
                     <p>0.84</p>
                  </c>
                  <c ca="center">
                     <p>0.95</p>
                  </c>
                  <c ca="center">
                     <p>0.95</p>
                  </c>
                  <c ca="center">
                     <p>0.97</p>
                  </c>
                  <c ca="center">
                     <p>0.97</p>
                  </c>
               </r>
               <r>
                  <c cspan="9">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Train: lysozyme</p>
                  </c>
                  <c ca="left">
                     <p>AUC</p>
                  </c>
                  <c ca="center">
                     <p>0.81</p>
                  </c>
                  <c ca="center">
                     <p>0.71</p>
                  </c>
                  <c ca="center">
                     <p>0.71</p>
                  </c>
                  <c ca="center">
                     <p>0.80</p>
                  </c>
                  <c ca="center">
                     <p>0.80</p>
                  </c>
                  <c ca="center">
                     <p>0.79</p>
                  </c>
                  <c ca="center">
                     <p>0.79</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>MCC</p>
                  </c>
                  <c ca="center">
                     <p>0.43</p>
                  </c>
                  <c ca="center">
                     <p>0.37</p>
                  </c>
                  <c ca="center">
                     <p>0.37</p>
                  </c>
                  <c ca="center">
                     <p>0.41</p>
                  </c>
                  <c ca="center">
                     <p>0.41</p>
                  </c>
                  <c ca="center">
                     <p>0.42</p>
                  </c>
                  <c ca="center">
                     <p>0.42</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Overall error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.20</p>
                  </c>
                  <c ca="center">
                     <p>0.22</p>
                  </c>
                  <c ca="center">
                     <p>0.22</p>
                  </c>
                  <c ca="center">
                     <p>0.20</p>
                  </c>
                  <c ca="center">
                     <p>0.20</p>
                  </c>
                  <c ca="center">
                     <p>0.19</p>
                  </c>
                  <c ca="center">
                     <p>0.19</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Test: lac rep</p>
                  </c>
                  <c ca="left">
                     <p>Effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.34</p>
                  </c>
                  <c ca="center">
                     <p>0.43</p>
                  </c>
                  <c ca="center">
                     <p>0.43</p>
                  </c>
                  <c ca="center">
                     <p>0.25</p>
                  </c>
                  <c ca="center">
                     <p>0.25</p>
                  </c>
                  <c ca="center">
                     <p>0.18</p>
                  </c>
                  <c ca="center">
                     <p>0.18</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>No effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.17</p>
                  </c>
                  <c ca="center">
                     <p>0.17</p>
                  </c>
                  <c ca="center">
                     <p>0.17</p>
                  </c>
                  <c ca="center">
                     <p>0.19</p>
                  </c>
                  <c ca="center">
                     <p>0.19</p>
                  </c>
                  <c ca="center">
                     <p>0.20</p>
                  </c>
                  <c ca="center">
                     <p>0.20</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sensitivity</p>
                  </c>
                  <c ca="center">
                     <p>0.45</p>
                  </c>
                  <c ca="center">
                     <p>0.46</p>
                  </c>
                  <c ca="center">
                     <p>0.46</p>
                  </c>
                  <c ca="center">
                     <p>0.33</p>
                  </c>
                  <c ca="center">
                     <p>0.33</p>
                  </c>
                  <c ca="center">
                     <p>0.30</p>
                  </c>
                  <c ca="center">
                     <p>0.30</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Specificity</p>
                  </c>
                  <c ca="center">
                     <p>0.92</p>
                  </c>
                  <c ca="center">
                     <p>0.88</p>
                  </c>
                  <c ca="center">
                     <p>0.88</p>
                  </c>
                  <c ca="center">
                     <p>0.96</p>
                  </c>
                  <c ca="center">
                     <p>0.96</p>
                  </c>
                  <c ca="center">
                     <p>0.98</p>
                  </c>
                  <c ca="center">
                     <p>0.98</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>Columns: (1) trained on all variables, tested with all variables observed; (2) trained on all variables, tested without any structural information (NoS) &#8211; only evolutionary variables observed; (3) trained and tested using only the five evolutionary nodes; (4) trained on all variables, tested without any evolutionary information (NoE) &#8211; only structural variables observed; (5) trained and tested using only the eight structural nodes; (6) trained on all variables, tested with only the key variables observed (see later section); (7) trained and tested using only the three key variables.</p>
            </tblfn>
         </tbl>
         <tbl id="T3">
            <title>
               <p>Table 3</p>
            </title>
            <caption>
               <p>Results with a learned Bayesian network.</p>
            </caption>
            <tblbdy cols="9">
               <r>
                  <c ca="left">
                     <p>Cross-validation</p>
                  </c>
                  <c ca="left">
                     <p>Trained on:</p>
                  </c>
                  <c ca="center">
                     <p>All</p>
                  </c>
                  <c ca="center">
                     <p>All</p>
                  </c>
                  <c ca="center">
                     <p>NoS</p>
                  </c>
                  <c ca="center">
                     <p>All</p>
                  </c>
                  <c ca="center">
                     <p>NoE</p>
                  </c>
                  <c ca="center">
                     <p>All</p>
                  </c>
                  <c ca="center">
                     <p>key</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Tested on:</p>
                  </c>
                  <c ca="center">
                     <p>All</p>
                  </c>
                  <c ca="center">
                     <p>NoS</p>
                  </c>
                  <c ca="center">
                     <p>NoS</p>
                  </c>
                  <c ca="center">
                     <p>NoE</p>
                  </c>
                  <c ca="center">
                     <p>NoE</p>
                  </c>
                  <c ca="center">
                     <p>key</p>
                  </c>
                  <c ca="center">
                     <p>key</p>
                  </c>
               </r>
               <r>
                  <c cspan="9">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>mixed</p>
                  </c>
                  <c ca="left">
                     <p>AUC</p>
                  </c>
                  <c ca="center">
                     <p>0.84 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.64 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.70 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.72 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.82 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.63 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.80 &#177; 0.02</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>MCC</p>
                  </c>
                  <c ca="center">
                     <p>0.46 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.11 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.10 &#177; 0.16</p>
                  </c>
                  <c ca="center">
                     <p>0.26 &#177; 0.22</p>
                  </c>
                  <c ca="center">
                     <p>0.44 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.40 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.40 &#177; 0.04</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Overall error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.17 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.67 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.23 &#177; 0.00</p>
                  </c>
                  <c ca="center">
                     <p>0.36 &#177; 0.28</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.27 &#177; 0.05</p>
                  </c>
                  <c ca="center">
                     <p>0.75 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.15 &#177; 0.24</p>
                  </c>
                  <c ca="center">
                     <p>0.40 &#177; 0.25</p>
                  </c>
                  <c ca="center">
                     <p>0.29 &#177; 0.07</p>
                  </c>
                  <c ca="center">
                     <p>0.24 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.25 &#177; 0.05</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>No effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.16 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.11 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.21 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.29 &#177; 0.18</p>
                  </c>
                  <c ca="center">
                     <p>0.16 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sensitivity</p>
                  </c>
                  <c ca="center">
                     <p>0.41 &#177; 0.07</p>
                  </c>
                  <c ca="center">
                     <p>0.93 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.13 &#177; 0.21</p>
                  </c>
                  <c ca="center">
                     <p>0.51 &#177; 0.33</p>
                  </c>
                  <c ca="center">
                     <p>0.41 &#177; 0.08</p>
                  </c>
                  <c ca="center">
                     <p>0.31 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.31 &#177; 0.09</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Specificity</p>
                  </c>
                  <c ca="center">
                     <p>0.95 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.15 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.96 &#177; 0.07</p>
                  </c>
                  <c ca="center">
                     <p>0.68 &#177; 0.47</p>
                  </c>
                  <c ca="center">
                     <p>0.95 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.97 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.97 &#177; 0.01</p>
                  </c>
               </r>
               <r>
                  <c cspan="9">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>lac rep</p>
                  </c>
                  <c ca="left">
                     <p>AUC</p>
                  </c>
                  <c ca="center">
                     <p>0.85 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.47 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.73 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.70 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.82 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.61 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.81 &#177; 0.02</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>MCC</p>
                  </c>
                  <c ca="center">
                     <p>0.52 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.11 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.32 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.43 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.46 &#177; 0.05</p>
                  </c>
                  <c ca="center">
                     <p>0.42 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.42 &#177; 0.03</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Overall error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.17 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.60 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.24 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.19 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.19 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.19 &#177; 0.01</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.25 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.72 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.46 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.20 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.21 &#177; 0.05</p>
                  </c>
                  <c ca="center">
                     <p>0.17 &#177; 0.07</p>
                  </c>
                  <c ca="center">
                     <p>0.22 &#177; 0.06</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>No effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.15 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.16 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.19 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.19 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.18 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.20 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.19 &#177; 0.01</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sensitivity</p>
                  </c>
                  <c ca="center">
                     <p>0.51 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.86 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.40 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.33 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.38 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.30 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.33 &#177; 0.02</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Specificity</p>
                  </c>
                  <c ca="center">
                     <p>0.94 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.24 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.88 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.97 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.96 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.98 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.97 &#177; 0.01</p>
                  </c>
               </r>
               <r>
                  <c cspan="9">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>lysozyme</p>
                  </c>
                  <c ca="left">
                     <p>AUC</p>
                  </c>
                  <c ca="center">
                     <p>0.86 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.51 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.67 &#177; 0.05</p>
                  </c>
                  <c ca="center">
                     <p>0.78 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.83 &#177; 0.05</p>
                  </c>
                  <c ca="center">
                     <p>0.70 &#177; 0.04</p>
                  </c>
                  <c ca="center">
                     <p>0.78 &#177; 0.05</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>MCC</p>
                  </c>
                  <c ca="center">
                     <p>0.47 &#177; 0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.09 &#177; 0.05</p>
                  </c>
                  <c ca="center">
                     <p>-</p>
                  </c>
                  <c ca="center">
                     <p>0.37 &#177; 0.10</p>
                  </c>
                  <c ca="center">
                     <p>0.40 &#177; 0.10</p>
                  </c>
                  <c ca="center">
                     <p>0.37 &#177; 0.12</p>
                  </c>
                  <c ca="center">
                     <p>0.34 &#177; 0.12</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Overall error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.17 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.75 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.19 &#177; 0.00</p>
                  </c>
                  <c ca="center">
                     <p>0.16 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.16 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.16 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.16 &#177; 0.02</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.38 &#177; 0.14</p>
                  </c>
                  <c ca="center">
                     <p>0.80 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>-</p>
                  </c>
                  <c ca="center">
                     <p>0.30 &#177; 0.13</p>
                  </c>
                  <c ca="center">
                     <p>0.34 &#177; 0.11</p>
                  </c>
                  <c ca="center">
                     <p>0.32 &#177; 0.13</p>
                  </c>
                  <c ca="center">
                     <p>0.33 &#177; 0.14</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>No effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.10 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.05 &#177; 0.08</p>
                  </c>
                  <c ca="center">
                     <p>0.19 &#177; 0.00</p>
                  </c>
                  <c ca="center">
                     <p>0.15 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.14 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.15 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.15 &#177; 0.02</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sensitivity</p>
                  </c>
                  <c ca="center">
                     <p>0.55 &#177; 0.19</p>
                  </c>
                  <c ca="center">
                     <p>0.98 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.00 &#177; 0.00</p>
                  </c>
                  <c ca="center">
                     <p>0.29 &#177; 0.10</p>
                  </c>
                  <c ca="center">
                     <p>0.36 &#177; 0.09</p>
                  </c>
                  <c ca="center">
                     <p>0.30 &#177; 0.09</p>
                  </c>
                  <c ca="center">
                     <p>0.26 &#177; 0.09</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Specificity</p>
                  </c>
                  <c ca="center">
                     <p>0.90 &#177; 0.07</p>
                  </c>
                  <c ca="center">
                     <p>0.07 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>1.00 &#177; 0.00</p>
                  </c>
                  <c ca="center">
                     <p>0.97 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.95 &#177; 0.02</p>
                  </c>
                  <c ca="center">
                     <p>0.97 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.97 &#177; 0.01</p>
                  </c>
               </r>
               <r>
                  <c cspan="9">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Train: lac rep</p>
                  </c>
                  <c ca="left">
                     <p>AUC</p>
                  </c>
                  <c ca="center">
                     <p>0.72</p>
                  </c>
                  <c ca="center">
                     <p>0.43</p>
                  </c>
                  <c ca="center">
                     <p>0.68</p>
                  </c>
                  <c ca="center">
                     <p>0.70</p>
                  </c>
                  <c ca="center">
                     <p>0.77</p>
                  </c>
                  <c ca="center">
                     <p>0.57</p>
                  </c>
                  <c ca="center">
                     <p>0.75</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>MCC</p>
                  </c>
                  <c ca="center">
                     <p>0.30</p>
                  </c>
                  <c ca="center">
                     <p>-</p>
                  </c>
                  <c ca="center">
                     <p>0.23</p>
                  </c>
                  <c ca="center">
                     <p>0.21</p>
                  </c>
                  <c ca="center">
                     <p>0.36</p>
                  </c>
                  <c ca="center">
                     <p>0.34</p>
                  </c>
                  <c ca="center">
                     <p>0.35</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Overall error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.17</p>
                  </c>
                  <c ca="center">
                     <p>0.19</p>
                  </c>
                  <c ca="center">
                     <p>0.27</p>
                  </c>
                  <c ca="center">
                     <p>0.21</p>
                  </c>
                  <c ca="center">
                     <p>0.17</p>
                  </c>
                  <c ca="center">
                     <p>0.17</p>
                  </c>
                  <c ca="center">
                     <p>0.17</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Test: lysozyme</p>
                  </c>
                  <c ca="left">
                     <p>Effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.33</p>
                  </c>
                  <c ca="center">
                     <p>-</p>
                  </c>
                  <c ca="center">
                     <p>0.65</p>
                  </c>
                  <c ca="center">
                     <p>0.57</p>
                  </c>
                  <c ca="center">
                     <p>0.41</p>
                  </c>
                  <c ca="center">
                     <p>0.35</p>
                  </c>
                  <c ca="center">
                     <p>0.35</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>No effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.16</p>
                  </c>
                  <c ca="center">
                     <p>0.19</p>
                  </c>
                  <c ca="center">
                     <p>0.14</p>
                  </c>
                  <c ca="center">
                     <p>0.16</p>
                  </c>
                  <c ca="center">
                     <p>0.14</p>
                  </c>
                  <c ca="center">
                     <p>0.15</p>
                  </c>
                  <c ca="center">
                     <p>0.15</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sensitivity</p>
                  </c>
                  <c ca="center">
                     <p>0.20</p>
                  </c>
                  <c ca="center">
                     <p>0.00</p>
                  </c>
                  <c ca="center">
                     <p>0.46</p>
                  </c>
                  <c ca="center">
                     <p>0.25</p>
                  </c>
                  <c ca="center">
                     <p>0.35</p>
                  </c>
                  <c ca="center">
                     <p>0.26</p>
                  </c>
                  <c ca="center">
                     <p>0.26</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Specificity</p>
                  </c>
                  <c ca="center">
                     <p>0.98</p>
                  </c>
                  <c ca="center">
                     <p>1.00</p>
                  </c>
                  <c ca="center">
                     <p>0.80</p>
                  </c>
                  <c ca="center">
                     <p>0.92</p>
                  </c>
                  <c ca="center">
                     <p>0.94</p>
                  </c>
                  <c ca="center">
                     <p>0.97</p>
                  </c>
                  <c ca="center">
                     <p>0.97</p>
                  </c>
               </r>
               <r>
                  <c cspan="9">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Train: lysozyme</p>
                  </c>
                  <c ca="left">
                     <p>AUC</p>
                  </c>
                  <c ca="center">
                     <p>0.79</p>
                  </c>
                  <c ca="center">
                     <p>0.44</p>
                  </c>
                  <c ca="center">
                     <p>0.65</p>
                  </c>
                  <c ca="center">
                     <p>0.58</p>
                  </c>
                  <c ca="center">
                     <p>0.78</p>
                  </c>
                  <c ca="center">
                     <p>0.66</p>
                  </c>
                  <c ca="center">
                     <p>0.78</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>MCC</p>
                  </c>
                  <c ca="center">
                     <p>0.41</p>
                  </c>
                  <c ca="center">
                     <p>-0.11</p>
                  </c>
                  <c ca="center">
                     <p>0.32</p>
                  </c>
                  <c ca="center">
                     <p>0.06</p>
                  </c>
                  <c ca="center">
                     <p>0.42</p>
                  </c>
                  <c ca="center">
                     <p>0.40</p>
                  </c>
                  <c ca="center">
                     <p>0.41</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Overall error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.20</p>
                  </c>
                  <c ca="center">
                     <p>0.39</p>
                  </c>
                  <c ca="center">
                     <p>0.24</p>
                  </c>
                  <c ca="center">
                     <p>0.25</p>
                  </c>
                  <c ca="center">
                     <p>0.20</p>
                  </c>
                  <c ca="center">
                     <p>0.20</p>
                  </c>
                  <c ca="center">
                     <p>0.20</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Test: lac rep</p>
                  </c>
                  <c ca="left">
                     <p>Effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.22</p>
                  </c>
                  <c ca="center">
                     <p>0.84</p>
                  </c>
                  <c ca="center">
                     <p>0.46</p>
                  </c>
                  <c ca="center">
                     <p>0.30</p>
                  </c>
                  <c ca="center">
                     <p>0.26</p>
                  </c>
                  <c ca="center">
                     <p>0.23</p>
                  </c>
                  <c ca="center">
                     <p>0.23</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>No effect error rate</p>
                  </c>
                  <c ca="center">
                     <p>0.19</p>
                  </c>
                  <c ca="center">
                     <p>0.28</p>
                  </c>
                  <c ca="center">
                     <p>0.19</p>
                  </c>
                  <c ca="center">
                     <p>0.25</p>
                  </c>
                  <c ca="center">
                     <p>0.19</p>
                  </c>
                  <c ca="center">
                     <p>0.20</p>
                  </c>
                  <c ca="center">
                     <p>0.19</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Sensitivity</p>
                  </c>
                  <c ca="center">
                     <p>0.32</p>
                  </c>
                  <c ca="center">
                     <p>0.13</p>
                  </c>
                  <c ca="center">
                     <p>0.40</p>
                  </c>
                  <c ca="center">
                     <p>0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.35</p>
                  </c>
                  <c ca="center">
                     <p>0.30</p>
                  </c>
                  <c ca="center">
                     <p>0.33</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Specificity</p>
                  </c>
                  <c ca="center">
                     <p>0.97</p>
                  </c>
                  <c ca="center">
                     <p>0.78</p>
                  </c>
                  <c ca="center">
                     <p>0.88</p>
                  </c>
                  <c ca="center">
                     <p>1.00</p>
                  </c>
                  <c ca="center">
                     <p>0.96</p>
                  </c>
                  <c ca="center">
                     <p>0.97</p>
                  </c>
                  <c ca="center">
                     <p>0.97</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>See Table 2 for column details. Note that the MCC score or effect error rate cannot be shown if all mutations are predicted as 'no effect'.</p>
            </tblfn>
         </tbl>
         <sec>
            <st>
               <p>Na&#239;ve Bayes classifier</p>
            </st>
            <sec>
               <st>
                  <p>all:all</p>
               </st>
               <p>As expected, overall error rates of less than 20% were achieved in all cross validation tests with the <it>all</it>:<it>all </it>model (Table <tblr tid="T2">2</tblr>, column 1). These results are consistent with previous studies reporting accuracies of 70 &#8211; 80% on similar datasets using similar variables <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B10">10</abbr><abbr bid="B17">17</abbr></abbrgrp>. Furthermore, all AUC values (area under the ROC curve; see Evaluation measures in the Methods section for details of all performance metrics), including those from heterogeneous cross validation, were at least 0.80, indicating a robust classifier despite the simplicity of the na&#239;ve network structure. We therefore used the results from the <it>all</it>:<it>all </it>model as a benchmark for the six other experiments.</p>
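               <p>As an illustrative sketch only (not the code used in this study), the threshold-dependent measures reported in the tables can be computed from a confusion matrix as follows. The function name, the hypothetical counts in the note below, and the reading of the 'effect' and 'no effect' error rates as the fraction of wrong calls among mutations predicted as 'effect' or 'no effect' are assumptions; the definitive definitions are those given in Methods.</p>

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Metrics for a binary effect / no-effect classifier.

    tp, fp, tn, fn are counts, with 'effect' as the positive class.
    Any ratio whose denominator is zero is returned as None.
    """
    def ratio(num, den):
        return num / den if den > 0 else None

    total = tp + fp + tn + fn
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "overall_error": ratio(fp + fn, total),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        # Assumed definitions: wrong calls among predicted 'effect'
        # and among predicted 'no effect' mutations, respectively.
        "effect_error": ratio(fp, tp + fp),
        "no_effect_error": ratio(fn, tn + fn),
        # Matthews correlation coefficient; undefined (None) when any
        # row or column of the confusion matrix is empty.
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }
```

               <p>With hypothetical counts in which every mutation is predicted as 'no effect' (tp = 0, fp = 0, tn = 81, fn = 19), this sketch gives an overall error of 0.19, a sensitivity of 0 and a specificity of 1, while the MCC and the 'effect' error rate are undefined because no mutation is called 'effect'.</p>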
            </sec>
            <sec>
               <st>
                  <p>Missing structural information (all:noS and noS:noS)</p>
               </st>
               <p>Performance dropped significantly with a six-node network utilising only evolutionary information (<it>noS</it>:<it>noS</it>, Table <tblr tid="T2">2</tblr>, Column 3), with most AUC values reduced by over 10% relative to the <it>all</it>:<it>all </it>model. In particular, with homogeneous cross validation on lysozyme data, the AUC value decreased from 0.83 to 0.68 and the MCC value was as low as 0.23. Even when structural information was used in training the network (<it>all</it>:<it>noS</it>, Table <tblr tid="T2">2</tblr>, Column 2), results did not improve, possibly because variables are treated as independent in a na&#239;ve structure, so variables with missing values have little influence when they are marginalised over.</p>
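               <p>The marginalisation argument can be made concrete with a minimal sketch of na&#239;ve Bayes inference with unobserved variables. This is not the implementation used in this study; the function name and the dictionary layout of the probability tables are assumptions made for illustration.</p>

```python
def naive_bayes_posterior(prior, cpts, observed):
    """Posterior over classes given only the observed features.

    prior:    {class: P(class)}
    cpts:     {feature: {class: {value: P(value | class)}}}
    observed: {feature: value}; missing features are simply left out.
    """
    scores = {}
    for c, p_c in prior.items():
        score = p_c
        for feature, value in observed.items():
            score *= cpts[feature][c][value]
        # A missing feature would contribute sum_v P(v | c) = 1, so
        # marginalising it out leaves the class score unchanged.
        scores[c] = score
    z = sum(scores.values())  # assumes at least one class has non-zero score
    return {c: s / z for c, s in scores.items()}
```

               <p>Under this sketch, a na&#239;ve network trained on all variables but queried with only the evolutionary variables gives the same posterior as one in which the structural nodes were never present, provided the remaining conditional probability tables are identical.</p>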
            </sec>
            <sec>
               <st>
                  <p>Missing evolutionary information (all:noE and noE:noE)</p>
               </st>
               <p>In contrast to the results achieved without structural information, there was little or no effect on performance when evolutionary information was either missing during testing (<it>all</it>:<it>noE</it>, Table <tblr tid="T2">2</tblr>, Column 4) or missing during both training and testing (<it>noE</it>:<it>noE</it>, Table <tblr tid="T2">2</tblr>, Column 5). Again, owing to the simplicity of the na&#239;ve structure, similar results were achieved by the <it>all</it>:<it>noE </it>and <it>noE</it>:<it>noE </it>models, with AUC values of around 0.80 and overall error rates below 0.20.</p>
               <p>Overall, results suggest that structural information is more important than evolutionary information in predicting the functional consequences of a missense mutation in both lac repressor and T4 lysozyme, for the dataset used. Indeed, although evolutionary information has some predictive power, utilising only structural information appears to be sufficient for accurate prediction, comparable to that of the <it>all</it>:<it>all </it>model.</p>
            </sec>
            <sec>
               <st>
                  <p>A note on structural flexibility</p>
               </st>
               <p>It has previously been suggested that the B-factor and neighbourhood B-factor of the native amino acid are the most important predictors of functional effects of SNPs <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. However, the need to use B-factor information limits a method to structures from X-ray crystallography; such information is not available for NMR structures (although these do have their own internal flexibility measures). We found that removing the <it>bf</it> and <it>nbf</it> nodes from the all-node network made little difference to overall performance, with AUCs ranging from 0.80 to 0.83 in homogeneous cross-validation and of 0.78 and 0.82 in heterogeneous cross-validation (results not shown in the tables). This suggests that accurate prediction is possible without using structural flexibility information. That is not to say that structural flexibility is unimportant; rather, other variables have compensated effectively for its loss.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Learned structure</p>
            </st>
            <p>Using both the Bayesian and BIC scoring functions employed by the greedy search algorithm, we learned structures from the lac repressor and lysozyme datasets separately and from the two datasets combined ('mixed'). As with the na&#239;ve Bayes classifier, we evaluated each structure using both homogeneous ten-fold and heterogeneous cross-validation. There was little difference in performance between the two scoring functions, or between structures learned on different datasets. The main difference was in the number of edges in the resulting DAGs: for the mixed dataset, there were 35 edges with BIC and 48 with full Bayesian scoring. Following <it>Occam's Razor</it>, we prefer the simplest of equally good models, and therefore take the Bayesian network structure learned from the mixed dataset using the BIC scoring function as our model structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>, which is illustrated in Figure <figr fid="F2">2</figr>. With its harsher penalty for extra edges, the DAG learned using the BIC scoring function should contain edges that are more likely to be biologically meaningful. It is important to note that Bayesian networks with learned structure (or structure determined from conditional independence relations identified by an expert) capture the relationships between all the variables, and are not designed solely to classify a single variable from the others. This is a significant advantage of the Bayesian network: we can infer the value of any variable(s) from the values of any other variables, so we have constructed a model that can not only predict effect/no effect but also infer the value of any of the attributes.</p>
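            <p>For orientation, the BIC score of a structure is commonly written in the textbook form below. Whether this exact variant was used here is an assumption (see Methods). The symbols are introduced for illustration: D is the training data, N the number of training cases, d the number of free parameters of the structure, and theta-hat its maximum likelihood parameters.</p>

```latex
\mathrm{BIC}(\mathcal{S} \mid D)
  = \log P\bigl(D \mid \hat{\theta}_{\mathcal{S}}, \mathcal{S}\bigr)
  - \frac{d_{\mathcal{S}}}{2} \log N
```

            <p>In this form, each additional edge increases d and is retained by a greedy search only if the gain in log-likelihood outweighs the added penalty.</p>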
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Learned Bayesian network structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math></p>
               </caption>
               <text>
                  <p><b>Learned Bayesian network structure </b><m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>. Learned Bayesian network structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> (using greedy search with BIC scoring function from mixed dataset). Key to node labels is shown in Table 1.</p>
               </text>
               <graphic file="1471-2105-7-405-2"/>
            </fig>
            <sec>
               <st>
                  <p>all:all</p>
               </st>
               <p>Little significant improvement in homogeneous cross validation performance was gained from using structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> (Table <tblr tid="T3">3</tblr>, column 1) over the simple na&#239;ve structure (Table <tblr tid="T2">2</tblr>, column 1). This was because the na&#239;ve structure is specifically designed for classification, whereas our learned structure is the 'best' structure for capturing the relationships between all of the variables. The learned structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> performs as well in classification of <it>effect</it> as the na&#239;ve structure, but has the added advantage that it can be used to predict the values of any of the variables, from any of the other variables.</p>
               <p>Structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> appeared to perform worse than the na&#239;ve structure during heterogeneous cross-validation, especially when trained on lac repressor and tested on lysozyme data. Here, AUC decreased from 0.80 to 0.72 despite lower effect error rates at the selected threshold (0.33 <it>vs </it>0.52). The low AUC value of 0.72 may be deceptive since a significant number of points on the ROC curve lie below the convex hull (Figure <figr fid="F3">3</figr>) and as such are non-optimal classifiers <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Therefore, measuring the performance of a classifier which represents a single point on both the ROC curve and the convex hull (circled in Figure <figr fid="F3">3</figr>) was more useful in this case. As described in Methods, we chose the point at cost ratio 3.0 (where false positives cost three times more than false negatives) as this helps balance the 'effect' and 'no effect' misclassification error rates (important in datasets such as ours that are biased towards negative examples). At this selected threshold, overall error (0.17) and effect error rate (0.33) were lower for structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> than for the na&#239;ve structure (0.20 and 0.52 respectively). However, the MCC value was also lower (0.30 <it>vs </it>0.40) and the 'no effect' error rate was higher (0.16 <it>vs </it>0.10), which illustrates the difficulty of selecting a measure to compare different models, not only between different studies but also within the same study.</p>
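               <p>The relationship between the ROC curve, its convex hull and a cost-based operating point can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: the function names, the default cost values (chosen to mirror the cost ratio of 3.0 described above) and the linear expected-cost model are all assumptions.</p>

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative for a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of ROC points given as (fpr, tpr) pairs."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the line
        # joining its predecessor to the new point p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def min_cost_point(points, n_pos, n_neg, cost_fp=3.0, cost_fn=1.0):
    """ROC point with the lowest expected misclassification cost."""
    def cost(pt):
        fpr, tpr = pt
        return cost_fp * fpr * n_neg + cost_fn * (1.0 - tpr) * n_pos
    return min(points, key=cost)
```

               <p>Because the expected cost is linear in the false and true positive rates, the minimum-cost point always lies on the convex hull, so ROC points below the hull are never selected whatever cost ratio is used.</p>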
               <fig id="F3">
                  <title>
                     <p>Figure 3</p>
                  </title>
                  <caption>
                     <p>ROC curve for learned structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math></p>
                  </caption>
                  <text>
                     <p><b>ROC curve for learned structure </b><m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>. ROC curve for learned structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> trained on lac repressor, tested on lysozyme. The blue line is the ROC curve. The red line is the convex hull of the ROC curve. The circled point which lies on both of these curves is the classifier with the selected threshold (cost ratio = 3.0).</p>
                  </text>
                  <graphic file="1471-2105-7-405-3"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Missing structural information (all:noS and noS:noS)</p>
               </st>
               <p>The model learned from all the variables and tested using only evolutionary information (<it>all</it>:<it>noS</it>, Table <tblr tid="T3">3</tblr>, column 2) performed poorly, achieving AUC values less than 0.50 (worse than random) and error rates above 0.75. Given the number of connections in <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> and the potential for inferring the missing structural information in the test data, the <it>all</it>:<it>noS </it>model was surprisingly worse than the <it>noS</it>:<it>noS </it>model (Table <tblr tid="T3">3</tblr>, column 3).</p>
               <p>There could be a number of reasons for the poor performance of the <it>all</it>:<it>noS </it>model. The model may have learned during training that structural information is more important to prediction than evolutionary information. Consequently, without structural information during testing, the model has problems since it has down-weighted the evolutionary nodes relative to the structural nodes. Alternatively, it may not be possible to accurately infer values at the structural nodes from evolutionary information. By contrast, it is essential that the <it>noS</it>:<it>noS </it>model makes full use of the evolutionary information since structural information is unavailable in both training and testing. Although the cross validation results for <it>noS</it>:<it>noS </it>were worse than those of the <it>all</it>:<it>all </it>model, with AUC values ranging from 0.65 to 0.73 and overall error rates up to 0.27, they were better than those of the <it>all</it>:<it>noS </it>model since full weight is given to the evolutionary nodes.</p>
            </sec>
            <sec>
               <st>
                  <p>Missing evolutionary information (all:noE and noE:noE)</p>
               </st>
               <p>When marginalising over unknown evolutionary variables (<it>all</it>:<it>noE</it>, Table <tblr tid="T3">3</tblr>, column 4), predictions in most cases were comparable to those of the <it>all</it>:<it>all </it>model. However, poor results were achieved during homogeneous cross validation on mixed data and during heterogeneous cross validation, especially when training on lysozyme and testing on lac repressor data (AUC 0.58). In these cases, it appears that values at the evolutionary nodes with missing data could not be predicted accurately from the structural information during testing, thus confusing the model. As expected, the <it>noE</it>:<it>noE </it>model, trained and tested using structural variables only, performed as well as the <it>all</it>:<it>all </it>model across all cross validation tests (Table <tblr tid="T3">3</tblr>, column 5).</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Tolerance to incomplete training data</p>
            </st>
            <p>Bayesian networks are capable of learning model parameters from incomplete data. Here we test the tolerance of the Bayesian networks by training on incomplete data. In every training example, we hide <it>n </it>nodes (chosen randomly for each training case). We do this for the na&#239;ve Bayes classifier, and the learned structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>, and vary <it>n </it>from 0 to 14. The CPTs are learned using the iterative EM algorithm on the missing values. Figure <figr fid="F4">4</figr> shows the results of homogeneous cross-validation when trained on incomplete data from the 'mixed' dataset, and tested when all nodes are observed. Note that using this method, different sets of <it>n </it>nodes are chosen to have missing data between different training cases, therefore here we were testing the general ability of the Bayesian network to tolerate incomplete data rather than the effect of when certain nodes were missing data in all examples (as in the previous section).</p>
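            <p>The masking step described above can be illustrated with a minimal sketch. It assumes that each training case is stored as a mapping from node name to discrete value, that any of the nodes may be hidden, and that a hidden value is marked with <it>None</it>; the function name <it>hide_nodes</it> and the example values are hypothetical and are not taken from the software used in this work.</p>
            <p><![CDATA[
```python
import random


def hide_nodes(case, n, rng):
    """Return a copy of one training case with n randomly chosen nodes hidden.

    Hidden values are marked with None (an assumption of this sketch); the
    choice of nodes is made independently for every call, i.e. per case.
    """
    hidden = set(rng.sample(sorted(case), n))
    return {node: (None if node in hidden else value)
            for node, value in case.items()}


rng = random.Random(0)
# Hypothetical case: node names follow the text, the values are invented.
case = {"effect": 1, "bur": 0, "nbf": 2, "ac": 1, "rent": 0}
masked = hide_nodes(case, 2, rng)
```
]]></p>
            <p>Because a fresh set of nodes is drawn for each case, no single variable is missing throughout the training set, which is the distinction drawn above from the previous section.</p>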
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Classifier performance</p>
               </caption>
               <text>
                  <p><b>Classifier performance</b>. Performance of na&#239;ve Bayes classifier and structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> with parameters learned from incomplete data. The AUC (area under the ROC curve) is plotted against the number of nodes (n) randomly chosen to have missing data within the training examples.</p>
               </text>
               <graphic file="1471-2105-7-405-4"/>
            </fig>
            <p>Figure <figr fid="F4">4</figr> shows that the performance of both the na&#239;ve and <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> structures (measured by AUC value) was robust to incomplete training data, with an area under the ROC curve of over 0.80 maintained even when nine of the fifteen nodes were hidden in each training example. With very sparse data (more than 9 nodes hidden), the na&#239;ve Bayes classifier performed better than the learned structure. This was probably because the conditional probability tables (CPTs) of the na&#239;ve structure only model the relationship of <it>effect</it> with each variable, whereas the CPTs of <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> depend on the relationship between multiple variables. From Figure <figr fid="F2">2</figr>, we can see that a number of nodes are dependent on three variables in <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>, which perhaps explains the performance decrease when the model is not trained on sets of four or more variables. For example, when 11 variables are missing, an AUC value of 0.73 is achieved by <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>, whereas when 12 variables are missing, performance decreases to that of a random classifier (AUC &lt; 0.5). Nevertheless, the overall tolerance of both Bayesian networks to incomplete training data was encouraging, considering the potential sparsity of either evolutionary or structural information for a significant number of proteins, especially structural genomics targets. Other machine learning methods such as SVMs or decision trees are unable to handle incomplete data in this way.</p>
         </sec>
         <sec>
            <st>
               <p>Training set size</p>
            </st>
            <p>In order to assess how much data is needed for training the Bayesian networks, sequential learning of the model parameters was performed. The 'mixed' dataset was divided into two. One half was used as the test validation set, and the Bayesian networks were trained on the other half. Figure <figr fid="F5">5</figr> shows a plot of training set size vs. classifier performance, measured using area under the ROC curve. The result is as expected. The na&#239;ve model (with its 43 parameters) gradually improves its performance as its parameters are sequentially learned, with excellent performance after 400 examples (and good after as few as 50). The learned BN structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> has 182 free parameters and outperforms the na&#239;ve classifier after 1000 training examples.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Training set size</p>
               </caption>
               <text>
                  <p><b>Training set size</b>. Performance of na&#239;ve Bayes classifier and structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> with parameters learned sequentially. The AUC (area under the ROC curve) is plotted against the number of training examples.</p>
               </text>
               <graphic file="1471-2105-7-405-5"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Interpreting the structures</p>
            </st>
            <p>The learned structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> is one of many Markov equivalent structures which could have been learned from these data. There are also many other network structures which could suitably encode the relationships between the variables. Using Markov chain Monte Carlo (MCMC) methods, we constructed a set of 'good' model structures, and averaged over these models to form a posterior distribution of model structures. Figure <figr fid="F6">6</figr> shows a plot of the frequency of connections made in the set of 'good' structures from ten runs of the MCMC simulation over 10000 samples, after a 'burn-in' of 1000 samples. The darker squares indicate a higher observed frequency of an edge connecting each pair of nodes. From this, one can identify strong relationships between highly correlated variables.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Posterior distribution of relationships</p>
               </caption>
               <text>
                  <p><b>Posterior distribution of relationships</b>. Strength of relationships between variables, identified through analysis of edges connecting pairs of nodes in MCMC structure learning. A dark square indicates a strong relationship; a white square a weak relationship.</p>
               </text>
               <graphic file="1471-2105-7-405-6"/>
            </fig>
            <p>The use of MCMC methods to study the posterior distribution over networks has the advantage of revealing relationships between the input variables. For instance, in Figure <figr fid="F6">6</figr>, the top row shows which variables are most strongly related to <it>effect</it>, and this is used later to develop simplified classifiers.</p>
            <p>However, biologically meaningful relationships between the other variables are also revealed. With the exception of the trivial relationship between <it>ac</it> and <it>rac</it>, the second row of Figure <figr fid="F6">6</figr> shows strong links between <it>ac</it>, <it>nrent</it>, and <it>rent</it>, reflecting a well known biochemical relationship between solvent accessibility of residues and phylogenetic variability: the solvent exposed surface loops of protein structures show greater evolutionary variability than the unexposed hydrophobic core residues. Similar effects that concur with known protein chemistry relate measures of flexibility (<it>nbf</it>, <it>bf</it>) to phylogenetic variability. Equally understandable are the strong link between G and P residues in turns (<it>trn</it>) and evolutionary conservation at the specific sequence position of G/P (indicated by <it>rent</it>, <it>cnsd</it>) but not to a neighbourhood measure (<it>nrent</it>, <it>ncnsd</it>), and the relationship between protein interface positions (<it>ifc</it>) and neighbourhood flexibility measures (<it>nbf</it>).</p>
            <p>From Figure <figr fid="F6">6</figr>, one can see that the <it>effect</it> node has the strongest relationships with <it>bur</it>, <it>nbf</it>, <it>ac</it>, <it>uslaa</it>, and <it>uslby</it> (in descending order). There are very few direct connections between <it>effect</it> and <it>trn</it>, <it>hlx</it>, <it>ifc</it>, <it>cnsd</it>, and <it>ncnsd</it>. As expected, nodes such as <it>bf</it> and <it>nbf</it>, and <it>rent</it> and <it>nrent</it> are highly correlated, which suggests some redundancy within the network and one node could be used to predict the value of the other. Both structural and evolutionary information are represented by the nodes most frequently directly connected to <it>effect</it>, although the top three most common nodes, <it>bur</it>, <it>nbf</it> and <it>ac</it>, represent only structural information. This, together with the strong performance of the Bayes classifier without evolutionary information (Table <tblr tid="T2">2</tblr>, columns 4 and 5), suggests that evolutionary properties of the mutated residue have little direct influence on prediction if structural information is present.</p>
            <p>Our finding that solvent accessible area of the native amino acid, whether the amino acid is charged at a buried site, and the flexibility of its structural neighbourhood are all important predictors of effect agrees to some extent with Chasman and Adams (2001), who found that structure-based accessibility and B-factor features have the most discriminatory power. The strong performance of accessibility measures probably reflects the finding of <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> that mutations affecting the hydrophobic core are more likely to destabilise the protein and thus affect function. Perhaps mutations on the surface are more likely to affect function if they are conserved, as suggested by the strong relationship between accessibility and phylogenetic entropy (<it>ac</it> with <it>rent</it> and <it>nrent</it>). Conversely, whether or not the mutation breaks either a helix or a turn does not appear to be critical to predicting effect although, again, secondary structure information may become more powerful when used in conjunction with other features.</p>
         </sec>
         <sec>
            <st>
               <p>A simplified Bayesian network</p>
            </st>
            <p>Whilst the nodes directly connected to the <it>effect</it> node are not essential to prediction if certain other nodes are present (as demonstrated by the removal of the structural flexibility nodes <it>nbf</it> and <it>bf</it>, with no significant loss of performance), in theory, the value of the <it>effect</it> node can be predicted using only the nodes which are directly connected to it in the learned structures. The other variables become d-separated from <it>effect</it>; i.e. with a structure, and the conditional independence relations it implies, the effect node is conditionally dependent on only the nodes it is connected to when they are observed.</p>
            <p>We tested this hypothesis by constructing two simple four node networks: a na&#239;ve structure (Figure <figr fid="F7">7a</figr>), and structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math><sub>3 </sub>(Figure <figr fid="F7">7b</figr>) learned using the greedy search algorithm and the BIC scoring function as above. Besides <it>effect</it>, these networks contained only the three nodes, <it>bur</it>, <it>nbf</it> and <it>ac</it>, that have the strongest relationships with <it>effect</it> as shown in Figure <figr fid="F6">6</figr>.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Simplified Bayesian networks</p>
               </caption>
               <text>
                  <p><b>Simplified Bayesian networks</b>. Four node networks using the three key variables shown to have the strongest relationship with the effect. (a) Na&#239;ve Bayes classifier, (b) learned Bayesian network structure <m:math name="1471-2105-7-405-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math><sub>3</sub>.</p>
               </text>
               <graphic file="1471-2105-7-405-7"/>
            </fig>
            <p>Across all cross-validation tests, the four node na&#239;ve Bayes classifier trained and tested using only the three key variables (<it>key</it>:<it>key</it>, Table <tblr tid="T2">2</tblr>, final column) performed extremely well with only a minor decrease in performance over the <it>all</it>:<it>all </it>model. In homogeneous cross validation, AUC values ranged from 0.77 to 0.80 and the maximum overall error rate was just 0.21. In heterogeneous cross validation tests, the AUC also remained high (0.77 and 0.79) with overall error rates as low as 0.16 for training on lac repressor and testing on lysozyme data. There were no significant differences between the performance of the four node learned structure (<it>key</it>:<it>key</it>, Table <tblr tid="T3">3</tblr>, final column) and that of the na&#239;ve structure, which suggests little value in the connections between variables.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We have applied Bayesian networks to the task of predicting whether or not missense mutations will affect protein function with comparable performance to other machine learning methods. However, the strength of the Bayesian network lies in its ability to handle incomplete data and to encode relationships between variables; both of which were exploited here to derive some biological insight into how a missense mutation affects protein function.</p>
         <p>A number of models were learned in this work. Due to the unbalanced datasets we analysed ROC curves and selected a suitable cost ratio in order to choose a probability threshold for the classifiers. This allowed us to compare classifiers in a meaningful way. From this analysis we concluded that a na&#239;ve network structure is sufficient for accurate prediction of the effect of a missense mutation, with AUC values around 0.80. We also found that the structural environment of the amino acid is a far better predictor of the functional consequences of a missense mutation than phylogenetic information. This was demonstrated by the more accurate performance of a na&#239;ve classifier that uses only structural information compared with one that uses only evolutionary information. There were no significant performance gains when using a learned network structure. However, the learned structure did allow relationships between variables to be analysed: in particular, by analysing the posterior distribution of model structures, we found that the top three strongest connections with the effect node all involved structural nodes. With this in mind, we derived a simplified Bayesian network that used just these three structural descriptors (solvent accessible area of the native amino acid, whether the amino acid is charged at a buried site, and the flexibility of its structural neighbourhood) without significant decrease in performance. Given the importance of structure, it would be interesting to learn if certain amino acid changes are more predictive of effect than others. For example, both cysteine, which forms disulphide bridges, and proline, with its unique ring structure, are often critical to the integrity of a protein structure, so one would expect a mutation involving either of these residues to change the structure significantly. This will provide the basis for future work.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Evaluation measures</p>
            </st>
            <p>A number of measures were applied to evaluate each classifier: error rates (fraction of misclassified examples), sensitivity (true positive rate) and specificity (true negative rate). We also used Matthews' correlation coefficient (MCC), which is a correlation measure designed for comparison of unbalanced datasets such as ours. A value of +1 indicates perfect classification, and -1 indicates misclassification of every example. The MCC is defined as:</p>
            <p>
               <m:math name="1471-2105-7-405-i8" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>M</m:mi>
                        <m:mi>C</m:mi>
                        <m:mi>C</m:mi>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>T</m:mi>
                              <m:mi>P</m:mi>
                              <m:mo>&#215;</m:mo>
                              <m:mi>T</m:mi>
                              <m:mi>N</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>&#8722;</m:mo>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>F</m:mi>
                              <m:mi>P</m:mi>
                              <m:mo>&#215;</m:mo>
                              <m:mi>F</m:mi>
                              <m:mi>N</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:msqrt>
                                 <m:mrow>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>T</m:mi>
                                    <m:mi>P</m:mi>
                                    <m:mo>+</m:mo>
                                    <m:mi>F</m:mi>
                                    <m:mi>P</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>T</m:mi>
                                    <m:mi>P</m:mi>
                                    <m:mo>+</m:mo>
                                    <m:mi>F</m:mi>
                                    <m:mi>N</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>T</m:mi>
                                    <m:mi>N</m:mi>
                                    <m:mo>+</m:mo>
                                    <m:mi>F</m:mi>
                                    <m:mi>P</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>T</m:mi>
                                    <m:mi>N</m:mi>
                                    <m:mo>+</m:mo>
                                    <m:mi>F</m:mi>
                                    <m:mi>N</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:msqrt>
                           </m:mrow>
                        </m:mfrac>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGnbqtcqWGdbWqcqWGdbWqcqGH9aqpdaWcaaqaaiabcIcaOiabdsfaujabdcfaqjabgEna0kabdsfaujabd6eaojabcMcaPiabgkHiTiabcIcaOiabdAeagjabdcfaqjabgEna0kabdAeagjabd6eaojabcMcaPaqaamaakaaabaGaeiikaGIaemivaqLaemiuaaLaey4kaSIaemOrayKaemiuaaLaeiykaKIaeiikaGIaemivaqLaemiuaaLaey4kaSIaemOrayKaemOta4KaeiykaKIaeiikaGIaemivaqLaemOta4Kaey4kaSIaemOrayKaemiuaaLaeiykaKIaeiikaGIaemivaqLaemOta4Kaey4kaSIaemOrayKaemOta4KaeiykaKcaleqaaaaaaaa@5F65@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where TP are true positives, TN are true negatives, FP are false positives, and FN are false negatives obtained from evaluating the classifier on the test data.</p>
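            <p>As an illustration, the definition above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch; the function name <it>mcc</it> is hypothetical, and returning 0 when the denominator vanishes is a convention assumed here rather than something specified in this work.</p>
            <p><![CDATA[
```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews' correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an empty row or column of the confusion
        # matrix gives no correlation information, so report 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```
]]></p>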
            <p>Since we have a Bayesian network classifier, with a probability associated with each classification, the metrics above depend on the value of the classification threshold <it>p </it>that is used. To assess performance across a range of values of the probability threshold we plotted a receiver operating characteristic (ROC) curve. The ROC curve is a plot of the sensitivity versus (1-specificity) for all feasible ratios of costs associated with misclassification errors (equivalent to plotting true positive rate versus false positive rate). The area under the curve (AUC) is a measure of the performance of a binary classifier. A classifier no better than random gives an AUC of 0.5, whereas a perfect classifier gives an AUC of 1.0.</p>
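            <p>A minimal sketch of how such a curve and its area can be obtained from predicted probabilities is given below. It assumes binary labels (1 for effect, 0 for no effect) with at least one example of each class, and it uses the trapezium rule for the area; the function names <it>roc_points</it> and <it>auc</it> are hypothetical and do not refer to the software used in this work.</p>
            <p><![CDATA[
```python
def roc_points(scores, labels):
    """Return (FP rate, TP rate) points obtained by sweeping the threshold.

    scores: predicted probability of the positive class for each example.
    labels: 1 for a positive example, 0 for a negative example.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Examples with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc(points):
    """Area under a ROC curve by the trapezium rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```
]]></p>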
         </sec>
         <sec>
            <st>
               <p>Choosing a classification threshold</p>
            </st>
            <p>In order to perform a fair comparison of classifiers, we choose the classification threshold <it>p</it>, represented by the point on the ROC curve at which the curve has a gradient (&#916;TPrate/&#916;FPrate) of <it>C</it><sub><it>FP</it></sub>/<it>C</it><sub><it>FN </it></sub>(the ratio of costs between false positives and false negatives), and which is closest to the point (0, 1). In this work we use a cost ratio of 3.0, due to the unbalanced nature of the datasets, which contain 3742 mutations that have no effect on protein function and 1135 that do affect protein function. This is close to 3:1, and by weighting a false positive, <it>C</it><sub><it>FP</it></sub>, as three times more costly (to the classifier) than a false negative, <it>C</it><sub><it>FN</it></sub>, we obtain a classifier with reasonably well balanced error rates. This means the classifier is less likely to predict everything as an effect (than with an equal cost ratio of 1.0) and make many false positive errors, which would give a high effect error rate. (Without doing this, we may be comparing classifiers with very different properties, i.e. ones with quite different specificities and sensitivities.) The method is applied to the ROC curves obtained from the probabilistic classification scheme and we present the results for the classification threshold <it>p </it>corresponding to the point on the convex hull of the ROC curve where the gradient is closest to 3.0. (We choose a point on the convex hull since any point on the ROC curve not on the convex hull is a non-optimal classifier <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.)</p>
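            <p>One common way to realise this selection is sketched below, under stated assumptions: the ROC curve is given as a list of (FP rate, TP rate) points, the upper convex hull is built with a monotone-chain scan, and the operating point is taken as the hull vertex touched by a line of gradient 3.0, i.e. the vertex that maximises TP rate minus 3.0 times FP rate. The function names are hypothetical, and the tangent-line criterion is our reading of "gradient closest to 3.0" rather than a description of the original implementation.</p>
            <p><![CDATA[
```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper convex hull of ROC points, ordered by increasing FP rate."""
    hull = []
    for p in sorted(set(points)):
        # Drop the last vertex while it lies on or below the chord to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def select_operating_point(points, cost_ratio=3.0):
    """Hull vertex where a line of gradient cost_ratio is tangent."""
    return max(upper_hull(points),
               key=lambda p: p[1] - cost_ratio * p[0])


# Invented ROC points, for illustration only.
roc = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.7), (0.5, 0.95), (1.0, 1.0)]
hull = upper_hull(roc)
chosen = select_operating_point(roc, cost_ratio=3.0)
```
]]></p>
            <p>In this invented example the hull segments have gradients of 6, 0.875 and 0.1, so a line of gradient 3.0 touches the hull at the vertex between the first two segments.</p>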
         </sec>
         <sec>
            <st>
               <p>Data discretization</p>
            </st>
            <p>There were a number of challenges buried in these data. The continuous data were non-Gaussian, making them unsuitable for modelling as continuous Gaussian nodes in a Bayesian network. There were also no obvious boundaries at which to separate the data into discrete categories. Our solution was to fit a number of Gaussians to the data using an Expectation-Maximisation based algorithm that automatically chooses the number of classes. It begins with one Gaussian, and iteratively splits the Gaussian with the largest weight, until adding extra classes does not increase the maximum likelihood of the model. Full details are provided below. This allowed us to form discrete classes from continuous data, which gave better performance than simply splitting the data into three classes of equal range (results not shown). We therefore used this strategy in all our analyses.</p>
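            <p>The general idea can be illustrated with a simplified one-dimensional sketch. Several details here are assumptions made for illustration and are not taken from the method described in this work: the split places two new Gaussians one standard deviation either side of the parent mean, a split is kept only if the log-likelihood rises by more than a tolerance (<it>min_gain</it>), the number of Gaussians is capped (<it>max_comps</it>), and a value is assigned to the class of the Gaussian with the highest posterior probability. All function names are hypothetical.</p>
            <p><![CDATA[
```python
import math
import random


def log_pdf(x, mu, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def mixture_density(x, comps):
    return sum(w * math.exp(log_pdf(x, mu, var)) for w, mu, var in comps)


def run_em(data, comps, iters=100):
    """Run EM on a list of [weight, mean, variance]; return (comps, loglik)."""
    for _ in range(iters):
        # E-step: responsibility of each Gaussian for each data point.
        resp = []
        for x in data:
            ps = [w * math.exp(log_pdf(x, mu, var)) for w, mu, var in comps]
            total = sum(ps) or 1e-300
            resp.append([p / total for p in ps])
        # M-step: re-estimate weights, means and variances.
        updated = []
        for j in range(len(comps)):
            nj = sum(r[j] for r in resp) or 1e-12
            mu = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, data)) / nj
            updated.append([nj / len(data), mu, max(var, 1e-6)])
        comps = updated
    loglik = sum(math.log(mixture_density(x, comps) or 1e-300) for x in data)
    return comps, loglik


def fit_by_splitting(data, min_gain=1.0, max_comps=4):
    """Start with one Gaussian and split the heaviest while it helps."""
    mean = sum(data) / len(data)
    var = sum((x - mean) ** 2 for x in data) / len(data)
    comps, loglik = run_em(data, [[1.0, mean, var]], iters=1)
    while len(comps) < max_comps:
        k = max(range(len(comps)), key=lambda j: comps[j][0])
        w, mu, v = comps[k]
        sd = math.sqrt(v)
        # Assumed split rule: two halves placed at mu - sd and mu + sd.
        trial = comps[:k] + comps[k + 1:] + [[w / 2, mu - sd, v],
                                             [w / 2, mu + sd, v]]
        trial, trial_loglik = run_em(data, trial)
        if trial_loglik - loglik <= min_gain:
            break
        comps, loglik = trial, trial_loglik
    return sorted(comps, key=lambda c: c[1])


def discretise(x, comps):
    """Index of the Gaussian with the highest posterior probability for x."""
    return max(range(len(comps)),
               key=lambda j: math.log(comps[j][0]) + log_pdf(x, *comps[j][1:]))


rng = random.Random(1)
# Synthetic bimodal data, for illustration only.
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(10.0, 1.0) for _ in range(200)])
classes = fit_by_splitting(data)
```
]]></p>
            <p>Because the fitted Gaussians are returned in order of increasing mean, the class index produced by <it>discretise</it> increases with the value of the underlying continuous variable.</p>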
            <sec>
               <st>
                  <p>The E-M algorithm</p>
               </st>
               <p>The Expectation-Maximisation (E-M) algorithm is a well-established, efficient algorithm for fitting Gaussian mixture models to data. Its main drawbacks are its sensitivity to initialisation and the need for multiple runs with different numbers of mixture components in order to choose the maximum likelihood model. Here we present an adaptation of the method which is deterministic and automatically chooses the number of Gaussians. It begins with one Gaussian and iteratively splits the Gaussian with the largest weight, until adding extra components does not increase the maximum likelihood of the model. Given a data set <b>X </b>= {x<sub>1</sub>,..., x<sub><it>N</it></sub>} of <it>N </it>cases of <it>d</it>-dimensional data, a single cluster centre <b><it>&#956;</it></b><sub>1 </sub>is initialised at the mean of the data. A Gaussian with diagonal covariance s<sub>1</sub>, equal to the variance of the data in each dimension, is placed at <b><it>&#956;</it></b><sub>1</sub>. The weight of this cluster is set to one, <it>p</it>(<it>j </it>= 1) = 1.</p>
               <p>
                  <m:math name="1471-2105-7-405-i9" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>&#956;</m:mi>
                                          <m:mn>1</m:mn>
                                       </m:msub>
                                       <m:mo>=</m:mo>
                                       <m:mfrac>
                                          <m:mn>1</m:mn>
                                          <m:mi>N</m:mi>
                                       </m:mfrac>
                                       <m:mstyle displaystyle="true">
                                          <m:msubsup>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mi>N</m:mi>
                                          </m:msubsup>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mtext>x</m:mtext>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mtext>s</m:mtext>
                                          <m:mn>1</m:mn>
                                       </m:msub>
                                       <m:mo>=</m:mo>
                                       <m:mfrac>
                                          <m:mn>1</m:mn>
                                          <m:mi>N</m:mi>
                                       </m:mfrac>
                                       <m:mstyle displaystyle="true">
                                          <m:msubsup>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mi>N</m:mi>
                                          </m:msubsup>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mtext>x</m:mtext>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mi>&#956;</m:mi>
                                                <m:mn>1</m:mn>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:msub>
                                                      <m:mtext>x</m:mtext>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:msub>
                                                      <m:mi>&#956;</m:mi>
                                                      <m:mn>1</m:mn>
                                                   </m:msub>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                                <m:mi>T</m:mi>
                                             </m:msup>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mi>p</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>j</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeWabaaabaaccmGae8hVd02aaSbaaSqaaiabigdaXaqabaGccqGH9aqpdaWcaaqaaiabigdaXaqaaiabd6eaobaadaaeWaqaaiabbIha4naaBaaaleaacqWGPbqAaeqaaaqaaiabdMgaPjabg2da9iabigdaXaqaaiabd6eaobqdcqGHris5aaGcbaGaee4Cam3aaSbaaSqaaiabigdaXaqabaGccqGH9aqpdaWcaaqaaiabigdaXaqaaiabd6eaobaadaaeWaqaaiabcIcaOiabbIha4naaBaaaleaacqWGPbqAaeqaaOGaeyOeI0Iae8hVd02aaSbaaSqaaiabigdaXaqabaGccqGGPaqkcqGGOaakcqqG4baEdaWgaaWcbaGaemyAaKgabeaakiabgkHiTiab=X7aTnaaBaaaleaacqaIXaqmaeqaaOGaeiykaKYaaWbaaSqabeaacqWGubavaaaabaGaemyAaKMaeyypa0JaeGymaedabaGaemOta4eaniabggHiLdaakeaacqWGWbaCcqGGOaakcqWGQbGAcqGH9aqpcqaIXaqmcqGGPaqkcqGH9aqpcqaIXaqmaaaaaa@6332@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
               <p>The probability <it>p</it>(<it>j</it>|x<sub><it>i</it></sub>) is calculated for each data point x<sub><it>i</it></sub>.</p>
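               <p>For a set of mixtures indexed by <it>j</it>, the update equations follow. These are applied iteratively until the log-likelihood stops increasing, i.e. until the maximum of <it>ML </it>= &#8721;<sub><it>i </it></sub>log <it>p</it>(x<sub><it>i</it></sub>) is reached, where <it>p</it>(x<sub><it>i</it></sub>) = &#8721;<sub><it>j</it></sub><it>p</it>(<it>j</it>)<it>p</it>(x<sub><it>i</it></sub>|<it>j</it>).</p>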
               <p>E-step:</p>
               <p>
                  <m:math name="1471-2105-7-405-i10" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable columnalign="left">
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>p</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>j</m:mi>
                                       <m:mo>|</m:mo>
                                       <m:msub>
                                          <m:mtext>x</m:mtext>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:mi>p</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>j</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mi>p</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mtext>x</m:mtext>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>|</m:mo>
                                             <m:mi>j</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mi>p</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mtext>x</m:mtext>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mfrac>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>p</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mtext>x</m:mtext>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>|</m:mo>
                                       <m:mi>j</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mfrac>
                                          <m:mn>1</m:mn>
                                          <m:mrow>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mn>2</m:mn>
                                                   <m:mi>&#960;</m:mi>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:mfrac>
                                                      <m:mi>d</m:mi>
                                                      <m:mn>2</m:mn>
                                                   </m:mfrac>
                                                </m:mrow>
                                             </m:msup>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mrow>
                                                      <m:mo>|</m:mo>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mtext>s</m:mtext>
                                                            <m:mi>j</m:mi>
                                                         </m:msub>
                                                      </m:mrow>
                                                      <m:mo>|</m:mo>
                                                   </m:mrow>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:mfrac>
                                                      <m:mn>1</m:mn>
                                                      <m:mn>2</m:mn>
                                                   </m:mfrac>
                                                </m:mrow>
                                             </m:msup>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mi>exp</m:mi>
                                       <m:mo>&#8289;</m:mo>
                                       <m:mrow>
                                          <m:mo>(</m:mo>
                                          <m:mrow>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mfrac>
                                                <m:mn>1</m:mn>
                                                <m:mn>2</m:mn>
                                             </m:mfrac>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mtext>x</m:mtext>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mi>&#956;</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:msubsup>
                                                <m:mtext>s</m:mtext>
                                                <m:mi>j</m:mi>
                                                <m:mrow>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mn>1</m:mn>
                                                </m:mrow>
                                             </m:msubsup>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:msub>
                                                      <m:mtext>x</m:mtext>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:msub>
                                                      <m:mi>&#956;</m:mi>
                                                      <m:mi>j</m:mi>
                                                   </m:msub>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                                <m:mi>T</m:mi>
                                             </m:msup>
                                          </m:mrow>
                                          <m:mo>)</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mtext/>
                                       <m:mi>p</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>j</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mstyle displaystyle="true">
                                          <m:msubsup>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mi>N</m:mi>
                                          </m:msubsup>
                                          <m:mrow>
                                             <m:mi>p</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>j</m:mi>
                                             <m:mo>|</m:mo>
                                             <m:msub>
                                                <m:mtext>x</m:mtext>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqaaeWadaaabaGaemiCaaNaeiikaGIaemOAaOMaeiiFaWNaeeiEaG3aaSbaaSqaaiabdMgaPbqabaGccqGGPaqkaeaacqGH9aqpaeGafaGs5dGQ5dG85dGt+dGg=paalaaabaGaemiCaaNaeiikaGIaemOAaOMaeiykaKIaemiCaaNaeiikaGIaeeiEaG3aaSbaaSqaaiabdMgaPbqabaGccqGG8baFcqWGQbGAcqGGPaqkaeaacqWGWbaCcqGGOaakcqWG4baEdaWgaaWcbaGaemyAaKgabeaakiabcMcaPaaaaeaacqWGWbaCcqGGOaakcqqG4baEdaWgaaWcbaGaemyAaKgabeaakiabcYha8jabdQgaQjabcMcaPaqaaiabg2da9aqaamaalaaabaGaeGymaedabaGaeiikaGIaeGOmaidcciGae8hWdaNaeiykaKYaaWbaaSqabeaadaWcaaqaaiabdsgaKbqaaiabikdaYaaaaaGcdaabdiqaaiabbohaZnaaBaaaleaacqWGQbGAaeqaaaGccaGLhWUaayjcSdWaaWbaaSqabeaadaWcaaqaaiabigdaXaqaaiabikdaYaaaaaaaaOGagiyzauMaeiiEaGNaeiiCaa3aaeWaceaacqGHsisldaWcaaqaaiabigdaXaqaaiabikdaYaaacqGGOaakcqqG4baEdaWgaaWcbaGaemyAaKgabeaakiabgkHiTGGadiab+X7aTnaaBaaaleaacqWGQbGAaeqaaOGaeiykaKIaee4Cam3aa0baaSqaaiabdQgaQbqaaiabgkHiTiabigdaXaaakiabcIcaOiabbIha4naaBaaaleaacqWGPbqAaeqaaOGaeyOeI0Iae4hVd02aaSbaaSqaaiabdQgaQbqabaGccqGGPaqkdaahaaWcbeqaaiabdsfaubaaaOGaayjkaiaawMcaaaqaceaadiGaaCzcaiabdchaWjabcIcaOiabdQgaQjabcMcaPaqaaiabg2da9aqaamaaqadabaGaemiCaaNaeiikaGIaemOAaOMaeiiFaWNaeeiEaG3aaSbaaSqaaiabdMgaPbqabaGccqGGPaqkaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabd6eaobqdcqGHris5aaaaaaa@A25C@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
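               <p>A minimal sketch of the E-step for the one-dimensional case (<it>d </it>= 1, so that s<sub><it>j </it></sub>reduces to a variance), written in Python for illustration only. The function names gaussian_pdf and e_step and the toy inputs are assumptions; the weights are taken to sum to one, which leaves <it>p</it>(<it>j</it>|x<sub><it>i</it></sub>) unchanged because any common factor cancels in the ratio.</p>
               <p><![CDATA[
```python
import math


def gaussian_pdf(x, mu, var):
    """Density p(x | j) of a one-dimensional Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(data, weights, means, variances):
    """Return resp[i][j] = p(j | x_i) and the log-likelihood sum_i log p(x_i)."""
    resp, log_lik = [], 0.0
    for x in data:
        # Joint terms p(j) p(x_i | j) for every mixture component j.
        joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        p_x = sum(joint)  # p(x_i) = sum_j p(j) p(x_i | j)
        resp.append([q / p_x for q in joint])
        log_lik += math.log(p_x)
    return resp, log_lik


# Toy call: two unit-variance Gaussians centred at 0 and 5 (hypothetical values).
resp, log_lik = e_step([0.0, 0.1, 5.0], [0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
```
]]></p>
               <p>Each row of resp sums to one, and points close to a centre are assigned almost entirely to that component.</p>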
               <p>M-step:</p>
               <p>
                  <m:math name="1471-2105-7-405-i11" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>&#956;</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:msub>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mfrac>
                                          <m:mn>1</m:mn>
                                          <m:mrow>
                                             <m:mi>p</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>j</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mstyle displaystyle="true">
                                          <m:msubsup>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mi>N</m:mi>
                                          </m:msubsup>
                                          <m:mrow>
                                             <m:mi>p</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>j</m:mi>
                                             <m:mo>|</m:mo>
                                             <m:msub>
                                                <m:mtext>x</m:mtext>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:msub>
                                                <m:mtext>x</m:mtext>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mtext/>
                                       <m:msub>
                                          <m:mtext>s</m:mtext>
                                          <m:mi>j</m:mi>
                                       </m:msub>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mfrac>
                                          <m:mn>1</m:mn>
                                          <m:mrow>
                                             <m:mi>p</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>j</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mstyle displaystyle="true">
                                          <m:msubsup>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mi>N</m:mi>
                                          </m:msubsup>
                                          <m:mrow>
                                             <m:mi>p</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>j</m:mi>
                                             <m:mo>|</m:mo>
                                             <m:msub>
                                                <m:mtext>x</m:mtext>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mtext>x</m:mtext>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mi>&#956;</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:msub>
                                                      <m:mtext>x</m:mtext>
                                                      <m:mi>i</m:mi>
                                                   </m:msub>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:msub>
                                                      <m:mi>&#956;</m:mi>
                                                      <m:mi>j</m:mi>
                                                   </m:msub>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                                <m:mi>T</m:mi>
                                             </m:msup>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqadeGadaaabaaccmGae8hVd02aaSbaaSqaaiabdQgaQbqabaaakeaacqGH9aqpaeaadaWcaaqaaiabigdaXaqaaiabdchaWjabcIcaOiabdQgaQjabcMcaPaaadaaeWaqaaiabdchaWjabcIcaOiabdQgaQjabcYha8jabbIha4naaBaaaleaacqWGPbqAaeqaaOGaeiykaKIaeeiEaG3aaSbaaSqaaiabdMgaPbqabaaabaGaemyAaKMaeyypa0JaeGymaedabaGaemOta4eaniabggHiLdaakeGabaajaiaaxMaacqqGZbWCdaWgaaWcbaGaemOAaOgabeaaaOqaaiabg2da9aqaamaalaaabaGaeGymaedabaGaemiCaaNaeiikaGIaemOAaOMaeiykaKcaamaaqadabaGaemiCaaNaeiikaGIaemOAaOMaeiiFaWNaeeiEaG3aaSbaaSqaaiabdMgaPbqabaGccqGGPaqkcqGGOaakcqqG4baEdaWgaaWcbaGaemyAaKgabeaakiabgkHiTiab=X7aTnaaBaaaleaacqWGQbGAaeqaaOGaeiykaKIaeiikaGIaeeiEaG3aaSbaaSqaaiabdMgaPbqabaGccqGHsislcqaH8oqBdaWgaaWcbaGaemOAaOgabeaakiabcMcaPmaaCaaaleqabaGaemivaqfaaaqaaiabdMgaPjabg2da9iabigdaXaqaaiabd6eaobqdcqGHris5aaaaaaa@75F2@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
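               <p>The M-step can be sketched in the same way, again for one-dimensional data and for illustration only. In this sketch the sum of <it>p</it>(<it>j</it>|x<sub><it>i</it></sub>) over the data points plays the role of <it>p</it>(<it>j</it>) in the update equations; returning the weights divided by <it>N</it>, so that they sum to one, is an assumption, as are the function name m_step and the toy inputs.</p>
               <p><![CDATA[
```python
def m_step(data, resp):
    """Re-estimate weights, means and variances from resp[i][j] = p(j | x_i)."""
    n, k = len(data), len(resp[0])
    weights, means, variances = [], [], []
    for j in range(k):
        n_j = sum(resp[i][j] for i in range(n))  # sum_i p(j | x_i)
        mu_j = sum(resp[i][j] * data[i] for i in range(n)) / n_j
        var_j = sum(resp[i][j] * (data[i] - mu_j) ** 2 for i in range(n)) / n_j
        weights.append(n_j / n)  # normalised weight (assumption)
        means.append(mu_j)
        variances.append(var_j)
    return weights, means, variances


# Toy call with hard 0/1 responsibilities (hypothetical values).
data = [0.0, 2.0, 10.0, 12.0]
resp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
weights, means, variances = m_step(data, resp)
```
]]></p>
               <p>With hard responsibilities, as in the toy call, the updates reduce to the mean and variance of each group of points.</p>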
               <p>When the ML stops increasing, the Gaussian with the largest weight <it>p</it>(<it>j</it>) described by <b><it>&#956;</it></b><sub><it>j </it></sub>and s<sub><it>j </it></sub>is split into two Gaussians at <m:math name="1471-2105-7-405-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#956;</m:mi><m:mi>j</m:mi><m:mo>+</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWF8oqBdaqhaaWcbaGaemOAaOgabaGaey4kaScaaaaa@30D5@</m:annotation></m:semantics></m:math> and <m:math name="1471-2105-7-405-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#956;</m:mi><m:mi>j</m:mi><m:mo>&#8722;</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWF8oqBdaqhaaWcbaGaemOAaOgabaGaeyOeI0caaaaa@30E0@</m:annotation></m:semantics></m:math>, each with the same covariance s<sub><it>j</it></sub>. The new Gaussians are placed on either side of <b><it>&#956;</it></b><sub><it>j</it></sub>, at a distance equal to the largest eigenvalue of the covariance matrix s<sub><it>j</it></sub>, along the direction of its principal eigenvector. (The Gaussians are then renamed.) The EM steps above are repeated until the likelihood again stops increasing. This split-and-refit process continues until the ML score is no higher than it was with one fewer Gaussian. At each stage, the centres and covariances of the Gaussians are saved. The algorithm therefore terminates with a set of Gaussians that is no better than the set from the previous stage, which had one fewer Gaussian, so the saved set from the previous stage is used.</p>
               <p>Each data point x<sub><it>i</it></sub> is given a hard classification: it is assigned to its most likely class, arg max<sub><it>j</it></sub> <it>p</it>(<it>j</it>|x<sub><it>i</it></sub>).</p>
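               <p>The splitting and hard-classification steps can be illustrated with a minimal Python sketch. It is not the implementation used in this study: it assumes two-dimensional data so that the principal eigenvector has a closed form, and the function names (principal_eigenpair, split_gaussian, argmax) and all numerical values are hypothetical, chosen only for illustration. The EM updates themselves are omitted.</p>
               <p><![CDATA[
```python
import math


def principal_eigenpair(cov):
    """Largest eigenvalue and unit eigenvector of a symmetric 2x2 matrix."""
    (a, b), (_, d) = cov
    centre = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    lam = centre + radius
    if b == 0.0:
        # Diagonal covariance: the principal axis is a coordinate axis.
        return lam, ((1.0, 0.0) if a >= d else (0.0, 1.0))
    # For b != 0, (b, lam - a) solves (cov - lam * I) v = 0.
    vx, vy = b, lam - a
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)


def split_gaussian(mu, cov):
    """Return two (centre, covariance) pairs replacing one Gaussian.

    The new centres lie at plus and minus the largest eigenvalue along
    the principal eigenvector; both keep the original covariance.
    """
    lam, (vx, vy) = principal_eigenpair(cov)
    mu_plus = (mu[0] + lam * vx, mu[1] + lam * vy)
    mu_minus = (mu[0] - lam * vx, mu[1] - lam * vy)
    return (mu_plus, cov), (mu_minus, cov)


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda j: values[j])


weights = [0.2, 0.5, 0.3]            # illustrative p(j), not from the paper
j = argmax(weights)                  # the Gaussian chosen for splitting
(plus, _), (minus, _) = split_gaussian((1.0, 2.0), [[4.0, 0.0], [0.0, 1.0]])
label = argmax([0.1, 0.7, 0.2])      # illustrative p(j|x_i) for one point
```
]]></p>
               <p>With the illustrative covariance above, the largest eigenvalue is 4 and the principal eigenvector is the first coordinate axis, so the two new centres are (5, 2) and (-3, 2). For data of higher dimension, a general symmetric eigensolver would be needed in place of the closed form.</p>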
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>CJN carried out the study. MAC assisted with data sets and result interpretation. JRB advised on biological aspects and result interpretation. AJB and DRW suggested the study and assisted with result interpretation. All authors approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank the BBSRC, which funded this research under grant BBSB16585.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The SNP Consortium website: past, present and future</p>
            </title>
            <aug>
               <au>
                  <snm>Thorisson</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Stein</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>124</fpage>
            <lpage>127</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165499</pubid>
                  <pubid idtype="pmpid" link="fulltext">12519964</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg052</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The SNP Consortium website</p>
            </title>
            <url>http://snp.cshl.org</url>
         </bibl>
         <bibl id="B3">
            <title>
               <p>dbSNP: the NCBI database of genetic variation</p>
            </title>
            <aug>
               <au>
                  <snm>Sherry</snm>
                  <fnm>ST</fnm>
               </au>
               <au>
                  <snm>Ward</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Kholodov</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Phan</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Smigielski</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Sirotkin</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>308</fpage>
            <lpage>311</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29783</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125122</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.308</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>NCBI dbSNP</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/SNP</url>
         </bibl>
         <bibl id="B5">
            <title>
               <p>SNPs, protein structure, and disease</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Moult</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Hum Mutat</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>263</fpage>
            <lpage>270</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/humu.22</pubid>
                  <pubid idtype="pmpid" link="fulltext">11295823</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Human nonsynonymous SNPs: server and survey</p>
            </title>
            <aug>
               <au>
                  <snm>Ramensky</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Sunyaev</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>3894</fpage>
            <lpage>3900</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">137415</pubid>
                  <pubid idtype="pmpid" link="fulltext">12202775</pubid>
                  <pubid idtype="doi">10.1093/nar/gkf493</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Predicting the Functional Consequences of Non-synonymous Single Nucleotide Polymorphisms: Structure-based Assessment of Amino Acid Variation</p>
            </title>
            <aug>
               <au>
                  <snm>Chasman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>307</volume>
            <fpage>683</fpage>
            <lpage>706</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.4510</pubid>
                  <pubid idtype="pmpid" link="fulltext">11254390</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Characterization of disease-associated single amino acid polymorphisms in terms of sequence and structure properties</p>
            </title>
            <aug>
               <au>
                  <snm>Ferrer-Costa</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Orozco</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>de la Cruz</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>315</volume>
            <fpage>771</fpage>
            <lpage>786</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.5255</pubid>
                  <pubid idtype="pmpid" link="fulltext">11812146</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Predicting deleterious amino acid substitutions</p>
            </title>
            <aug>
               <au>
                  <snm>Ng</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>863</fpage>
            <lpage>874</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">311071</pubid>
                  <pubid idtype="pmpid" link="fulltext">11337480</pubid>
                  <pubid idtype="doi">10.1101/gr.176601</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A comparative study of machine learning methods to predict the effects of single nucleotide polymorphisms on protein function</p>
            </title>
            <aug>
               <au>
                  <snm>Krishnan</snm>
                  <fnm>VG</fnm>
               </au>
               <au>
                  <snm>Westhead</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>17</issue>
            <fpage>2199</fpage>
            <lpage>2209</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg297</pubid>
                  <pubid idtype="pmpid" link="fulltext">14630648</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Prediction of deleterious functional effects of amino acid mutations using a library of structure-based function descriptors</p>
            </title>
            <aug>
               <au>
                  <snm>Herrgard</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Cammer</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Hoffman</snm>
                  <fnm>BT</fnm>
               </au>
               <au>
                  <snm>Knutson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gallina</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Speir</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Fetrow</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Baxter</snm>
                  <fnm>SM</fnm>
               </au>
            </aug>
            <source>Proteins: Structure, Function, and Genetics</source>
            <pubdate>2003</pubdate>
            <volume>53</volume>
            <issue>4</issue>
            <fpage>806</fpage>
            <lpage>816</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/prot.10458</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Prediction of deleterious human alleles</p>
            </title>
            <aug>
               <au>
                  <snm>Sunyaev</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ramensky</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Koch</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Lathe</snm>
                  <fnm>W</fnm>
                  <suf>III</suf>
               </au>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2001</pubdate>
            <volume>10</volume>
            <fpage>591</fpage>
            <lpage>597</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/10.6.591</pubid>
                  <pubid idtype="pmpid" link="fulltext">11230178</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>SIFT: predicting amino acid changes that affect protein function</p>
            </title>
            <aug>
               <au>
                  <snm>Ng</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3812</fpage>
            <lpage>3814</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">168916</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824425</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg509</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Evaluation of structural and evolutionary contributions to deleterious mutation prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Saunders</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>322</volume>
            <fpage>891</fpage>
            <lpage>901</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(02)00813-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">12270722</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information</p>
            </title>
            <aug>
               <au>
                  <snm>Bao</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cui</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>10</issue>
            <fpage>2185</fpage>
            <lpage>2190</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti365</pubid>
                  <pubid idtype="pmpid" link="fulltext">15746281</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Bayesian Approach to Discovering Pathogenic SNPs in Conserved Protein Domains</p>
            </title>
            <aug>
               <au>
                  <snm>Cai</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Tsung</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Marinescu</snm>
                  <fnm>VD</fnm>
               </au>
               <au>
                  <snm>Ramoni</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Riva</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kohane</snm>
                  <fnm>IS</fnm>
               </au>
            </aug>
            <source>Hum Mutat</source>
            <pubdate>2004</pubdate>
            <volume>24</volume>
            <fpage>178</fpage>
            <lpage>184</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/humu.20063</pubid>
                  <pubid idtype="pmpid" link="fulltext">15241800</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>A hierarchical Bayesian model for predicting the functional consequences of amino-acid polymorphisms</p>
            </title>
            <aug>
               <au>
                  <snm>Verzilli</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Whittaker</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Stallard</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Chasman</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Applied Statistics</source>
            <pubdate>2005</pubdate>
            <volume>54</volume>
            <fpage>191</fpage>
            <lpage>206</lpage>
         </bibl>
         <bibl id="B18">
            <title>
               <p>The Bayesian Revolution in Genetics</p>
            </title>
            <aug>
               <au>
                  <snm>Beaumont</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Rannala</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nature Reviews Genetics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>251</fpage>
            <lpage>261</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1318</pubid>
                  <pubid idtype="pmpid" link="fulltext">15131649</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Inferring Cellular Networks Using Probabilistic Graphical Models</p>
            </title>
            <aug>
               <au>
                  <snm>Friedman</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>303</volume>
            <fpage>799</fpage>
            <lpage>805</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1094068</pubid>
                  <pubid idtype="pmpid" link="fulltext">14764868</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <aug>
               <au>
                  <snm>Husmeier</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dybowski</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <cnm>Eds</cnm>
               </au>
            </aug>
            <source>SR: Probabilistic Modeling in Bioinformatics and Medical Informatics</source>
            <publisher>Springer</publisher>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B21">
            <aug>
               <au>
                  <snm>Jordan</snm>
                  <fnm>MI</fnm>
               </au>
            </aug>
            <source>Learning in Graphical Models</source>
            <publisher>Kluwer Academic</publisher>
            <pubdate>1998</pubdate>
         </bibl>
         <bibl id="B22">
            <aug>
               <au>
                  <snm>Jensen</snm>
                  <fnm>FV</fnm>
               </au>
            </aug>
            <source>Bayesian Networks and Decision Graphs</source>
            <publisher>Springer</publisher>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B23">
            <aug>
               <au>
                  <snm>Pearl</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Causality: models, reasoning and inference</source>
            <publisher>Cambridge University Press</publisher>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Inference in Bayesian networks</p>
            </title>
            <aug>
               <au>
                  <snm>Needham</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Bradford</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Bulpitt</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Westhead</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nature Biotechnology</source>
            <pubdate>2006</pubdate>
            <volume>24</volume>
            <fpage>51</fpage>
            <lpage>53</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt0106-51</pubid>
                  <pubid idtype="pmpid" link="fulltext">16404397</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>The Bayes Net Toolbox for Matlab</p>
            </title>
            <aug>
               <au>
                  <snm>Murphy</snm>
                  <fnm>KP</fnm>
               </au>
            </aug>
            <source>Computing Science and Statistics</source>
            <pubdate>2001</pubdate>
            <fpage>331</fpage>
            <lpage>350</lpage>
         </bibl>
         <bibl id="B26">
            <title>
               <p>BNT Structure Learning Package: Documentation and Experiments</p>
            </title>
            <aug>
               <au>
                  <snm>Leray</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Francois</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <publisher>Tech. rep., Laboratoire PSI, Universit&#233; et INSA de Rouen</publisher>
            <pubdate>2004</pubdate>
         </bibl>
         <bibl id="B27">
            <title>
               <p>A tutorial on learning with Bayesian networks</p>
            </title>
            <aug>
               <au>
                  <snm>Heckerman</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Learning in Graphical Models</source>
            <publisher>Kluwer Academic</publisher>
            <editor>Jordan MI</editor>
            <pubdate>1998</pubdate>
            <fpage>301</fpage>
            <lpage>354</lpage>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Genetic studies of the lac repressor. XIV. Analysis of 4000 altered <it>Escherichia coli </it>lac repressors reveals essential and non-essential residues, as well as 'spacers' which do not require a specific sequence</p>
            </title>
            <aug>
               <au>
                  <snm>Markiewicz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kleina</snm>
                  <fnm>LG</fnm>
               </au>
               <au>
                  <snm>Cruz</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ehret</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>JH</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1994</pubdate>
            <volume>240</volume>
            <fpage>421</fpage>
            <lpage>433</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1994.1458</pubid>
                  <pubid idtype="pmpid" link="fulltext">8046748</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Genetic studies of the lac repressor. XV: 4000 single amino acid substitutions and analysis of the resulting phenotypes on the basis of the protein structure</p>
            </title>
            <aug>
               <au>
                  <snm>Suckow</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Markiewicz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kleina</snm>
                  <fnm>LG</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kisters-Woike</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Muller-Hill</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1996</pubdate>
            <volume>261</volume>
            <fpage>509</fpage>
            <lpage>523</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1996.0479</pubid>
                  <pubid idtype="pmpid" link="fulltext">8794873</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Temperature sensitive mutations of bacteriophage T4 lysozyme occur at sites with low mobility and low solvent accessibility in the folded protein</p>
            </title>
            <aug>
               <au>
                  <snm>Alber</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Nye</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Muchmore</snm>
                  <fnm>DC</fnm>
               </au>
               <au>
                  <snm>Matthews</snm>
                  <fnm>BW</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1987</pubdate>
            <volume>26</volume>
            <fpage>3754</fpage>
            <lpage>3758</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi00387a002</pubid>
                  <pubid idtype="pmpid">3651410</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Systematic mutation of bacteriophage T4 lysozyme</p>
            </title>
            <aug>
               <au>
                  <snm>Rennell</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bouvier</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Hardy</snm>
                  <fnm>LW</fnm>
               </au>
               <au>
                  <snm>Poteete</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1991</pubdate>
            <volume>222</volume>
            <fpage>67</fpage>
            <lpage>88</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(91)90738-R</pubid>
                  <pubid idtype="pmpid">1942069</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>ROC Graphs: Notes and Practical Considerations for Data Mining Researchers</p>
            </title>
            <aug>
               <au>
                  <snm>Fawcett</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <publisher>Tech. rep., HP Labs</publisher>
            <pubdate>2003</pubdate>
         </bibl>
      </refgrp>
   </bm>
</art>
