<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-11-98</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>Unifying generative and discriminative learning principles</p>
         </title>
         <aug>
            <au ca="yes" id="A1">
               <snm>Keilwagen</snm>
               <fnm>Jens</fnm>
               <insr iid="I1"/>
               <email>Jens.Keilwagen@ipk-gatersleben.de</email>
            </au>
            <au id="A2">
               <snm>Grau</snm>
               <fnm>Jan</fnm>
               <insr iid="I2"/>
               <email>Jan.Grau@informatik.uni-halle.de</email>
            </au>
            <au id="A3">
               <snm>Posch</snm>
               <fnm>Stefan</fnm>
               <insr iid="I2"/>
               <email>Stefan.Posch@informatik.uni-halle.de</email>
            </au>
            <au id="A4">
               <snm>Strickert</snm>
               <fnm>Marc</fnm>
               <insr iid="I1"/>
               <email>Marc.Strickert@ipk-gatersleben.de</email>
            </au>
            <au id="A5">
               <snm>Grosse</snm>
               <fnm>Ivo</fnm>
               <insr iid="I2"/>
               <email>Ivo.Grosse@informatik.uni-halle.de</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany</p>
            </ins>
            <ins id="I2">
               <p>Institute of Computer Science, Martin Luther University Halle-Wittenberg, Germany</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2010</pubdate>
         <volume>11</volume>
         <issue>1</issue>
         <fpage>98</fpage>
         <url>http://www.biomedcentral.com/1471-2105/11/98</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/1471-2105-11-98</pubid>
               <pubid idtype="pmpid">20175896</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>5</day>
               <month>11</month>
               <year>2009</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>22</day>
               <month>2</month>
               <year>2010</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>22</day>
               <month>2</month>
               <year>2010</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2010</year>
         <collab>Keilwagen et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The recognition of functional binding sites in genomic DNA remains one of the fundamental challenges of genome research. During the last decades, a plethora of different and well-adapted models has been developed, but little attention has been paid to the development of different and similarly well-adapted learning principles. Only recently has it been noticed that discriminative learning principles can be superior to generative ones in diverse bioinformatics applications, too.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Here, we propose a generalization of generative and discriminative learning principles containing the maximum likelihood, maximum a posteriori, maximum conditional likelihood, maximum supervised posterior, generative-discriminative trade-off, and penalized generative-discriminative trade-off learning principles as special cases, and we illustrate its efficacy for the recognition of vertebrate transcription factor binding sites.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>We find that the proposed learning principle helps to improve the recognition of transcription factor binding sites, enabling better computational approaches for extracting as much information as possible from valuable wet-lab data. We make all implementations available in the open-source library Jstacs so that this learning principle can be easily applied to other classification problems in the field of genome and epigenome analysis.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Classification of unlabeled data is one of the main tasks in bioinformatics. For DNA sequence analysis, this classification task is synonymous with the computational recognition of short signal sequences in genomic DNA. Examples include the recognition of transcription factor binding sites (TFBSs) <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>, transcription start sites <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>, donor or acceptor splice sites <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>, nucleosome binding sites <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>, miRNA binding sites <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>, or binding sites of insulators like CTCF <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
         <p>Many of the employed algorithms use statistical models for representing the distribution of sequences. These models range from simple models like the position weight matrix (PWM) model <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>, the weight array matrix (WAM) model <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B8">8</abbr><abbr bid="B15">15</abbr></abbrgrp>, or Markov models of higher order <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp> to complex models like Bayesian networks <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp> or Markov random fields <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. A wealth of different models has been proposed for different data sets and different biological questions, and it is advisable to carefully choose an appropriate model for each data set and each biological question separately <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B7">7</abbr><abbr bid="B22">22</abbr></abbrgrp>. However, the performance of a model depends strongly on the model parameters learned from training data. In comparison to the effort spent on developing and choosing appropriate models, developing and choosing appropriate learning principles has been neglected, even though this choice is of fundamental importance <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp> and equally non-trivial.</p>
         <p>In the last decades, several learning principles have been proposed for estimating model parameters. The <it>maximum likelihood </it>(ML) learning principle <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp> is one of the first and most popular learning principles used in bioinformatics. An alternative is the <it>maximum a posteriori </it>(MAP) learning principle <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> that applies a prior density to the parameters of the models.</p>
         <p>The ML and the MAP learning principles are commonly referred to as <it>generative</it>. Recently, <it>discriminative </it>learning principles have been shown to be promising in several bioinformatics applications <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B20">20</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B31">31</abbr></abbrgrp>. The discriminative analogue to the ML learning principle is the <it>maximum conditional likelihood </it>(MCL) learning principle <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>, and the <it>maximum supervised posterior </it>(MSP) learning principle <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp> has been proposed as the discriminative analogue to the MAP learning principle.</p>
         <p>In addition to these four learning principles, hybrid learning principles have been proposed to combine the advantages of generative and discriminative learning principles <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>. Specifically, the <it>generative-discriminative trade-off </it>(GDT) learning principle, which interpolates between the ML and the MCL learning principles, has been proposed in <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, and the <it>penalized generative-discriminative trade-off </it>(PGDT) learning principle, which interpolates between the MAP and the MSP learning principles, has been proposed in <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>.</p>
         <p>Here, we introduce a unified generative-discriminative learning principle containing the ML, the MAP, the MCL, the MSP, the GDT, and the PGDT learning principles as special cases. We discuss the interpretation of this learning principle, and we investigate its utility using four data sets of TFBSs.</p>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <p>In this section, we first present six established learning principles, then introduce the unified generative-discriminative learning principle containing these six learning principles as special cases, and finally discuss and interpret the newly introduced learning principle. We start by considering classifiers that are based on probabilistic models defined by the likelihood <it>P </it>(<ul><it>x</it></ul> | <it>c</it>, <ul><it>&#955;</it></ul>) for sequence <ul><it>x</it></ul> given class label <it>c </it>and parameter vector <ul><it>&#955;</it></ul>. Based on such models, the decision criterion <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> of the classifier is defined as</p>
         <p>
            <display-formula id="M1">
               <graphic file="1471-2105-11-98-i1.gif"/>
            </display-formula>
         </p>
         <p>where <it>P </it>(<it>c </it>| <ul><it>x</it></ul>, <ul><it>&#955;</it></ul>) is the conditional likelihood of class label <it>c </it>given sequence <ul><it>x</it></ul> and parameter vector <ul><it>&#955;</it></ul>, <it>P</it>(<it>c</it>, <ul><it>x</it></ul> | <ul><it>&#955;</it></ul>) is the likelihood of sequence <ul><it>x</it></ul> and class label <it>c </it>given parameter vector <ul><it>&#955;</it></ul>, <it>P </it>(<it>c </it>| <ul><it>&#955;</it></ul>) is the probability of class <it>c </it>given parameter vector <ul><it>&#955;</it></ul>, and <it>P</it>(<ul><it>x</it></ul>|<it>c</it>, <ul><it>&#955;</it></ul>) is the conditional probability of sequence <ul><it>x</it></ul> given class label <it>c </it>and parameter vector <ul><it>&#955;</it></ul>.</p>
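         <p>The decision rule can be illustrated with a short Python sketch. It assumes, as the terms defined above suggest, that equation (1) assigns a sequence <ul><it>x</it></ul> to the class <it>c</it> with the largest conditional likelihood <it>P</it>(<it>c</it> | <ul><it>x</it></ul>, <ul><it>&#955;</it></ul>), which is the class with the largest product <it>P</it>(<it>c</it> | <ul><it>&#955;</it></ul>) <it>P</it>(<ul><it>x</it></ul> | <it>c</it>, <ul><it>&#955;</it></ul>). The PWM class models, their representation as one dictionary of nucleotide probabilities per position, and all numbers are hypothetical choices for illustration.</p>
         <p>
```python
import math

ALPHABET = "ACGT"

def log_seq_prob(seq, pwm):
    """log P(x | c, lambda) for a PWM; pwm[j][a] is the probability of a at position j."""
    return sum(math.log(pwm[j][a]) for j, a in enumerate(seq))

def class_posterior(seq, class_probs, pwms):
    """P(c | x, lambda) = P(c | lambda) P(x | c, lambda), normalized over all classes."""
    log_joint = [math.log(pc) + log_seq_prob(seq, pwm)
                 for pc, pwm in zip(class_probs, pwms)]
    m = max(log_joint)  # log-sum-exp trick for numerical stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in log_joint))
    return [math.exp(v - log_norm) for v in log_joint]

def classify(seq, class_probs, pwms):
    """Index of the class with the largest conditional likelihood P(c | x, lambda)."""
    post = class_posterior(seq, class_probs, pwms)
    return max(range(len(post)), key=post.__getitem__)

# Hypothetical two-class example: class 0 = binding site, class 1 = background.
site = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1}]
background = [{a: 0.25 for a in ALPHABET} for _ in range(2)]
class_probs = [0.5, 0.5]

print(classify("AG", class_probs, [site, background]))  # 0
print(classify("CT", class_probs, [site, background]))  # 1
```
         </p>
         <p>Working in log space with a log-sum-exp normalization avoids numerical underflow for long sequences, where products of many probabilities become tiny.</p>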
         <p>The decision, and thus the classification performance, depends on the parameter vector <ul><it>&#955;</it></ul>. Hence, one needs to infer <it>appropriate </it>parameter vectors <ul><it>&#955;</it></ul> from a data set <ul><it>D</it></ul> := (<ul><it>x</it></ul><sub>1</sub>,...,<ul><it>x</it></ul><sub><it>N</it></sub>) of <it>N </it>statistically independent and identically distributed (i.i.d.) sequences and the corresponding class labels <ul><it>C</it></ul> := (<it>c</it><sub>1</sub>,..., <it>c</it><sub><it>N</it></sub>). In the first subsection, we present six learning principles that have been proposed in the machine-learning community and that are nowadays also used in bioinformatics. In the second subsection, we propose a unified learning principle containing all six of these learning principles as special cases. In the third subsection, we provide a mathematical interpretation of this learning principle, and in the fourth subsection we present four case studies illustrating the utility of this learning principle. We present some implementation details in the <b>Methods</b> section.</p>
         <sec>
            <st>
               <p>Established learning principles</p>
            </st>
            <p>Learning principles can be categorized by two criteria. First, they can be divided by their objective into generative, discriminative, and hybrid learning principles. Generative learning principles aim at an accurate representation of the distribution of the training data in each of the classes, discriminative learning principles aim at an accurate classification of the training data into the classes, and hybrid learning principles interpolate between generative and discriminative learning principles. Second, learning principles can be divided by their utilization of prior knowledge into Bayesian and non-Bayesian learning principles. We call learning principles that incorporate a prior density <it>Q </it>(<ul><it>&#955;</it></ul>|<ul><it>&#945;</it></ul>) on the parameter vector <ul><it>&#955;</it></ul>, where <ul><it>&#945;</it></ul> denotes a vector of hyperparameters, Bayesian, whereas we call learning principles that estimate the parameter vector from the data alone, without any prior, non-Bayesian. In Table <tblr tid="T1">1</tblr> we present six established learning principles and their categorization by these two criteria, and we describe these learning principles in more detail in the remainder of this subsection.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Learning principles</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>prior knowledge</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>non-Bayesian</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Bayesian</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>generative</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>ML</p>
                     </c>
                     <c ca="center">
                        <p>MAP</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>objective</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>hybrid</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>GDT</p>
                     </c>
                     <c ca="center">
                        <p>PGDT</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>discriminative</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>MCL</p>
                     </c>
                     <c ca="center">
                        <p>MSP</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The table shows six established learning principles grouped by their objective (generative, hybrid, or discriminative) and by their utilization of prior knowledge (non-Bayesian or Bayesian). The four elementary learning principles are the generative, non-Bayesian maximum likelihood (ML) learning principle, the generative, Bayesian maximum a posteriori (MAP) learning principle, the discriminative, non-Bayesian maximum conditional likelihood (MCL) learning principle, and the discriminative, Bayesian maximum supervised posterior (MSP) learning principle. The hybrid learning principles, which interpolate between generative and discriminative learning principles, are the non-Bayesian generative-discriminative trade-off (GDT) learning principle and the Bayesian penalized generative-discriminative trade-off (PGDT) learning principle.</p>
               </tblfn>
            </tbl>
            <sec>
               <st>
                  <p>Generative learning principles</p>
               </st>
               <p>The maximum likelihood (ML) learning principle is one of the first learning principles used in bioinformatics. Originally, it was proposed by R. A. Fisher at the beginning of the 20<sup><it>th </it></sup>century <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>. The ML learning principle aims at finding the parameter vector <inline-formula><graphic file="1471-2105-11-98-i2.gif"/></inline-formula> that maximizes the likelihood of the labeled data set (<ul><it>C</it></ul>, <ul><it>D</it></ul>) given the parameter vector <ul><it>&#955;</it></ul>,</p>
               <p>
                  <display-formula id="M2">
                     <graphic file="1471-2105-11-98-i3.gif"/>
                  </display-formula>
               </p>
               <p>However, for many applications, the amount of sequence data available for training is very limited. For this reason, the ML learning principle often leads to suboptimal classification performance, e.g., due to zero occurrences of some nucleotides or oligonucleotides in the training data sets.</p>
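               <p>The zero-occurrence problem is easy to reproduce. The following Python sketch assumes that each class is modeled by a PWM and that the class probabilities and the PWMs of the different classes have separate parameters, so that maximizing <it>P</it>(<ul><it>C</it></ul>, <ul><it>D</it></ul> | <ul><it>&#955;</it></ul>) decouples into relative nucleotide frequencies per class and position. Function names and training sequences are illustrative.</p>
               <p>
```python
from collections import Counter

ALPHABET = "ACGT"

def ml_pwm(seqs):
    """ML estimate of a PWM from equal-length sequences of one class.

    For a PWM, the ML estimate is the relative frequency of each
    nucleotide at each position.
    """
    n = len(seqs)
    pwm = []
    for j in range(len(seqs[0])):
        counts = Counter(s[j] for s in seqs)
        pwm.append({a: counts[a] / n for a in ALPHABET})
    return pwm

def seq_prob(seq, pwm):
    """P(x | c, lambda) under the PWM: product of per-position probabilities."""
    p = 1.0
    for j, a in enumerate(seq):
        p *= pwm[j][a]
    return p

# Three hypothetical training sites; C never occurs at position 0.
pwm = ml_pwm(["AG", "AG", "TG"])
print(seq_prob("AG", pwm))  # about 0.667 (= 2/3)
print(seq_prob("CG", pwm))  # 0.0: an unseen nucleotide makes the sequence impossible
```
               </p>
               <p>Any test sequence with a nucleotide never observed at some position in the training data receives probability zero, regardless of how well the remaining positions match.</p>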
               <p>The maximum a posteriori (MAP) learning principle, which applies a prior <it>Q </it>(<ul><it>&#955;</it></ul>|<ul><it>&#945;</it></ul>) to the parameter vector, establishes a theoretical foundation to alleviate this problem and at the same time allows the inclusion of prior knowledge aside from the training data <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. The MAP learning principle aims at finding the parameter vector <inline-formula><graphic file="1471-2105-11-98-i4.gif"/></inline-formula> that maximizes the posterior density,</p>
               <p>
                  <display-formula id="M3">
                     <graphic file="1471-2105-11-98-i5.gif"/>
                  </display-formula>
               </p>
               <p>If for a given family of likelihood functions <it>P</it>(<ul><it>C</it></ul>, <ul><it>D</it></ul>|<ul><it>&#955;</it></ul>) the posterior <it>P</it>(<ul><it>&#955;</it></ul>|<ul><it>C</it></ul>, <ul><it>D</it></ul>, <ul><it>&#945;</it></ul>) is in the same family of distributions as the prior <it>Q </it>(<ul><it>&#955;</it></ul>|<ul><it>&#945;</it></ul>), i.e., if</p>
               <p><display-formula id="M4"><graphic file="1471-2105-11-98-i6.gif"/></display-formula>,</p>
               <p>the prior is said to be <it>conjugate </it>to this family of likelihood functions, where the hyperparameter vector <inline-formula><graphic file="1471-2105-11-98-i7.gif"/></inline-formula> incorporates both prior knowledge and training data. Conjugate priors often allow an interpretation of the hyperparameter vector as stemming from an a priori observed set of "<it>pseudo data</it>." In addition, because the posterior belongs to the same family as the prior, conjugacy allows finding the optimal parameter vector <inline-formula><graphic file="1471-2105-11-98-i4.gif"/></inline-formula> analytically, provided one can determine the maximum of the prior analytically.</p>
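               <p>For a PWM, the Dirichlet distribution is the conjugate prior of the nucleotide distribution at each position, and its hyperparameters act as pseudocounts. The Python sketch below assumes independent Dirichlet priors per position with density proportional to &#8719;<sub><it>a</it></sub> <it>&#952;</it><sub><it>a</it></sub><sup><it>&#945;</it><sub><it>a</it></sub> - 1</sup> (this parameterization and the hyperparameter values are our assumptions, not taken from this article). The posterior is then again Dirichlet, and its mode, the MAP estimate, has a closed form.</p>
               <p>
```python
from collections import Counter

ALPHABET = "ACGT"

def map_pwm(seqs, alpha):
    """MAP estimate of a PWM under an independent Dirichlet(alpha) prior per position.

    By conjugacy, the posterior at position j is Dirichlet(n_j + alpha), where
    n_j holds the nucleotide counts at j. Its mode is
        (n_ja + alpha_a - 1) / (N + sum(alpha) - K),
    which is valid when every alpha_a is at least 1.
    """
    n = len(seqs)
    k = len(ALPHABET)
    denom = n + sum(alpha[a] for a in ALPHABET) - k
    pwm = []
    for j in range(len(seqs[0])):
        counts = Counter(s[j] for s in seqs)
        pwm.append({a: (counts[a] + alpha[a] - 1) / denom for a in ALPHABET})
    return pwm

# alpha_a - 1 = 0.5 pseudo-observations of every nucleotide at every position.
alpha = {a: 1.5 for a in ALPHABET}
pwm = map_pwm(["AG", "AG", "TG"], alpha)
print(pwm[0])  # {'A': 0.5, 'C': 0.1, 'G': 0.1, 'T': 0.3}
```
               </p>
               <p>Compared with the relative frequencies of the same three sequences, C and G at the first position now receive small but positive probabilities, so no test sequence is ruled out entirely.</p>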
            </sec>
            <sec>
               <st>
                  <p>Discriminative learning principles</p>
               </st>
               <p>Discriminative learning principles have been shown to be promising in the field of bioinformatics <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B20">20</abbr><abbr bid="B26">26</abbr><abbr bid="B31">31</abbr></abbrgrp>. The discriminative analogue to the ML learning principle is the maximum conditional likelihood (MCL) learning principle <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp> that aims at finding the parameter vector <inline-formula><graphic file="1471-2105-11-98-i8.gif"/></inline-formula> that maximizes the conditional likelihood of the labels <ul><it>C</it></ul> given the data <ul><it>D</it></ul> and parameter vector <ul><it>&#955;</it></ul>,</p>
               <p>
                  <display-formula id="M5">
                     <graphic file="1471-2105-11-98-i9.gif"/>
                  </display-formula>
               </p>
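               <p>In contrast to the ML and MAP estimates for PWMs, the MCL estimate generally has no closed form and must be found by numerical optimization, but evaluating the objective is straightforward. The Python sketch below, using a hypothetical PWM representation (one dictionary per position) and toy numbers, computes both log <it>P</it>(<ul><it>C</it></ul>, <ul><it>D</it></ul> | <ul><it>&#955;</it></ul>) and log <it>P</it>(<ul><it>C</it></ul> | <ul><it>D</it></ul>, <ul><it>&#955;</it></ul>) and checks that they differ exactly by the log probability of the unlabeled sequences, log <it>P</it>(<ul><it>D</it></ul> | <ul><it>&#955;</it></ul>).</p>
               <p>
```python
import math

def log_joint_terms(seq, class_probs, pwms):
    """[log P(c, x | lambda) for each class c] with PWM class models."""
    return [math.log(pc) + sum(math.log(pwm[j][a]) for j, a in enumerate(seq))
            for pc, pwm in zip(class_probs, pwms)]

def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(data, labels, class_probs, pwms):
    """Generative objective: log P(C, D | lambda) = sum_n log P(c_n, x_n | lambda)."""
    return sum(log_joint_terms(x, class_probs, pwms)[c]
               for x, c in zip(data, labels))

def log_conditional_likelihood(data, labels, class_probs, pwms):
    """Discriminative objective: log P(C | D, lambda) = sum_n log P(c_n | x_n, lambda)."""
    total = 0.0
    for x, c in zip(data, labels):
        terms = log_joint_terms(x, class_probs, pwms)
        total += terms[c] - log_sum_exp(terms)  # log P(c | x) = log P(c, x) - log P(x)
    return total

site = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1}]
background = [{a: 0.25 for a in "ACGT"} for _ in range(2)]
params = ([0.5, 0.5], [site, background])
data, labels = ["AG", "AT", "CT", "GC"], [0, 0, 1, 1]

ll = log_likelihood(data, labels, *params)
cll = log_conditional_likelihood(data, labels, *params)
log_marginal = sum(log_sum_exp(log_joint_terms(x, *params)) for x in data)
print(math.isclose(ll, cll + log_marginal))  # True
```
               </p>
               <p>Because the discriminative objective leaves out log <it>P</it>(<ul><it>D</it></ul> | <ul><it>&#955;</it></ul>), it rewards parameters that separate the classes well even if they describe the sequence distribution poorly.</p>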
               <p>The effects of limited data may be even more severe for the MCL learning principle than for generative learning principles <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. To overcome this problem, the maximum supervised posterior (MSP) learning principle <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp> has been proposed as the discriminative analogue to the MAP learning principle. In analogy to equation (3), the MSP learning principle aims at finding the parameter vector <inline-formula><graphic file="1471-2105-11-98-i10.gif"/></inline-formula> that maximizes the product of the conditional likelihood and the prior density,</p>
               <p>
                  <display-formula id="M6">
                     <graphic file="1471-2105-11-98-i11.gif"/>
                  </display-formula>
               </p>
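               <p>The following Python sketch illustrates, on a deliberately tiny problem, why a prior helps the discriminative objective when data are scarce. It assumes a hypothetical toy model: sequences of length 1 over the two-letter alphabet {A, C}, two classes with fixed equal class probabilities, and parameters <ul><it>&#955;</it></ul> = (<it>p</it><sub>0</sub>, <it>p</it><sub>1</sub>) with <it>p</it><sub><it>c</it></sub> = <it>P</it>(<it>x</it> = A | <it>c</it>, <ul><it>&#955;</it></ul>). Independent Beta(2, 2) densities on <it>p</it><sub>0</sub> and <it>p</it><sub>1</sub> stand in for <it>Q</it>(<ul><it>&#955;</it></ul> | <ul><it>&#945;</it></ul>); both the model and the prior are our choices for illustration. On perfectly separable training data, a grid search over the MCL objective runs to the edge of the grid, whereas the MSP maximizer stays inside.</p>
               <p>
```python
import math

def log_cond_lik(p, data, labels):
    """log P(C | D, lambda) for the toy model with P(c) = 0.5 for both classes."""
    total = 0.0
    for x, c in zip(data, labels):
        # P(x | c) for both classes; the equal class probabilities cancel.
        lik = [pc if x == "A" else 1.0 - pc for pc in p]
        total += math.log(lik[c] / sum(lik))
    return total

def log_prior(p):
    """Independent Beta(2, 2) priors on p0 and p1 (density 6 q (1 - q))."""
    return sum(math.log(6.0 * q * (1.0 - q)) for q in p)

def grid_argmax(objective):
    """Maximize over a 99 x 99 grid of (p0, p1) values in the open interval (0, 1)."""
    grid = [i / 100 for i in range(1, 100)]
    return max(((p0, p1) for p0 in grid for p1 in grid), key=objective)

# Separable data: class 0 always shows A, class 1 always shows C.
data, labels = ["A", "A", "A", "C", "C", "C"], [0, 0, 0, 1, 1, 1]

mcl = grid_argmax(lambda p: log_cond_lik(p, data, labels))
msp = grid_argmax(lambda p: log_cond_lik(p, data, labels) + log_prior(p))
print(mcl)  # (0.99, 0.01): pushed to the edge of the grid
print(msp)  # an interior point
```
               </p>
               <p>Without the grid, the MCL objective would keep increasing as <it>p</it><sub>0</sub> approaches 1 and <it>p</it><sub>1</sub> approaches 0. The Beta(2, 2) density vanishes at 0 and 1, which keeps the MSP maximizer away from the boundary.</p>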
            </sec>
            <sec>
               <st>
                  <p>Generative-discriminative trade-offs</p>
               </st>
               <p>Different hybrid learning principles have been proposed in the machine learning community <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B39">39</abbr><abbr bid="B41">41</abbr></abbrgrp>. Hybrid learning principles aim at combining the strengths of generative and discriminative learning principles. Here, we follow the ideas of Bouchard and co-workers, who proposed an interpolation between the generative ML learning principle and the discriminative MCL learning principle <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> as well as between the generative MAP learning principle and the discriminative MSP learning principle <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. The generative-discriminative trade-off (GDT) learning principle proposed in <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> aims at finding the parameter vector <ul><it>&#955;</it></ul> that maximizes the weighted product of the conditional likelihood and the likelihood, i.e.,</p>
               <p><display-formula id="M7"><graphic file="1471-2105-11-98-i12.gif"/></display-formula>,</p>
               <p>for given weight <it>&#947; </it>&#8712; [0, 1]. As special cases of the GDT learning principle, we obtain the ML learning principle for <it>&#947; </it>= 1 and the MCL learning principle for <it>&#947; </it>= 0. By varying <it>&#947; </it>between 0 and 1, one obtains intermediate trade-offs, some of which may be beneficial for classification.</p>
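               <p>Taking logarithms, the GDT objective becomes the convex combination (1 - <it>&#947;</it>) log <it>P</it>(<ul><it>C</it></ul> | <ul><it>D</it></ul>, <ul><it>&#955;</it></ul>) + <it>&#947;</it> log <it>P</it>(<ul><it>C</it></ul>, <ul><it>D</it></ul> | <ul><it>&#955;</it></ul>). The Python sketch below assumes that equation (7) has the form <it>P</it>(<ul><it>C</it></ul> | <ul><it>D</it></ul>, <ul><it>&#955;</it></ul>)<sup>1 - <it>&#947;</it></sup> <it>P</it>(<ul><it>C</it></ul>, <ul><it>D</it></ul> | <ul><it>&#955;</it></ul>)<sup><it>&#947;</it></sup>, which is what the stated special cases imply. It uses a hypothetical toy model (sequences of length 1 over {A, C}, two classes with fixed equal class probabilities, parameters (<it>p</it><sub>0</sub>, <it>p</it><sub>1</sub>) with <it>p</it><sub><it>c</it></sub> = <it>P</it>(<it>x</it> = A | <it>c</it>, <ul><it>&#955;</it></ul>)) and training data chosen so that the ML and MCL maximizers differ.</p>
               <p>
```python
import math

def log_joint(p, x, c):
    """log P(c, x | lambda) for the toy model: P(c) = 0.5, P(x = A | c) = p[c]."""
    return math.log(0.5) + math.log(p[c] if x == "A" else 1.0 - p[c])

def log_lik(p, data, labels):
    """Generative term log P(C, D | lambda)."""
    return sum(log_joint(p, x, c) for x, c in zip(data, labels))

def log_cond_lik(p, data, labels):
    """Discriminative term log P(C | D, lambda)."""
    total = 0.0
    for x, c in zip(data, labels):
        joint = [math.exp(log_joint(p, x, k)) for k in (0, 1)]
        total += math.log(joint[c] / sum(joint))
    return total

def gdt_objective(p, data, labels, gamma):
    """log[P(C | D, lambda)^(1 - gamma) * P(C, D | lambda)^gamma]."""
    return ((1.0 - gamma) * log_cond_lik(p, data, labels)
            + gamma * log_lik(p, data, labels))

def grid_argmax(objective):
    """Maximize over a 99 x 99 grid of (p0, p1) values in the open interval (0, 1)."""
    grid = [i / 100 for i in range(1, 100)]
    return max(((p0, p1) for p0 in grid for p1 in grid), key=objective)

# Hypothetical training data: class 0 = A, A, A, C; class 1 = A, C.
data = ["A", "A", "A", "C", "A", "C"]
labels = [0, 0, 0, 0, 1, 1]
for gamma in (0.0, 0.5, 1.0):
    best = grid_argmax(lambda p: gdt_objective(p, data, labels, gamma))
    print(gamma, best)
```
               </p>
               <p>For <it>&#947;</it> = 1 the grid maximizer is the ML solution (0.75, 0.5), the per-class relative frequencies of A; for <it>&#947;</it> = 0 it differs from that solution, and intermediate values of <it>&#947;</it> weigh the two objectives against each other.</p>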
               <p>In close analogy to the MAP and the MSP learning principles, which are obtained by multiplying the likelihood and the conditional likelihood, respectively, by a prior, the penalized generative-discriminative trade-off (PGDT) learning principle aims at finding the parameter vector <ul><it>&#955;</it></ul> that maximizes the objective function</p>
               <p>
                  <display-formula id="M8">
                     <graphic file="1471-2105-11-98-i13.gif"/>
                  </display-formula>
               </p>
               <p>for given weight <it>&#947; </it>&#8712; [0, 1]. As special cases of the PGDT learning principle, we obtain the MAP learning principle for <it>&#947; </it>= 1 and the MSP learning principle for <it>&#947; </it>= 0.</p>
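               <p>Assuming that equation (8) multiplies the GDT objective by the prior, i.e., that it has the form <it>P</it>(<ul><it>C</it></ul> | <ul><it>D</it></ul>, <ul><it>&#955;</it></ul>)<sup>1 - <it>&#947;</it></sup> <it>P</it>(<ul><it>C</it></ul>, <ul><it>D</it></ul> | <ul><it>&#955;</it></ul>)<sup><it>&#947;</it></sup> <it>Q</it>(<ul><it>&#955;</it></ul> | <ul><it>&#945;</it></ul>), which agrees with the special cases stated above, the PGDT objective adds log <it>Q</it>(<ul><it>&#955;</it></ul> | <ul><it>&#945;</it></ul>) to the GDT objective in log space. The Python sketch below uses a hypothetical toy model (sequences of length 1 over {A, C}, fixed equal class probabilities, parameters (<it>p</it><sub>0</sub>, <it>p</it><sub>1</sub>) with <it>p</it><sub><it>c</it></sub> = <it>P</it>(<it>x</it> = A | <it>c</it>, <ul><it>&#955;</it></ul>)), independent Beta(2, 2) priors chosen for illustration, and perfectly separable data. For every trade-off the prior keeps the maximizer away from the edge of the grid, and <it>&#947;</it> = 1 reproduces the MAP solution.</p>
               <p>
```python
import math

def log_joint(p, x, c):
    """log P(c, x | lambda) for the toy model: P(c) = 0.5, P(x = A | c) = p[c]."""
    return math.log(0.5) + math.log(p[c] if x == "A" else 1.0 - p[c])

def log_lik(p, data, labels):
    return sum(log_joint(p, x, c) for x, c in zip(data, labels))

def log_cond_lik(p, data, labels):
    total = 0.0
    for x, c in zip(data, labels):
        joint = [math.exp(log_joint(p, x, k)) for k in (0, 1)]
        total += math.log(joint[c] / sum(joint))
    return total

def log_prior(p):
    """Independent Beta(2, 2) priors on p0 and p1 (density 6 q (1 - q))."""
    return sum(math.log(6.0 * q * (1.0 - q)) for q in p)

def pgdt_objective(p, data, labels, gamma):
    """log[P(C | D)^(1 - gamma) * P(C, D)^gamma * Q(lambda | alpha)] (assumed form of (8))."""
    return ((1.0 - gamma) * log_cond_lik(p, data, labels)
            + gamma * log_lik(p, data, labels) + log_prior(p))

def grid_argmax(objective):
    grid = [i / 100 for i in range(1, 100)]
    return max(((p0, p1) for p0 in grid for p1 in grid), key=objective)

# Separable data: class 0 always shows A, class 1 always shows C.
data, labels = ["A", "A", "A", "C", "C", "C"], [0, 0, 0, 1, 1, 1]
for gamma in (0.0, 0.5, 1.0):
    print(gamma, grid_argmax(lambda p: pgdt_objective(p, data, labels, gamma)))
# gamma = 1.0 (MAP) gives (0.8, 0.2).
```
               </p>
               <p>With the Beta(2, 2) prior acting like one extra observation of each letter per class, the MAP solution (0.8, 0.2) equals the relative frequencies after adding these pseudo-observations.</p>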
               <p>We summarize the six established learning principles in Table <tblr tid="T1">1</tblr>.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Unified generative-discriminative learning principle</p>
            </st>
            <p>Comparing equations (2), (3), (5), (6), (7), and (8), we find that the following three terms are sufficient for defining these six learning principles:</p>
            <p indent="1">1. the conditional likelihood <it>P </it>(<ul><it>C</it></ul>|<ul><it>D</it></ul>, <ul><it>&#955;</it></ul>),</p>
            <p indent="1">2. the likelihood <it>P </it>(<ul><it>C</it></ul>, <ul><it>D</it></ul>|<ul><it>&#955;</it></ul>), and</p>
            <p indent="1">3. the prior <it>Q </it>(<ul><it>&#955;</it></ul>|<ul><it>&#945;</it></ul>).</p>
            <p>With the goal of unifying and generalizing all six learning principles, we propose a unified generative-discriminative learning principle that aims at finding the parameter vector <ul><it>&#955;</it></ul> that maximizes the weighted product of the conditional likelihood, likelihood, and prior, i.e.,</p>
            <p><display-formula id="M9"><graphic file="1471-2105-11-98-i14.gif"/></display-formula>,</p>
            <p>with the weighting factors <ul><it>&#946;</it></ul> := (<it>&#946;</it><sub>0</sub>, <it>&#946;</it><sub>1</sub>, <it>&#946;</it><sub>2</sub>), <it>&#946;</it><sub>0</sub>, <it>&#946;</it><sub>1</sub>, <it>&#946;</it><sub>2 </sub>&#8712; <inline-formula><graphic file="1471-2105-11-98-i15.gif"/></inline-formula>, and <it>&#946;</it><sub>0</sub> + <it>&#946;</it><sub>1</sub> + <it>&#946;</it><sub>2 </sub>= 1.</p>
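            <p>Since the logarithm is strictly increasing, equation (9) can be maximized in log space, where it becomes the weighted sum <it>&#946;</it><sub>0</sub> log <it>P</it>(<ul><it>C</it></ul> | <ul><it>D</it></ul>, <ul><it>&#955;</it></ul>) + <it>&#946;</it><sub>1</sub> log <it>P</it>(<ul><it>C</it></ul>, <ul><it>D</it></ul> | <ul><it>&#955;</it></ul>) + <it>&#946;</it><sub>2</sub> log <it>Q</it>(<ul><it>&#955;</it></ul> | <ul><it>&#945;</it></ul>). The Python sketch below evaluates this sum for three precomputed log terms at one parameter vector. The function name is hypothetical, and the check that the weights are non-negative assumes, as the simplex interpretation in Figure <figr fid="F1">1</figr> suggests, that negative weights are not allowed.</p>
            <p>
```python
import math

def unified_log_objective(log_cl, log_l, log_q, beta):
    """Log of P(C | D, lambda)^b0 * P(C, D | lambda)^b1 * Q(lambda | alpha)^b2.

    log_cl, log_l, log_q: log conditional likelihood, log likelihood and log prior,
    all evaluated at the same parameter vector lambda.
    beta: weights (b0, b1, b2); they must be non-negative and sum to 1.
    """
    b0, b1, b2 = beta
    if not all(b >= 0.0 for b in beta) or not math.isclose(b0 + b1 + b2, 1.0):
        raise ValueError("beta must lie on the probability simplex")
    return b0 * log_cl + b1 * log_l + b2 * log_q

# Hypothetical log terms for one parameter vector and an interior point of the simplex.
print(unified_log_objective(-2.0, -10.0, -1.0, (0.2, 0.3, 0.5)))
```
            </p>
            <p>Maximizing this quantity over <ul><it>&#955;</it></ul> still requires a numerical optimizer; the sketch only evaluates the objective for given log terms.</p>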
            <p>The six established learning principles can be obtained as special cases of equation (9) as follows:</p>
            <p indent="1">&#8226; ML if <ul><it>&#946;</it></ul> = (0, 1, 0),</p>
            <p indent="1">&#8226; MAP if <ul><it>&#946;</it></ul> = (0, 0.5, 0.5),</p>
            <p indent="1">&#8226; MCL if <ul><it>&#946;</it></ul> = (1, 0, 0),</p>
            <p indent="1">&#8226; MSP if <ul><it>&#946;</it></ul> = (0.5, 0, 0.5),</p>
            <p indent="1">&#8226; GDT if <it>&#946;</it><sub>2 </sub>= 0, and</p>
            <p indent="1">&#8226; PGDT if <it>&#946;</it><sub>2 </sub>= 0.5.</p>
            <p>In Figure <figr fid="F1">1(a)</figr>, we illustrate the simplex <ul><it>&#946;</it></ul> by a projection onto the (<it>&#946;</it><sub>0</sub>, <it>&#946;</it><sub>1</sub>)-plane, showing the established learning principles as well as the unified generative-discriminative learning principle. Note, however, that several other hybrid learning principles are not covered by this unification.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Illustration of the unified generative-discriminative learning principle</p>
               </caption>
               <text>
                  <p><b>Illustration of the unified generative-discriminative learning principle</b>. The plots show a projection of the simplex <ul><it>&#946;</it></ul> onto the (<it>&#946;</it><sub>0</sub>, <it>&#946;</it><sub>1</sub>)-plane and the corresponding learning principles for the specific weights encoded by colors. Figure 1(a) shows the general interpretation of the simplex, where the points (0, 1), (0, 0.5), (1, 0), and (0.5, 0) refer to the ML, MAP, MCL, and MSP learning principles, respectively, while the lines <it>&#946;</it><sub>1 </sub>= 1 - <it>&#946;</it><sub>0 </sub>and <it>&#946;</it><sub>1 </sub>= 0.5 - <it>&#946;</it><sub>0 </sub>refer to the GDT and PGDT learning principles, respectively. Figure 1(b) shows the interpretation of the unified generative-discriminative learning principle for a conjugate prior that satisfies the condition of equation (11). In this case, each point on the abscissa (<it>&#946;</it><sub>0</sub>-axis) and ordinate (<it>&#946;</it><sub>1</sub>-axis) refers to the MSP and MAP learning principle, respectively, using the prior in a weighted version <inline-formula><graphic file="1471-2105-11-98-i16.gif"/></inline-formula>. The simplex colored in gray corresponds to the MSP learning principle using the weighted posterior <inline-formula><graphic file="1471-2105-11-98-i17.gif"/></inline-formula> as prior for the parameter vector <ul><it>&#955;</it></ul>.</p>
               </text>
               <graphic file="1471-2105-11-98-1" hint_layout="double"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Interpretation of the unified generative-discriminative learning principle</p>
            </st>
            <p>In this subsection, we investigate the simplex <ul><it>&#946;</it></ul> and its relation to the six established learning principles. First, we consider the axes of the simplex <ul><it>&#946;</it></ul>. Using the constraint <it>&#946;</it><sub>0 </sub>= 1 - <it>&#946;</it><sub>2 </sub>that holds on the <it>&#946;</it><sub>0</sub>-axis (<it>&#946;</it><sub>0 </sub>> 0 and <it>&#946;</it><sub>1 </sub>= 0), we can write the learning principle that corresponds to this axis as</p>
            <p>
               <display-formula id="M10a">
                  <graphic file="1471-2105-11-98-i18.gif"/>
               </display-formula>
            </p>
            <p>Similarly, we can write the learning principle that corresponds to the <it>&#946;</it><sub>1</sub>-axis (with <it>&#946;</it><sub>0 </sub>= 0 and <it>&#946;</it><sub>1 </sub>> 0) as</p>
            <p>
               <display-formula id="M10b">
                  <graphic file="1471-2105-11-98-i19.gif"/>
               </display-formula>
            </p>
            <p>These equations state that each point on the abscissa (<it>&#946;</it><sub>0</sub>-axis) and on the ordinate (<it>&#946;</it><sub>1</sub>-axis) corresponds to the MSP and the MAP learning principle, respectively, with a weighted prior.</p>
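            <p>The correspondence between points of the simplex and the established learning principles can be made explicit in a few lines. The following Python sketch is illustrative only: the function <monospace>interpret_beta</monospace> is hypothetical and not part of Jstacs, and it assumes that equation (9) is the weighted sum <it>&#946;</it><sub>0</sub> log conditional likelihood + <it>&#946;</it><sub>1</sub> log likelihood + <it>&#946;</it><sub>2</sub> log prior with <it>&#946;</it><sub>2</sub> = 1 - <it>&#946;</it><sub>0</sub> - <it>&#946;</it><sub>1</sub>. On the axes, dividing the objective by the single remaining data weight shows how strongly the prior is weighted relative to the data term.</p>

```python
from fractions import Fraction


def interpret_beta(beta0, beta1):
    """Name the learning principle at the point (beta0, beta1) of the simplex.

    Hypothetical helper. Assumes the unified objective
        beta0 * log CL + beta1 * log L + beta2 * log prior,
    with beta2 = 1 - beta0 - beta1. Exact fractions avoid rounding problems.
    Returns (name, w); on the axes, w is the weight of the log prior after
    dividing the objective by the remaining data weight (None elsewhere).
    """
    b0, b1 = Fraction(str(beta0)), Fraction(str(beta1))
    b2 = 1 - b0 - b1
    if 0 > min(b0, b1, b2):
        raise ValueError("(beta0, beta1) must lie on the simplex")
    if b0 == 0 and b1 == 0:
        return ("prior only", None)          # degenerate corner (0, 0, 1)
    if b1 == 0:                              # beta0-axis
        return ("MCL", Fraction(0)) if b2 == 0 else ("MSP", b2 / b0)
    if b0 == 0:                              # beta1-axis
        return ("ML", Fraction(0)) if b2 == 0 else ("MAP", b2 / b1)
    if b2 == 0:
        return ("GDT", None)                 # line beta1 = 1 - beta0
    if b2 == Fraction(1, 2):
        return ("PGDT", None)                # line beta1 = 0.5 - beta0
    return ("interior", None)
```

For example, <monospace>interpret_beta(0.05, 0)</monospace> returns <monospace>("MSP", Fraction(19, 1))</monospace>: close to the prior corner of the <it>&#946;</it><sub>0</sub>-axis, the prior carries 19 times the weight it has at the MSP point (0.5, 0).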
            <p>If the prior fulfills the condition</p>
            <p>
               <display-formula id="M11">
                  <graphic file="1471-2105-11-98-i20.gif"/>
               </display-formula>
            </p>
            <p>for any <it>&#958; </it>&#8712; &#8477;<sup>+</sup>, each point (1 - <it>&#946;</it><sub>2</sub>, 0) on the abscissa and each point (0, 1 - <it>&#946;</it><sub>2</sub>) on the ordinate corresponds to the MSP and the MAP learning principle, respectively, using the prior <inline-formula><graphic file="1471-2105-11-98-i16.gif"/></inline-formula>. The <it>Generalized Dirichlet prior </it>for Markov random fields <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, which has been proposed to allow a direct comparison of the MAP and the MSP learning principles, fulfills the condition of equation (11) (Appendix A in Additional File <supplr sid="S1">1</supplr>).</p>
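            <p>Condition (11) holds whenever the log prior is linear in its hyper parameters, up to a normalising term that does not depend on <ul><it>&#955;</it></ul>. The sketch below checks this numerically for a hypothetical, unnormalised conjugate prior of a single categorical distribution with logit parameters; this prior is an illustrative assumption and not the Generalized Dirichlet prior itself. It also converts an axis point (1 - <it>&#946;</it><sub>2</sub>, 0) or (0, 1 - <it>&#946;</it><sub>2</sub>) into the scaling factor <it>&#958;</it> = <it>&#946;</it><sub>2</sub>/(1 - <it>&#946;</it><sub>2</sub>) and, assuming that the ESS is the sum of the hyper parameters, into a virtual ESS. All function names are hypothetical.</p>

```python
import math


def log_prior(lam, alpha):
    """Hypothetical unnormalised conjugate log prior of one categorical
    distribution with logits lam:
    sum_k alpha_k * lam_k - (sum_k alpha_k) * logsumexp(lam)."""
    m = max(lam)
    lse = m + math.log(sum(math.exp(v - m) for v in lam))
    return sum(a * v for a, v in zip(alpha, lam)) - sum(alpha) * lse


def prior_scale(beta2):
    """Factor xi = beta2 / (1 - beta2) for the axis points (1 - beta2, 0)
    and (0, 1 - beta2)."""
    if not (1.0 > beta2 > 0.0):
        raise ValueError("beta2 must lie in the open interval (0, 1)")
    return beta2 / (1.0 - beta2)


def virtual_ess(ess, beta2):
    """Virtual ESS at an axis point, assuming condition (11) and an ESS equal
    to the sum of the hyper parameters (so that it scales with xi)."""
    return prior_scale(beta2) * ess


# Condition (11) for this prior: xi * log prior(lam | alpha)
# equals log prior(lam | xi * alpha).
lam, alpha, xi = [0.3, -1.2, 0.8, 0.0], [1.0, 1.0, 1.0, 1.0], 2.5
assert math.isclose(xi * log_prior(lam, alpha),
                    log_prior(lam, [xi * a for a in alpha]))
```

Under these assumptions, the point (0.05, 0, 0.95) multiplies the prior weight by 19, so a foreground ESS of 4 as used in the case studies would correspond to a virtual ESS of about 76.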
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>Appendix</b>. This file contains additional information about Markov random fields and the case studies.</p>
               </text>
               <file name="1471-2105-11-98-S1.PDF">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Second, we consider the lines <it>&#946;</it><sub>1 </sub>= <it>&#957; </it>- <it>&#946;</it><sub>0 </sub>with <it>&#957; </it>&#8712; [0, 1]. As visualized in Figure <figr fid="F1">1(a)</figr>, the unified generative-discriminative learning principle reduces to the GDT and the PGDT learning principles for <it>&#957; </it>= 1 and <it>&#957; </it>= 0.5, respectively. Using <it>&#946;</it><sub>2 </sub>&#8712; (0, 1) and the condition of equation (11) with <inline-formula><graphic file="1471-2105-11-98-i21.gif"/></inline-formula>, we find that equation (9) can be written as</p>
            <p>
               <display-formula id="M12">
                  <graphic file="1471-2105-11-98-i22.gif"/>
               </display-formula>
            </p>
            <p>This equation is equivalent to equation (8), stating that, for each <it>&#946;</it><sub>2</sub>, each point on the line <it>&#946;</it><sub>1 </sub>= (1 - <it>&#946;</it><sub>2</sub>) - <it>&#946;</it><sub>0 </sub>corresponds to a specific instance of the PGDT learning principle with prior <inline-formula><graphic file="1471-2105-11-98-i23.gif"/></inline-formula>. Using this result, the unified generative-discriminative learning principle allows an in-depth analysis of the PGDT learning principle with different priors.</p>
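            <p>The rescaling behind this result can be sketched as follows, again only for illustration and under the assumption that the unified objective is the weighted sum of the log conditional likelihood, the log likelihood, and the log prior. Dividing by 1 - <it>&#946;</it><sub>2</sub> yields a PGDT objective; its form (weight <it>&#947;</it> on the log conditional likelihood, 1 - <it>&#947;</it> on the log likelihood, and weight one on the log prior) is read off from the line <it>&#946;</it><sub>1 </sub>= 0.5 - <it>&#946;</it><sub>0</sub> in Figure 1(a). The symbol <it>&#947;</it> and the function <monospace>as_pgdt</monospace> are hypothetical.</p>

```python
import math


def as_pgdt(beta0, beta1, beta2):
    """Rewrite a simplex point with beta2 in the open interval (0, 1) as a
    PGDT instance (hypothetical helper).

    Dividing beta0*log CL + beta1*log L + beta2*log prior by (1 - beta2) gives
        gamma*log CL + (1 - gamma)*log L + xi*log prior,
    and condition (11) absorbs xi into scaled hyper parameters xi * alpha.
    Returns (gamma, xi).
    """
    if not math.isclose(beta0 + beta1 + beta2, 1.0):
        raise ValueError("the weights must sum to one")
    if 0.0 > min(beta0, beta1) or not (1.0 > beta2 > 0.0):
        raise ValueError("need beta0, beta1 non-negative and beta2 in (0, 1)")
    scale = 1.0 - beta2
    return beta0 / scale, beta2 / scale
```

For instance, the best PGDT point of the AR/GR/PR case study, (0.3, 0.2, 0.5), gives <it>&#947;</it> = 0.6 with the original prior (<it>&#958;</it> = 1), whereas the interior point (0.1, 0.1, 0.8) gives <it>&#947;</it> = 0.5 with hyper parameters scaled by <it>&#958;</it> = 4.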
            <p>Finally, we consider a second interpretation of the unified generative-discriminative learning principle. The last two terms of equation (9), the weighted likelihood and the weighted prior, can be interpreted as a weighted posterior. Using the assumption of conjugacy (equation (4)), the condition of equation (11), and <it>&#946;</it><sub>0</sub>, <it>&#946;</it><sub>1</sub>, <it>&#946;</it><sub>2 </sub>&#8712; &#8477;<sup>+</sup>, we obtain</p>
            <p>
               <display-formula id="M13a">
                  <graphic file="1471-2105-11-98-i24.gif"/>
               </display-formula>
            </p>
            <p>
               <display-formula id="M13b">
                  <graphic file="1471-2105-11-98-i25.gif"/>
               </display-formula>
            </p>
            <p>stating that each point on the simplex can be interpreted as an instance of the MSP learning principle with an informative prior <inline-formula><graphic file="1471-2105-11-98-i17.gif"/></inline-formula> composed of the weighted likelihood and the weighted original prior. Interestingly, this interpretation of each point of the simplex as an instance of the MSP learning principle using the weighted posterior as prior remains valid even for priors that do not fulfill these conditions. Figure <figr fid="F1">1(b)</figr> visualizes these results.</p>
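            <p>For a conjugate prior, the informative prior composed of the weighted likelihood and the weighted original prior can be made concrete by tracking pseudo counts. The sketch below assumes, for illustration only, a single categorical distribution with logits, observed counts <it>n</it><sub><it>k</it></sub>, and a hypothetical unnormalised conjugate log prior with pseudo counts <it>&#945;</it><sub><it>k</it></sub>. Under these assumptions, <it>&#946;</it><sub>1</sub> log likelihood + <it>&#946;</it><sub>2</sub> log prior is again such a log prior with pseudo counts <it>&#946;</it><sub>1</sub><it>n</it><sub><it>k</it></sub> + <it>&#946;</it><sub>2</sub><it>&#945;</it><sub><it>k</it></sub>, and dividing the whole objective by <it>&#946;</it><sub>0</sub> rescales them by 1/<it>&#946;</it><sub>0</sub>. The function names are hypothetical.</p>

```python
import math


def logsumexp(v):
    """Numerically stable log-sum-exp."""
    m = max(v)
    return m + math.log(sum(math.exp(x - m) for x in v))


def log_lik(counts, lam):
    """Log likelihood of categorical counts under logits lam."""
    return sum(n * v for n, v in zip(counts, lam)) - sum(counts) * logsumexp(lam)


def log_prior(lam, alpha):
    """Hypothetical unnormalised conjugate log prior with pseudo counts alpha."""
    return sum(a * v for a, v in zip(alpha, lam)) - sum(alpha) * logsumexp(lam)


def informative_prior_counts(counts, alpha, beta0, beta1, beta2):
    """Pseudo counts of the weighted posterior that acts as MSP prior:
    (beta1 * n_k + beta2 * alpha_k) / beta0."""
    if not beta0 > 0.0:
        raise ValueError("beta0 must be positive")
    return [(beta1 * n + beta2 * a) / beta0 for n, a in zip(counts, alpha)]
```

For example, counts (3, 1, 0, 0), uniform pseudo counts of one, and <ul><it>&#946;</it></ul> = (0.1, 0.1, 0.8) yield pseudo counts of (up to floating-point rounding) 11, 9, 8, and 8, that is, a prior dominated by the uniform part but tilted toward the observed symbols.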
         </sec>
         <sec>
            <st>
               <p>Testing</p>
            </st>
            <p>In this subsection, we present four case studies illustrating the utility of the unified generative-discriminative learning principle. In specific practical applications, the choice of appropriate training and test data sets is a highly non-trivial task. Since the final results strongly depend on the chosen data sets, we recommend making this choice with great care and in a problem-specific manner. This choice is typically influenced by a-priori knowledge about both the expected binding sites (BSs) and the targeted genome regions. Features that are often considered when choosing appropriate data sets include the <monospace>GC</monospace> content of the target regions, their association with <monospace>CpG</monospace> islands, and their size and proximity to transcription start sites.</p>
            <p>Carefully choosing appropriate training and test data sets is particularly advantageous if the set of targeted genome regions is not homogeneous, e.g., if it comprises both <monospace>GC</monospace>-rich and <monospace>GC</monospace>-poor regions, <monospace>CpG</monospace> islands and <monospace>CpG</monospace> deserts, <monospace>TATA</monospace>-containing and <monospace>TATA</monospace>-less promoters, or upstream regions with and without BSs of another TF. In this case, one often finds that different learning principles work well for different subgroups, even if the same combination of models is chosen, which makes it possible to use subgroup-specific learning principles by choosing different values of <ul><it>&#946;</it></ul>.</p>
            <p>These considerations are vital for a successful prediction of TFBSs, but they are beyond the scope of this paper, so we use traditional data sets in the following case studies. Specifically, we choose four data sets of experimentally verified TFBSs of length <it>L </it>= 16 bp from TRANSFAC <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. The data set AR/GR/PR contains 104 BSs of three steroid hormone receptors from the same class of TFs. The data sets GATA and Thyroid contain 110 and 127 BSs, respectively, of TFs with zinc-coordinating DNA-binding domains. Finally, the data set NF-<it>&#954;</it>B contains 72 BSs of the rapid-acting family of primary TFs NF-<it>&#954;</it>B. As background data set, we choose the standard background data set of TRANSFAC, consisting of 267 second exons of human genes with 68,141 bp in total, which we split into sequences of at most 100 bp. For each family of TFs separately, we build a classifier with the goal of classifying a given 16-mer either as BS or as subsequence of a background sequence.</p>
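            <p>The preparation of the background data can be sketched as follows. The text does not specify how the exons are split, so the sketch assumes consecutive, non-overlapping pieces of at most 100 bp and treats every overlapping 16-mer of a piece as a background example; both choices and the function names are illustrative assumptions.</p>

```python
def chunk(sequence, max_len=100):
    """Split a sequence into consecutive, non-overlapping pieces of at most max_len."""
    return [sequence[i:i + max_len] for i in range(0, len(sequence), max_len)]


def kmers(sequence, k=16):
    """All overlapping k-mers of a sequence; empty if the sequence is shorter than k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def background_kmers(exons, max_len=100, k=16):
    """Candidate background k-mers from a list of exon sequences (hypothetical helper)."""
    return [kmer
            for exon in exons
            for piece in chunk(exon, max_len)
            for kmer in kmers(piece, k)]
```

Under these assumptions, an exon of 250 bp yields pieces of 100, 100, and 50 bp and hence 85 + 85 + 35 = 205 background 16-mers; 16-mers that span a piece boundary are not produced.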
            <p>We choose a na&#239;ve Bayes classifier consisting of two PWM models and the Generalized Dirichlet prior <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> using an <it>equivalent sample size </it>(ESS) (Appendix A in Additional File <supplr sid="S1">1</supplr>) of 4 and 1024 for the foreground and the background class, respectively. As performance measure, we use the sensitivity for a specificity of 99.9% <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. We present the results for three additional performance measures in Appendix B of Additional File <supplr sid="S1">1</supplr>. To evaluate the unified generative-discriminative learning principle, we perform a 1,000-fold stratified hold-out sampling with 90% of the data for training and 10% of the data for assessing the performance measures.</p>
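            <p>The evaluation protocol can be outlined in code. The sketch below is a simplified illustration, not the implementation used for the case studies: it draws one stratified 90%/10% split by splitting each class separately, and it estimates the sensitivity at a fixed specificity by thresholding at an empirical quantile of the background scores, ignoring ties and interpolation. The function names are hypothetical, and the classifier scores are assumed to be given.</p>

```python
import math
import random


def stratified_holdout(foreground, background, train_fraction=0.9, rng=None):
    """One stratified hold-out split: each class is shuffled and split separately.

    Returns ((fg_train, fg_test), (bg_train, bg_test)).
    """
    rng = rng or random.Random()
    splits = []
    for data in (foreground, background):
        shuffled = list(data)
        rng.shuffle(shuffled)
        n_train = round(train_fraction * len(shuffled))
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return tuple(splits)


def sensitivity_at_specificity(fg_scores, bg_scores, specificity=0.999):
    """Fraction of foreground scores above a threshold chosen so that at least
    the requested fraction of background scores lies at or below it."""
    ranked = sorted(bg_scores)
    k = max(math.ceil(specificity * len(ranked)) - 1, 0)
    threshold = ranked[k]
    return sum(score > threshold for score in fg_scores) / len(fg_scores)
```

Repeating the split 1,000 times, training the classifier for a fixed <ul><it>&#946;</it></ul> on each training part, and averaging the sensitivities obtained on the test parts mirrors the procedure behind the mean sensitivities shown in Figure 2.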
            <p>In Figure <figr fid="F2">2a</figr>, we illustrate the results for BSs of the TFs AR/GR/PR. For the ML learning principle located at (<it>&#946;</it><sub>0</sub>, <it>&#946;</it><sub>1</sub>) = (0, 1) and the MCL learning principle located at (<it>&#946;</it><sub>0</sub>, <it>&#946;</it><sub>1</sub>) = (1, 0), we find a sensitivity of 54.7% and 55.2%, respectively. Interestingly, the MCL learning principle achieves a higher sensitivity for the given specificity of 99.9% than the ML learning principle for this small data set. Using the Generalized Dirichlet prior with hyper parameters corresponding to uniform pseudo data, the sensitivities can be increased. For the MAP learning principle located at (<it>&#946;</it><sub>0</sub>, <it>&#946;</it><sub>1</sub>) = (0, 0.5) and the MSP learning principle located at (<it>&#946;</it><sub>0</sub>, <it>&#946;</it><sub>1</sub>) = (0.5, 0), we obtain a sensitivity of 54.9% and 55.6%, respectively. Hence, the MSP learning principle increases the sensitivity by 0.7% compared to the MAP learning principle, consistent with the general observation that discriminatively learned classifiers often outperform their generatively learned counterparts. This increase of sensitivity is achieved using the same prior and the same hyper parameters for both learning principles, although the particular choice of the hyper parameters may favour one of the two learning principles.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Performance of the unified generative-discriminative learning principle for four data sets</p>
               </caption>
               <text>
                  <p><b>Performance of the unified generative-discriminative learning principle for four data sets</b>. We perform a 1,000-fold stratified hold-out sampling procedure for the four data sets, record for different values of <ul><it>&#946;</it></ul> the mean sensitivity for a fixed specificity of 99.9%, and plot the mean sensitivities on the simplex <ul><it>&#946;</it></ul> in analogy to Figure 1. Yellow indicates the highest sensitivity, red indicates the lowest sensitivity, and the gray contour lines of each subfigure indicate multiples of the standard error of the maximum sensitivity.</p>
               </text>
               <graphic file="1471-2105-11-98-2" hint_layout="double"/>
            </fig>
            <p>Following equations (10a) and (10b), each point on the <it>&#946;</it><sub>0</sub>- and <it>&#946;</it><sub>1</sub>-axis corresponds to the MSP and the MAP learning principle, respectively, with specific hyper parameters <ul><it>&#945;</it></ul>. The location on the axis indicates the strength of the prior, reflected by the <it>virtual </it>ESS (Appendix A in Additional File <supplr sid="S1">1</supplr>). Next, we investigate the influence of the strength of the prior on the sensitivity for both learning principles.</p>
            <p>For the MAP learning principle, the sensitivity ranges from 54.7% for <ul><it>&#946;</it></ul> = (0, 0.05, 0.95) to 54.8% for <ul><it>&#946;</it></ul> = (0, 0.95, 0.05), achieving a maximum of 55.1% for <ul><it>&#946;</it></ul> = (0, 0.1, 0.9). For the MSP learning principle, the sensitivity ranges from the maximum value 56.7% for <ul><it>&#946;</it></ul> = (0.05, 0, 0.95) to 55.3% for <ul><it>&#946;</it></ul> = (0.95, 0, 0.05). Comparing the maximum sensitivities for both learning principles and different virtual ESSs, we find that the MSP learning principle with a maximum sensitivity of 56.7% clearly outperforms the MAP learning principle by 1.6%, whereas the difference of sensitivities is only 0.7% for the original ESS.</p>
            <p>Investigating this increase in the difference of sensitivities between the results for the MAP and the MSP learning principle, we find that the sensitivity increases for decreasing <it>&#946;</it><sub>0 </sub>on the <it>&#946;</it><sub>0</sub>-axis, which corresponds to the MSP learning principle with an increasing virtual ESS of the prior. In contrast to this observation, the sensitivity for the MAP learning principle increases less strongly with an increasing virtual ESS. This finding gives a first hint that a prior with a large ESS might be beneficial for the MSP learning principle, while we cannot observe a similar effect for the MAP learning principle in this case.</p>
            <p>Next, we consider the lines <it>&#946;</it><sub>1 </sub>= <it>&#957; </it>- <it>&#946;</it><sub>0</sub>, which correspond to the hybrid learning principles GDT and PGDT for <it>&#957; </it>= 1 and <it>&#957; </it>= 0.5, respectively. For the GDT learning principle, the sensitivity ranges from 54.7% for <ul><it>&#946;</it></ul> = (0, 1, 0) to 55.2% for <ul><it>&#946;</it></ul> = (1, 0, 0), reaching a maximum of 56.9% for <ul><it>&#946;</it></ul> = (0.55, 0.45, 0). For the PGDT learning principle, the sensitivity ranges from 54.9% for <ul><it>&#946;</it></ul> = (0, 0.5, 0.5) to 55.6% for <ul><it>&#946;</it></ul> = (0.5, 0, 0.5), reaching a maximum of 57.1% for <ul><it>&#946;</it></ul> = (0.3, 0.2, 0.5). For both learning principles, the sensitivity first increases and then decreases with increasing <it>&#946;</it><sub>0</sub>. This observation indicates that neither the MAP nor the MSP learning principle with a Generalized Dirichlet prior representing uniform pseudo data is optimal for estimating the parameter vector <ul><it>&#955;</it></ul>.</p>
            <p>Next, we investigate the interior of the simplex. We vary both <it>&#946;</it><sub>0 </sub>and <it>&#946;</it><sub>1 </sub>along a grid with step width 0.05 and find the highest sensitivity of 57.3% for <ul><it>&#946;</it></ul> = (0.1, 0.1, 0.8). The region of highest sensitivity lies clearly inside the simplex, near the angle bisector <it>&#946;</it><sub>0 </sub>= <it>&#946;</it><sub>1</sub>. This region corresponds to the MSP learning principle with an informative prior based on the weighted likelihood and the weighted original prior. Comparing the highest sensitivities of the GDT, the PGDT, and the unified generative-discriminative learning principle, we find that they increase from 56.9% over 57.1% to 57.3%, suggesting that the prior can have a positive influence on the performance.</p>
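            <p>A grid of this kind over the simplex can be enumerated exactly with rational arithmetic, which avoids points that miss the simplex through floating-point rounding. In the following sketch, <monospace>score</monospace> is a hypothetical stand-in for the mean sensitivity obtained from the hold-out procedure for a given <ul><it>&#946;</it></ul>, and the function names are illustrative.</p>

```python
from fractions import Fraction


def simplex_grid(step=Fraction(1, 20)):
    """All (beta0, beta1, beta2) with beta0, beta1 on a grid of width step
    and beta2 = 1 - beta0 - beta1 non-negative."""
    n = int(1 / step)
    return [(i * step, j * step, 1 - (i + j) * step)
            for i in range(n + 1)
            for j in range(n + 1 - i)]


def best_point(score, step=Fraction(1, 20)):
    """Grid point with the highest score (ties: first point in grid order)."""
    return max(simplex_grid(step), key=lambda beta: score(*beta))
```

With a step width of 0.05, the grid contains 231 points, including the three corners and the points on the edges of the simplex.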
            <p>Turning to the results of the other three TFs GATA, NF-<it>&#954;</it>B, and Thyroid, we find qualitatively similar results. The highest sensitivities are located inside the simplex, while the lowest sensitivities are located on the axes. For the BSs of the TF GATA, we obtain a sensitivity of 77.5% for <ul><it>&#946;</it></ul> = (0.45, 0.25, 0.3); for the BSs of the TF NF-<it>&#954;</it>B, we obtain a sensitivity of 81.8% for <ul><it>&#946;</it></ul> = (0.4, 0.55, 0.05); and for the BSs of the TF Thyroid, we obtain a sensitivity of 52.3% for <ul><it>&#946;</it></ul> = (0.4, 0.55, 0.05). As for the data set AR/GR/PR, we find a small region of high sensitivity for the BSs of the TFs NF-<it>&#954;</it>B and Thyroid, whereas we find a broad region of high sensitivity for the BSs of the TF GATA.</p>
            <p>We summarize the sensitivities for the ML, the MCL, the MAP, the MSP, and the unified generative-discriminative learning principle in Table <tblr tid="T2">2</tblr>. For all four TFs, the unified generative-discriminative learning principle yields the highest sensitivities. Regarding the <it>&#946;</it><sub>1</sub>-axis, which corresponds to the MAP learning principle using the Generalized Dirichlet prior representing uniform pseudo data with different ESSs, we find that increasing the prior weight <it>&#946;</it><sub>2</sub>, which is equivalent to decreasing the generative weight <it>&#946;</it><sub>1</sub>, often reduces the sensitivity. In almost all cases, we obtain the lowest sensitivity of the MAP learning principle for the largest prior weights <it>&#946;</it><sub>2</sub>. In contrast, on the <it>&#946;</it><sub>0</sub>-axis, which corresponds to the MSP learning principle with the Generalized Dirichlet prior representing uniform pseudo data with different ESSs, increasing the prior weight <it>&#946;</it><sub>2 </sub>improves the sensitivity, at least initially.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Results for four data sets</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>AR/GR/PR</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>GATA</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>NF-<it>&#954;</it>B</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Thyroid</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>ML</p>
                     </c>
                     <c ca="center">
                        <p>54.7</p>
                     </c>
                     <c ca="center">
                        <p>77.0</p>
                     </c>
                     <c ca="center">
                        <p>81.6</p>
                     </c>
                     <c ca="center">
                        <p>51.3</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MCL</p>
                     </c>
                     <c ca="center">
                        <p>55.2</p>
                     </c>
                     <c ca="center">
                        <p>73.2</p>
                     </c>
                     <c ca="center">
                        <p>76.5</p>
                     </c>
                     <c ca="center">
                        <p>50.0</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MAP</p>
                     </c>
                     <c ca="center">
                        <p>55.1</p>
                     </c>
                     <c ca="center">
                        <p>77.0</p>
                     </c>
                     <c ca="center">
                        <p>81.6</p>
                     </c>
                     <c ca="center">
                        <p>51.3</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MSP</p>
                     </c>
                     <c ca="center">
                        <p>56.7</p>
                     </c>
                     <c ca="center">
                        <p>77.0</p>
                     </c>
                     <c ca="center">
                        <p>79.6</p>
                     </c>
                     <c ca="center">
                        <p>50.4</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Unified</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>57.3</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>77.5</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>81.8</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>52.3</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Summary of the results of Figure 2 for the four data sets, showing the highest sensitivity for the ML, the MCL, the MAP, the MSP, and the unified generative-discriminative learning principle. For the MAP, the MSP, and the unified generative-discriminative learning principle, we present the best results from the part of the simplex <ul><it>&#946;</it></ul> that corresponds to the respective learning principle (see Figure 1b). For each data set, the highest sensitivity is displayed in bold.</p>
               </tblfn>
            </tbl>
            <p>Interestingly, we obtain qualitatively similar results when using other performance measures (Appendix B in Additional File <supplr sid="S1">1</supplr>). These observations suggest that the same classifier trained either by generative or by discriminative learning principles may prefer different ESSs, even if one uses a prior that corresponds to uniform pseudo data. Hence, the strength of the prior has a decisive influence on comparisons of the results from generative and discriminative learning principles, as well as on the results of Bayesian hybrid learning principles such as the PGDT learning principle. Most importantly, we find that the unified generative-discriminative learning principle leads to an improvement for almost all of the studied data sets and performance measures.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>A plethora of algorithms for the recognition of short DNA sequence motifs has been proposed over the last decades. These algorithms differ in their underlying statistical models and the employed learning principles. In bioinformatics, generative learning principles have a long tradition, but it has recently been shown that discriminative learning principles can improve the recognition of short signal sequences.</p>
         <p>We introduce a unified generative-discriminative learning principle that contains the ML, the MAP, the MCL, the MSP, the GDT, and the PGDT learning principle as special cases. This learning principle interpolates between the conditional likelihood, the likelihood, and the prior, weighted by <it>&#946;</it><sub>0</sub>, <it>&#946;</it><sub>1</sub>, and <it>&#946;</it><sub>2</sub>, respectively, spanning a simplex that allows a more detailed comparison of different learning principles. Furthermore, we find that under mild assumptions each point on the simplex can be interpreted as an instance of the MSP learning principle using an informative prior composed of a weighted likelihood and a weighted original prior.</p>
         <p>We find that the unified generative-discriminative learning principle improves the performance of classifiers for the recognition of vertebrate TFBSs over any of the six established learning principles it contains as special cases. We make all implementations available for the scientific community as part of the open-source Java library Jstacs <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, which allows using this learning principle easily for other bioinformatics problems. Although we demonstrate the utility of the unified generative-discriminative learning principle only for four data sets of TFBSs and four performance measures, it is conceivable that it can be successfully applied to other multinomial data, such as data of transcription start sites, donor and acceptor splice sites, splicing enhancers and silencers, and binding sites of insulators, nucleosomes, and miRNAs, as well as to continuous data.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>Considering the task of determining the optimal parameter vector <inline-formula><graphic file="1471-2105-11-98-i26.gif"/></inline-formula>, we find that generative learning principles often allow estimating <inline-formula><graphic file="1471-2105-11-98-i26.gif"/></inline-formula> analytically for simple models such as Markov models, whereas numerical optimization procedures are required for discriminative and hybrid learning principles, and consequently also for the unified generative-discriminative learning principle. If the conditional likelihood, the likelihood, and the prior are log-concave functions of the parameters, the weighted objective is concave, so any numerical algorithm that finds a local optimum determines the globally optimal parameter vector <inline-formula><graphic file="1471-2105-11-98-i26.gif"/></inline-formula> for the unified generative-discriminative learning principle.</p>
         <p>Different numerical methods including steepest descent, conjugate gradient, quasi-Newton methods, and limited-memory quasi-Newton methods have been evaluated in <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. In the case studies presented in the previous subsection, we use a limited-memory quasi-Newton method. In analogy to <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, we fix <ul><it>&#946;</it></ul> for the unified generative-discriminative learning principle, and we compute the results for a grid of given values of <ul><it>&#946;</it></ul>, providing an overall impression of the performance for the whole simplex <ul><it>&#946;</it></ul>.</p>
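            <p>To illustrate what fixing <ul><it>&#946;</it></ul> and numerically maximising the unified objective involves, the sketch below fits a deliberately tiny, hypothetical model: two classes, each emitting a single symbol from a four-letter alphabet, in a log-linear parametrisation with class logits and class-specific symbol logits, and a hypothetical conjugate prior with one pseudo count in each of the eight (class, symbol) cells. In this parametrisation the log conditional likelihood, the log likelihood, and the log prior are all concave. The sketch uses plain gradient ascent with central finite-difference gradients as a simple stand-in for the limited-memory quasi-Newton method mentioned above; it is not the Jstacs implementation, and all names are illustrative.</p>

```python
import math


def lse(v):
    """Numerically stable log-sum-exp."""
    m = max(v)
    return m + math.log(sum(math.exp(x - m) for x in v))


def unpack(theta):
    """theta: 2 class logits, then 4 symbol logits for class 0 and for class 1."""
    return theta[:2], [theta[2:6], theta[6:10]]


def objective(theta, data, beta, alpha=1.0):
    """beta0 * log CL + beta1 * log L + beta2 * log prior for the toy model."""
    b0, b1, b2 = beta
    cls, sym = unpack(theta)
    # log-linear joint model: p(c, x) = exp(cls[c] + sym[c][x]) / Z
    score = [[cls[c] + sym[c][x] for x in range(4)] for c in range(2)]
    cells = [s for row in score for s in row]
    log_z = lse(cells)
    log_l = log_cl = 0.0
    for c, x in data:
        log_l += score[c][x] - log_z                                   # log p(c, x)
        log_cl += score[c][x] - lse([score[k][x] for k in range(2)])   # log p(c | x)
    # hypothetical conjugate prior: alpha pseudo counts in each of the 8 cells
    log_prior = alpha * sum(cells) - alpha * len(cells) * log_z
    return b0 * log_cl + b1 * log_l + b2 * log_prior


def fit(data, beta, step=0.1, iters=500, h=1e-5):
    """Gradient ascent from theta = 0 with central finite-difference gradients."""
    theta = [0.0] * 10
    for _ in range(iters):
        grad = []
        for i in range(len(theta)):
            up, down = list(theta), list(theta)
            up[i] += h
            down[i] -= h
            grad.append((objective(up, data, beta) - objective(down, data, beta)) / (2 * h))
        theta = [t + step * g for t, g in zip(theta, grad)]
    return theta


def softmax(v):
    """Probabilities from logits."""
    z = lse(v)
    return [math.exp(x - z) for x in v]
```

As a sanity check under these assumptions, fitting at the MAP point <ul><it>&#946;</it></ul> = (0, 0.5, 0.5) should give, for each class, symbol probabilities <monospace>softmax(sym[c])</monospace> close to (<it>n</it><sub><it>k</it></sub> + 1)/(<it>N</it> + 4), the posterior mode with one pseudo count per symbol. The model is overparametrised, so the fitted logits are not unique, but the resulting probabilities are; for a real grid over <ul><it>&#946;</it></ul>, one would call <monospace>fit</monospace> once per grid point.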
         <p>The unified generative-discriminative learning principle can in principle be used for all types of data, and it is not limited to the multinomial data presented in section <it>Testing</it>. We make all implementations available for the scientific community as part of the open-source Java library Jstacs <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. Jstacs comprises an efficient representation of sequence data and provides object-oriented implementations of many statistical models. We implement the unified generative-discriminative learning principle as a multi-threaded class based on the Jstacs class hierarchy <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. This allows applying the learning principle efficiently on multi-core computers and to other statistical models. For optimizing parameters, we use the optimization procedures provided by Jstacs.</p>
      </sec>
      <sec>
         <st>
            <p>List of abbreviations used</p>
         </st>
         <p>BS: binding site; ESS: equivalent sample size; GDT: generative-discriminative trade-off; MAP: maximum a posteriori; MCL: maximum conditional likelihood; ML: maximum likelihood; MSP: maximum supervised posterior; PGDT: penalized generative-discriminative trade-off; PWM: position weight matrix; TF: transcription factor; TFBS: transcription factor binding site; WAM: weight array matrix.</p>
      </sec>
      <sec>
         <st>
            <p>Availability and Requirements</p>
         </st>
         <p>Project name: GenDisMix</p>
         <p>Project home page: <abbrgrp><abbr bid="B47">47</abbr></abbrgrp></p>
         <p>Operating system(s): Platform independent</p>
         <p>Programming language: Java 1.5</p>
         <p>Requirements: Jstacs 1.3</p>
         <p>License: GNU General Public License version 3</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JK and IG developed the basic ideas. JK and JG implemented the software. JK performed the case studies. All authors contributed to data analysis and writing, and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Alexander Zien for helpful discussions and four anonymous reviewers for their valuable comments. This work was supported by grant XP3624HP/0606T from the Ministry of Culture of Saxony-Anhalt.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>MATCH: A tool for searching transcription factor binding sites in DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Kel</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>G&#246;ssling</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Reuter</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Cheremushkin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Kel-Margoulis</snm>
                  <fnm>OV</fnm>
               </au>
               <au>
                  <snm>Wingender</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3576</fpage>
            <lpage>3579</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkg585</pubid>
                  <pubid idtype="pmcid">169193</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824369</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Modeling Dependencies in Protein-DNA Binding Sites</p>
            </title>
            <aug>
               <au>
                  <snm>Barash</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Elidan</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kaplan</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proceedings of the Seventh Annual International Conference on Computational Molecular Biology</source>
            <pubdate>2003</pubdate>
            <fpage>28</fpage>
            <lpage>37</lpage>
         </bibl>
         <bibl id="B3">
            <title>
               <p>ARTS: accurate recognition of transcription starts in human</p>
            </title>
            <aug>
               <au>
                  <snm>Sonnenburg</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zien</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>R&#228;tsch</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>14</issue>
            <fpage>e472</fpage>
            <lpage>e480</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl250</pubid>
                  <pubid idtype="pmpid" link="fulltext">16873509</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Toward a gold standard for promoter prediction evaluation</p>
            </title>
            <aug>
               <au>
                  <snm>Abeel</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Van de Peer</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Saeys</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2009</pubdate>
            <volume>25</volume>
            <issue>12</issue>
            <fpage>i313</fpage>
            <lpage>i320</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btp191</pubid>
                  <pubid idtype="pmcid">2687945</pubid>
                  <pubid idtype="pmpid" link="fulltext">19478005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Prediction of complete gene structures in human genomic DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Burge</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>268</volume>
            <fpage>78</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.0951</pubid>
                  <pubid idtype="pmpid" link="fulltext">9149143</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>A method for identifying splice sites and translational start sites in eukaryotic mRNA</p>
            </title>
            <aug>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1997</pubdate>
            <volume>13</volume>
            <issue>4</issue>
            <fpage>365</fpage>
            <lpage>376</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9283751</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Maximum Entropy Modeling of Short Sequence Motifs with Applications to RNA Splicing Signals</p>
            </title>
            <aug>
               <au>
                  <snm>Yeo</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Journal of Computational Biology</source>
            <pubdate>2004</pubdate>
            <volume>11</volume>
            <issue>2-3</issue>
            <fpage>377</fpage>
            <lpage>394</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/1066527041410418</pubid>
                  <pubid idtype="pmpid" link="fulltext">15285897</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>A genomic code for nucleosome positioning</p>
            </title>
            <aug>
               <au>
                  <snm>Segal</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Fondufe-Mittendorf</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Th&#229;str&#246;m</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Field</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>IK</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>JPZ</fnm>
               </au>
               <au>
                  <snm>Widom</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>442</volume>
            <issue>7104</issue>
            <fpage>772</fpage>
            <lpage>778</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04979</pubid>
                  <pubid idtype="pmcid">2623244</pubid>
                  <pubid idtype="pmpid" link="fulltext">16862119</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Nucleosome positioning signals in genomic DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Peckham</snm>
                  <fnm>HE</fnm>
               </au>
               <au>
                  <snm>Thurman</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Stamatoyannopoulos</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Noble</snm>
                  <fnm>WS</fnm>
               </au>
               <au>
                  <snm>Struhl</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Weng</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <issue>8</issue>
            <fpage>1170</fpage>
            <lpage>1177</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.6101007</pubid>
                  <pubid idtype="pmcid">1933512</pubid>
                  <pubid idtype="pmpid" link="fulltext">17620451</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Prediction of Mammalian MicroRNA Targets</p>
            </title>
            <aug>
               <au>
                  <snm>Lewis</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Shih</snm>
                  <fnm>IH</fnm>
               </au>
               <au>
                  <snm>Jones-Rhoades</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2003</pubdate>
            <volume>115</volume>
            <issue>7</issue>
            <fpage>787</fpage>
            <lpage>798</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(03)01018-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">14697198</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>DIANA-microT web server: elucidating microRNA functions through target prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Maragkakis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Reczko</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Simossis</snm>
                  <fnm>VA</fnm>
               </au>
               <au>
                  <snm>Alexiou</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Papadopoulos</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Dalamagas</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Giannopoulos</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Goumas</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Koukis</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Kourtis</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Vergoulis</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Koziris</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sellis</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tsanakas</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hatzigeorgiou</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2009</pubdate>
            <volume>37</volume>
            <issue>suppl 2</issue>
            <fpage>W273</fpage>
            <lpage>W276</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkp292</pubid>
                  <pubid idtype="pmcid">2703977</pubid>
                  <pubid idtype="pmpid" link="fulltext">19406924</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Abdullaev</snm>
                  <fnm>ZK</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Ching</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Loukinov</snm>
                  <fnm>DI</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>MQ</fnm>
               </au>
               <au>
                  <snm>Lobanenkov</snm>
                  <fnm>VV</fnm>
               </au>
               <au>
                  <snm>Ren</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2007</pubdate>
            <volume>128</volume>
            <issue>6</issue>
            <fpage>1231</fpage>
            <lpage>1245</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2006.12.048</pubid>
                  <pubid idtype="pmcid">2572726</pubid>
                  <pubid idtype="pmpid" link="fulltext">17382889</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Computer methods to locate signals in nucleic acid sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Staden</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1984</pubdate>
            <volume>12</volume>
            <fpage>505</fpage>
            <lpage>519</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/12.1Part2.505</pubid>
                  <pubid idtype="pmcid">321067</pubid>
                  <pubid idtype="pmpid" link="fulltext">6364039</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Use of the 'perceptron' algorithm to distinguish translational initiation sites</p>
            </title>
            <aug>
               <au>
                  <snm>Stormo</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gold</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ehrenfeucht</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1982</pubdate>
            <volume>10</volume>
            <fpage>2997</fpage>
            <lpage>3010</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/10.9.2997</pubid>
                  <pubid idtype="pmcid">320670</pubid>
                  <pubid idtype="pmpid" link="fulltext">7048259</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>A weight array method for splicing signal analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marr</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1993</pubdate>
            <volume>9</volume>
            <issue>5</issue>
            <fpage>499</fpage>
            <lpage>509</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8293321</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Discriminatively Trained Markov Model for Sequence Classification</p>
            </title>
            <aug>
               <au>
                  <snm>Yakhnenko</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Silvescu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Honavar</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>ICDM '05: Proceedings of the Fifth IEEE International Conference on Data Mining</source>
            <publisher>Washington, DC, USA: IEEE Computer Society</publisher>
            <pubdate>2005</pubdate>
            <fpage>498</fpage>
            <lpage>505</lpage>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Recognition of splice sites using maximum conditional likelihood</p>
            </title>
            <aug>
               <au>
                  <snm>Keilwagen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Grau</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Posch</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Grosse</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>LWA: Lernen - Wissen - Abstraktion</source>
            <editor>Hinneburg A</editor>
            <pubdate>2007</pubdate>
            <fpage>67</fpage>
            <lpage>72</lpage>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Modeling splice sites with Bayes networks</p>
            </title>
            <aug>
               <au>
                  <snm>Cai</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Delcher</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kao</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kasif</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <issue>2</issue>
            <fpage>152</fpage>
            <lpage>158</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.2.152</pubid>
                  <pubid idtype="pmpid" link="fulltext">10842737</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Identification of transcription factor binding sites with variable-order Bayesian networks</p>
            </title>
            <aug>
               <au>
                  <snm>Ben-Gal</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Shani</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gohr</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Grau</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Arviv</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shmilovici</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Posch</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Grosse</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>11</issue>
            <fpage>2657</fpage>
            <lpage>2666</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti410</pubid>
                  <pubid idtype="pmpid" link="fulltext">15797905</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Gene Prediction with Conditional Random Fields</p>
            </title>
            <aug>
               <au>
                  <snm>Culotta</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kulp</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>McCallum</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Technical Report UM-CS-2005-028</source>
            <publisher>University of Massachusetts, Amherst</publisher>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Global discriminative learning for higher-accuracy computational gene prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Bernal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Crammer</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hatzigeorgiou</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pereira</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <issue>3</issue>
            <fpage>e54</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pcbi.0030054</pubid>
                  <pubid idtype="pmcid">1828702</pubid>
                  <pubid idtype="pmpid" link="fulltext">17367206</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Assessing computational tools for the discovery of transcription factor binding sites</p>
            </title>
            <aug>
               <au>
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>De Moor</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Eskin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Favorov</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Frith</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Makeev</snm>
                  <fnm>VJ</fnm>
               </au>
               <au>
                  <snm>Mironov</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Noble</snm>
                  <fnm>WS</fnm>
               </au>
               <au>
                  <snm>Pavesi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>R&#233;gnier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Simonis</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sinha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Thijs</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>van Helden</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vandenbogaert</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weng</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Workman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Nature Biotechnology</source>
            <pubdate>2005</pubdate>
            <volume>23</volume>
            <fpage>137</fpage>
            <lpage>144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1053</pubid>
                  <pubid idtype="pmpid" link="fulltext">15637633</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes</p>
            </title>
            <aug>
               <au>
                  <snm>Ng</snm>
                  <fnm>AY</fnm>
               </au>
               <au>
                  <snm>Jordan</snm>
                  <fnm>MI</fnm>
               </au>
            </aug>
            <source>Advances in Neural Information Processing Systems</source>
            <publisher>Cambridge, MA: MIT Press</publisher>
            <editor>Dietterich T, Becker S, Ghahramani Z</editor>
            <pubdate>2002</pubdate>
            <volume>14</volume>
            <fpage>605</fpage>
            <lpage>610</lpage>
            <url>http://citeseer.ist.psu.edu/542917.html</url>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers</p>
            </title>
            <aug>
               <au>
                  <snm>Greiner</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Su</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Machine Learning</source>
            <pubdate>2005</pubdate>
            <volume>59</volume>
            <issue>3</issue>
            <fpage>297</fpage>
            <lpage>322</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/s10994-005-0469-0</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Discriminative versus generative parameter and structure learning of Bayesian network classifiers</p>
            </title>
            <aug>
               <au>
                  <snm>Pernkopf</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bilmes</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Proceedings of the 22nd International Conference on Machine Learning</source>
            <pubdate>2005</pubdate>
            <fpage>657</fpage>
            <lpage>664</lpage>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Supervised posteriors for DNA-motif classification</p>
            </title>
            <aug>
               <au>
                  <snm>Grau</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Keilwagen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Grosse</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Posch</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>German Conference on Bioinformatics, Lecture Notes in Informatics (LNI) - Proceedings</source>
            <publisher>Gesellschaft f&#252;r Informatik (GI)</publisher>
            <editor>Falter C, Schliep A, Selbig J, Vingron M, Walter D</editor>
            <pubdate>2007</pubdate>
            <fpage>123</fpage>
            <lpage>134</lpage>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Apples and oranges: avoiding different priors in Bayesian DNA sequence analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Keilwagen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Grau</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Posch</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Grosse</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2009</pubdate>
            <inpress/>
         </bibl>
         <bibl id="B28">
            <title>
               <p>On the Mathematical Foundations of Theoretical Statistics</p>
            </title>
            <aug>
               <au>
                  <snm>Fisher</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <pubdate>1922</pubdate>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1354066</pubid>
                  <pubid idtype="pmpid">18010665</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>R. A. Fisher and the Making of Maximum Likelihood 1912-1922</p>
            </title>
            <aug>
               <au>
                  <snm>Aldrich</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Statistical Science</source>
            <pubdate>1997</pubdate>
            <volume>12</volume>
            <issue>3</issue>
            <fpage>162</fpage>
            <lpage>176</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1214/ss/1030037906</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <aug>
               <au>
                  <snm>Bishop</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Pattern Recognition and Machine Learning</source>
            <publisher>Springer</publisher>
            <pubdate>2006</pubdate>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Discriminative motif discovery in DNA and protein sequences using the DEME algorithm</p>
            </title>
            <aug>
               <au>
                  <snm>Redhead</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>TL</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>385</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-8-385</pubid>
                  <pubid idtype="pmcid">2194741</pubid>
                  <pubid idtype="pmpid" link="fulltext">17937785</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>On Supervised Learning of Bayesian Network Parameters</p>
            </title>
            <aug>
               <au>
                  <snm>Wettig</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Gr&#252;nwald</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Roos</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Myllym&#228;ki</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Tirri</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Tech Rep HIIT Technical Report 2002-1</source>
            <publisher>Helsinki Institute for Information Technology HIIT</publisher>
            <pubdate>2002</pubdate>
            <url>http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.9589</url>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Learning Bayesian network classifiers by maximizing conditional likelihood</p>
            </title>
            <aug>
               <au>
                  <snm>Grossman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Domingos</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>ICML</source>
            <publisher>ACM Press</publisher>
            <pubdate>2004</pubdate>
            <fpage>361</fpage>
            <lpage>368</lpage>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Discriminative Scoring of Bayesian Network Classifiers: a Comparative Study</p>
            </title>
            <aug>
               <au>
                  <snm>Feelders</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ivanovs</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proceedings of the third European workshop on probabilistic graphical models</source>
            <pubdate>2006</pubdate>
            <fpage>75</fpage>
            <lpage>82</lpage>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Supervised posterior distributions</p>
            </title>
            <aug>
               <au>
                  <snm>Gr&#252;nwald</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kontkanen</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Myllym&#228;ki</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Roos</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tirri</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wettig</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Presented at the Seventh Valencia International Meeting on Bayesian Statistics</source>
            <pubdate>2002</pubdate>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Robust Bayesian Linear Classifier Ensembles</p>
            </title>
            <aug>
               <au>
                  <snm>Cerquides</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>de M&#225;ntaras</snm>
                  <fnm>RL</fnm>
               </au>
            </aug>
            <source>ECML</source>
            <pubdate>2005</pubdate>
            <fpage>72</fpage>
            <lpage>83</lpage>
         </bibl>
         <bibl id="B37">
            <title>
               <p>The Tradeoff Between Generative and Discriminative Classifiers</p>
            </title>
            <aug>
               <au>
                  <snm>Bouchard</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Triggs</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>IASC International Symposium on Computational Statistics (COMPSTAT), Prague</source>
            <pubdate>2004</pubdate>
            <fpage>721</fpage>
            <lpage>728</lpage>
            <url>http://lear.inrialpes.fr/pubs/2004/BT04</url>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Classification with Hybrid Generative/Discriminative Models</p>
            </title>
            <aug>
               <au>
                  <snm>Raina</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Ng</snm>
                  <fnm>AY</fnm>
               </au>
               <au>
                  <snm>McCallum</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Advances in Neural Information Processing Systems 16</source>
            <publisher>Cambridge, MA: MIT Press</publisher>
            <editor>Thrun S, Saul L, Sch&#246;lkopf B</editor>
            <pubdate>2004</pubdate>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Principled Hybrids of Generative and Discriminative Models</p>
            </title>
            <aug>
               <au>
                  <snm>Lasserre</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Bishop</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Minka</snm>
                  <fnm>TP</fnm>
               </au>
            </aug>
            <source>Proc IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>
            <pubdate>2006</pubdate>
            <volume>1</volume>
            <fpage>87</fpage>
            <lpage>94</lpage>
            <url>http://research.microsoft.com/en-us/um/people/cmbishop/downloads/bishop-cvpr-06.pdf</url>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Multi-conditional learning: Generative/discriminative training for clustering and classification</p>
            </title>
            <aug>
               <au>
                  <snm>McCallum</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pal</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Druck</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>National Conference on Artificial Intelligence</source>
            <pubdate>2006</pubdate>
            <fpage>433</fpage>
            <lpage>439</lpage>
            <url>http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.67.5681</url>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Bias-Variance Tradeoff in Hybrid Generative-Discriminative Models</p>
            </title>
            <aug>
               <au>
                  <snm>Bouchard</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>ICMLA '07: Proceedings of the Sixth International Conference on Machine Learning and Applications</source>
            <publisher>Washington, DC, USA: IEEE Computer Society</publisher>
            <pubdate>2007</pubdate>
            <fpage>124</fpage>
            <lpage>129</lpage>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Interpretation of hybrid generative/discriminative algorithms</p>
            </title>
            <aug>
               <au>
                  <snm>Xue</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Titterington</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>Neurocomputing</source>
            <pubdate>2009</pubdate>
            <volume>72</volume>
            <issue>7-9</issue>
            <fpage>1648</fpage>
            <lpage>1655</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/j.neucom.2008.08.009</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <aug>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>JH</fnm>
               </au>
            </aug>
            <source>The elements of statistical learning: data mining, inference, and prediction</source>
            <publisher>Springer</publisher>
            <pubdate>2009</pubdate>
            <url>http://www-stat.stanford.edu/~hastie/Papers/ESLII.pdf</url>
         </bibl>
         <bibl id="B44">
            <title>
               <p>TRANSFAC: A database on transcription factors and their DNA binding sites</p>
            </title>
            <aug>
               <au>
                  <snm>Wingender</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dietze</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Karas</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kn&#252;ppel</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1996</pubdate>
            <volume>24</volume>
            <fpage>238</fpage>
            <lpage>241</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/24.1.238</pubid>
                  <pubid idtype="pmcid">145586</pubid>
                  <pubid idtype="pmpid" link="fulltext">8594589</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>A Java framework for statistical analysis and classification of biological sequences</p>
            </title>
            <url>http://www.jstacs.de/</url>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Efficient Training of Conditional Random Fields</p>
            </title>
            <aug>
               <au>
                  <snm>Wallach</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Master's thesis</source>
            <publisher>University of Edinburgh</publisher>
            <pubdate>2002</pubdate>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Jstacs Projects: GenDisMix</p>
            </title>
            <url>http://www.jstacs.de/index.php/GenDisMix</url>
         </bibl>
      </refgrp>
   </bm>
</art>

