<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2105-11-309</ui><ji>1471-2105</ji><fm>
<dochead>Methodology article</dochead>
<bibl>
<title>
<p>L<sub>2</sub>-norm multiple kernel learning and its application to biomedical data fusion</p>
</title>
<aug>
<au ca="yes" id="A1"><snm>Yu</snm><fnm>Shi</fnm><insr iid="I1"/><email>shee.yu@gmail.com</email></au>
<au id="A2"><snm>Falck</snm><fnm>Tillmann</fnm><insr iid="I2"/><email>tillmann.falck@esat.kuleuven.be</email></au>
<au id="A3"><snm>Daemen</snm><fnm>Anneleen</fnm><insr iid="I1"/><email>Anneleen.Daemen@esat.kuleuven.be</email></au>
<au id="A4"><snm>Tranchevent</snm><fnm>Leon-Charles</fnm><insr iid="I1"/><email>leon-charles.tranchevent@esat.kuleuven.be</email></au>
<au id="A5"><snm>Suykens</snm><mi>AK</mi><fnm>Johan</fnm><insr iid="I2"/><email>johan.suykens@esat.kuleuven.be</email></au>
<au id="A6"><snm>De Moor</snm><fnm>Bart</fnm><insr iid="I1"/><email>bart.demoor@esat.kuleuven.be</email></au>
<au id="A7"><snm>Moreau</snm><fnm>Yves</fnm><insr iid="I1"/><email>yves.moreu@esat.kuleuven.be</email></au>
</aug>
<insg>
<ins id="I1"><p>Bioinformatics Group, Department of Electrical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Heverlee B-3001, Belgium</p></ins>
<ins id="I2"><p>Systems, Models and Control Group, Department of Electrical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Heverlee B-3001, Belgium</p></ins>
</insg>
<source>BMC Bioinformatics</source>
<issn>1471-2105</issn>
<pubdate>2010</pubdate>
<volume>11</volume>
<issue>1</issue>
<fpage>309</fpage>
<url>http://www.biomedcentral.com/1471-2105/11/309</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-11-309</pubid><pubid idtype="pmpid">20529363</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>14</day><month>1</month><year>2010</year></date></rec><acc><date><day>8</day><month>6</month><year>2010</year></date></acc><pub><date><day>8</day><month>6</month><year>2010</year></date></pub></history>
<cpyrt><year>2010</year><collab>Yu et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The choice of norm yields different extensions of multiple kernel learning (MKL), such as <it>L</it>
<sub>&#8734;</sub>, <it>L</it>
<sub>1</sub>, and <it>L</it>
<sub>2 </sub>MKL. In particular, <it>L</it>
<sub>2 </sub>MKL is a novel method that leads to non-sparse optimal kernel coefficients, in contrast to the sparse kernel coefficients optimized by the existing <it>L</it>
<sub>&#8734; </sub>MKL method. In real biomedical applications, <it>L</it>
<sub>2 </sub>MKL may be more advantageous than sparse integration methods for thoroughly combining complementary information in heterogeneous data sources.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>We provide a theoretical analysis of the relationship between the <it>L</it>
<sub>2 </sub>optimization of kernels in the dual problem and the <it>L</it>
<sub>2 </sub>coefficient regularization in the primal problem. Understanding the dual <it>L</it>
<sub>2 </sub>problem grants a unified view of MKL and enables us to extend the <it>L</it>
<sub>2 </sub>method to a wide range of machine learning problems. We implement <it>L</it>
<sub>2 </sub>MKL for ranking and classification problems and compare its performance with the sparse <it>L</it>
<sub>&#8734; </sub>and the averaging <it>L</it>
<sub>1 </sub>MKL methods. The experiments are carried out on six real biomedical data sets and two large-scale UCI data sets. <it>L</it>
<sub>2 </sub>MKL yields better performance on most of the benchmark data sets. In particular, we propose a novel <it>L</it>
<sub>2 </sub>MKL least squares support vector machine (LSSVM) algorithm, which is shown to be an efficient and promising classifier for processing large-scale data sets.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>This paper extends the statistical framework of genomic data fusion based on MKL. Allowing non-sparse weights on the data sources is an attractive option in settings where we believe most data sources to be relevant to the problem at hand and want to avoid the "winner-takes-all" effect seen in <it>L</it>
<sub>&#8734; </sub>MKL, which can be detrimental to performance in prospective studies. The notion of optimizing <it>L</it>
<sub>2 </sub>kernels can be straightforwardly extended to ranking, classification, regression, and clustering algorithms. To tackle the computational burden of MKL, this paper proposes several novel LSSVM-based MKL algorithms. A systematic comparison on real data sets shows that LSSVM MKL performs comparably to conventional SVM MKL algorithms. Moreover, large-scale numerical experiments indicate that, when cast as semi-infinite programming, LSSVM MKL can be solved more efficiently than SVM MKL.</p>
</sec>
<sec>
<st>
<p>Availability</p>
</st>
<p>The MATLAB code of algorithms implemented in this paper is downloadable from <url>http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html</url>.</p>
</sec>
</sec>
</abs>
</fm><meta>
<classifications>
<classification id="endnote" subtype="user_supplied_xml" type="bmc"/>
</classifications>
</meta><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>In the era of information overload, data mining and machine learning are indispensable tools for retrieving information and knowledge from data. Incorporating several data sources in an analysis may be beneficial: it can reduce noise, improve statistical significance, and leverage the interactions and correlations between data sources to obtain more refined, higher-level information <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>; this is known as <it>data fusion</it>. In bioinformatics, considerable effort has been devoted to <it>genomic data fusion</it>, an emerging topic relevant to many applications. High-throughput techniques currently generate terabytes of data at an increasing rate, and in data fusion these volumes are further multiplied by the number of data sources or species. Building a statistical model that describes such data is therefore not easy. To tackle this challenge, it is often effective to treat the data as generated by a complex and unknown black box, with the goal of finding a function or an algorithm that operates on an input to predict the output. About 15 years ago, Vapnik <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp> introduced the support vector method, which makes use of kernel functions. This method has offered plenty of opportunities to solve complicated problems but has also brought many interdisciplinary challenges in statistics, optimization theory, and their applications <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>.</p>
<p>Multiple kernel learning (MKL) was pioneered by Lanckriet <it>et al</it>. <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp> and Bach <it>et al</it>. <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp> as an additive extension of single kernel SVM that incorporates multiple kernels in classification. It has also been applied as a statistical learning framework for genomic data fusion <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp> and in many other applications <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>. The essence of MKL, the additive extension of the dual problem, relies only on the kernel representation (kernel trick), while the heterogeneity of the data sources is resolved by transforming different data structures (e.g., vectors, strings, trees, graphs) into kernel matrices. In the dual problem, these kernels are combined into a single kernel whose coefficients are adapted to optimize the algorithmic objective; this is known as <it>kernel fusion</it>. The notion of kernel fusion was originally proposed to solve classification problems in computational biology, but recent efforts have led to analogous solutions for one-class <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp> and unsupervised learning problems (Yu <it>et al</it>.: Optimized data fusion for kernel K-means clustering, submitted). Currently, most existing MKL methods are based on the formulation proposed by Lanckriet <it>et al</it>. <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp>, which we clarify in this paper as the optimization of the infinity norm (<it>L</it>
<sub>&#8734;</sub>) of kernel fusion. Optimizing <it>L</it>
<sub>&#8734; </sub>MKL in the dual problem corresponds to imposing <it>L</it>
<sub>1 </sub>regularization on the kernel coefficients in the primal problem. It is well known that <it>L</it>
<sub>1 </sub>regularization yields sparse coefficients <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp>. Thus, the solution obtained by <it>L</it>
<sub>&#8734; </sub>MKL is also sparse, assigning dominant coefficients to only one or two kernels. Sparseness is useful for distinguishing relevant sources from a large number of irrelevant ones. However, in biomedical applications there is usually a small number of sources, most of which are carefully selected and preprocessed; they are therefore often directly relevant to the problem. In these cases, a sparse solution may be too selective to thoroughly combine the complementary information in the data sources. While its performance on benchmark data may be good, the selected sources may not be as strong on truly novel problems where the quality of the information is much lower. We may thus expect the performance of such solutions to degrade significantly in real-world applications. To address this problem, we propose a new kernel fusion scheme that optimizes the <it>L</it>
<sub>2</sub>-norm of multiple kernels. <it>L</it>
<sub>2 </sub>MKL yields a non-sparse solution, which distributes the coefficients smoothly over multiple kernels while still leveraging the effects of the kernels in the objective optimization. Empirical results show that <it>L</it>
<sub>2</sub>-norm kernel fusion can lead to better performance in biomedical data fusion.</p>
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<sec>
<st>
<p>Symbols and notation</p>
</st>
<p>The symbols and notation used in this paper are defined in Table <tblr tid="T1">1</tblr>, in order of appearance.</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Symbols and notation</p></caption><tblbdy cols="3">
      <r>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i1.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>N</it></sup></p>
         </c>
         <c ca="left">
            <p>the dual variable of SVM</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Q</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>N </it>&#215; <it>N</it></sup></p>
         </c>
         <c ca="left">
            <p>a positive semi-definite matrix</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>N</it></sup></p>
         </c>
         <c ca="left">
            <p>a convex set</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>&#937;</p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>N </it>&#215; <it>N</it></sup></p>
         </c>
         <c ca="left">
            <p>a combination of multiple positive semi-definite matrices</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>j</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8469;</p>
         </c>
         <c ca="left">
            <p>the index of kernel matrices</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>p</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8469;</p>
         </c>
         <c ca="left">
            <p>the number of kernel matrices</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>&#952;</it>
            </p>
         </c>
         <c ca="left">
            <p>[0, 1]</p>
         </c>
         <c ca="left">
            <p>coefficients of kernel matrices</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>t</it>
            </p>
         </c>
         <c ca="left">
            <p>[0, + &#8734;)</p>
         </c>
         <c ca="left">
            <p>auxiliary variable in the optimization problem</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i2.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>p</it></sup></p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i3.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i4.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>p</it></sup></p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i5.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i6.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>D </it></sup>or &#8477;<sup>&#934;</sup></p>
         </c>
         <c ca="left">
            <p>the normal vector of the separating hyperplane</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>&#981;</it>(&#183;)</p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>D </it></sup>&#8594; &#8477;<sup>&#934;</sup></p>
         </c>
         <c ca="left">
            <p>the feature map</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>i</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8469;</p>
         </c>
         <c ca="left">
            <p>the index of training samples</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i7.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>D</it></sup></p>
         </c>
         <c ca="left">
            <p>the vector of the <it>i</it>-th training sample</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>&#961;</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;</p>
         </c>
         <c ca="left">
            <p>bias term in 1-SVM</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>&#957;</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup>+</sup></p>
         </c>
         <c ca="left">
            <p>regularization parameter of 1-SVM</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>&#958;</it>
               <sub>
                  <it>i</it>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;</p>
         </c>
         <c ca="left">
            <p>slack variable for the <it>i</it>-th training sample</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>K</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>N </it>&#215; <it>N</it></sup></p>
         </c>
         <c ca="left">
            <p>kernel matrix</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i8.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>D </it></sup>&#215; &#8477;<sup><it>D </it></sup>&#8594; &#8477;</p>
         </c>
         <c ca="left">
            <p>kernel function, <inline-formula><graphic file="1471-2105-11-309-i9.gif"/></inline-formula></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i10.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>D</it></sup></p>
         </c>
         <c ca="left">
            <p>the vector of a test data sample</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>y</it>
               <sub>
                  <it>i</it>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>-1 or +1</p>
         </c>
         <c ca="left">
            <p>the class label of the <it>i</it>-th training sample</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Y</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>N </it>&#215; <it>N</it></sup></p>
         </c>
         <c ca="left">
            <p>the diagonal matrix of class labels <it>Y </it>= <it>diag</it>(<it>y</it><sub>1</sub>, ..., <it>y</it><sub><it>N</it></sub>)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>C</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup>+</sup></p>
         </c>
         <c ca="left">
            <p>the box constraint on dual variables of SVM</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>b</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;</p>
         </c>
         <c ca="left">
            <p>the bias term in SVM and LSSVM</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i11.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>p</it></sup></p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i12.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>k</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8469;</p>
         </c>
         <c ca="left">
            <p>the number of classes</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i13.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>p</it></sup></p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i14.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i15.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>p</it></sup></p>
         </c>
         <c ca="left">
            <p>variable vector in the SIP problem</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>u</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;</p>
         </c>
         <c ca="left">
            <p>auxiliary variable in the SIP problem</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>q</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8469;</p>
         </c>
         <c ca="left">
            <p>the class index in classification problems, <it>q </it>= 1, ..., <it>k</it></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>A</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>N </it>&#215; <it>N</it></sup></p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i16.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>&#955;</p>
         </c>
         <c ca="left">
            <p>&#8477;<sup>+</sup></p>
         </c>
         <c ca="left">
            <p>the regularization parameter in LSSVM</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>e</it>
               <sub>
                  <it>i</it>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;</p>
         </c>
         <c ca="left">
            <p>the error term of the <it>i</it>-th sample in LSSVM</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i17.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>N</it></sup></p>
         </c>
         <c ca="left">
            <p>the dual variable of LSSVM, <inline-formula><graphic file="1471-2105-11-309-i18.gif"/></inline-formula></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>&#1013;</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup>+</sup></p>
         </c>
         <c ca="left">
            <p>precision threshold used as the stopping criterion of the SIP iterations</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>&#964;</it>
            </p>
         </c>
         <c ca="left">
            <p>&#8469;</p>
         </c>
         <c ca="left">
            <p>iteration index of the SIP algorithm</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i19.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>&#8477;<sup><it>p</it></sup></p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i20.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
   </tblbdy></tbl>
</sec>
<sec>
<st>
<p>Formal definition of the problem</p>
</st>
<p>We consider the problem of minimizing a quadratic cost expressed as a function of a real vector <inline-formula>
<graphic file="1471-2105-11-309-i1.gif"/>
</inline-formula> and a real positive semi-definite (PSD) matrix <it>Q</it>, given by</p>
<p>
<display-formula id="M1">
<graphic file="1471-2105-11-309-i21.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1471-2105-11-309-i22.gif"/>
</inline-formula> denotes a convex set. Moreover, since <it>Q</it> is PSD, <inline-formula>
<graphic file="1471-2105-11-309-i23.gif"/>
</inline-formula>. We will show that many machine learning problems can be cast in form (1) with additional constraints on <inline-formula>
<graphic file="1471-2105-11-309-i1.gif"/>
</inline-formula>. In particular, if we impose <inline-formula>
<graphic file="1471-2105-11-309-i24.gif"/>
</inline-formula>, the problem in (1) becomes a Rayleigh quotient problem, which leads to an eigenvalue problem. Now consider a convex parametric linear combination of a set of <it>p </it>PSD matrices <it>Q</it>
<sub>
<it>j</it>
</sub>, given by:</p>
<p>
<display-formula id="M2">
<graphic file="1471-2105-11-309-i25.gif"/>
</display-formula>
</p>
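<p>The two facts used above, that a convex combination of PSD matrices is again PSD and that the unit-norm constraint turns (1) into an eigenvalue problem, can be checked numerically. The following Python/NumPy sketch is only an illustration under our own assumptions: the matrix size, the random seed, the coefficient values, and the helper names <it>random_psd</it> and <it>combine</it> are hypothetical and are not part of the authors' MATLAB implementation.</p>
<p>
```python
import numpy as np

def random_psd(n, rng):
    """Return a random n x n positive semi-definite matrix B B^T (hypothetical helper)."""
    b = rng.standard_normal((n, n))
    return b @ b.T

def combine(qs, theta):
    """Convex combination Omega = sum_j theta_j Q_j, as in (2) (hypothetical helper)."""
    return sum(t * q for t, q in zip(theta, qs))

rng = np.random.default_rng(0)
N, p = 5, 3
qs = [random_psd(N, rng) for _ in range(p)]
theta = np.array([0.2, 0.5, 0.3])      # theta_j >= 0 and sum_j theta_j = 1
omega = combine(qs, theta)

# Omega is PSD: its smallest eigenvalue is non-negative (up to round-off).
# eigh returns the eigenvalues of a symmetric matrix in ascending order.
eigvals, eigvecs = np.linalg.eigh(omega)
print("smallest eigenvalue of Omega:", eigvals[0])

# Under alpha^T alpha = 1, the minimum of alpha^T Omega alpha (Rayleigh quotient)
# is the smallest eigenvalue, attained at the corresponding eigenvector.
alpha = eigvecs[:, 0]
print("alpha^T Omega alpha:", alpha @ omega @ alpha)
```
</p>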
<p>To bound the coefficients <it>&#952;</it>
<sub>
<it>j</it>
</sub>, we impose a constraint such as ||<it>&#952;</it>
<sub>
<it>j</it>
</sub>||<sub>1 </sub>= 1, under which (1) can be equivalently rewritten as a min-max problem, given by</p>
<p>
<display-formula id="M3">
<graphic file="1471-2105-11-309-i26.gif"/>
</display-formula>
</p>
<p>To solve (3), we denote <inline-formula>
<graphic file="1471-2105-11-309-i27.gif"/>
</inline-formula>; the min-max problem can then be formulated as a quadratically constrained linear program (QCLP), given by</p>
<p>
<display-formula id="M4">
<graphic file="1471-2105-11-309-i28.gif"/>
</display-formula>
</p>
<p>The optimal solution <inline-formula>
<graphic file="1471-2105-11-309-i29.gif"/>
</inline-formula> in (3) is obtained from the dual variables corresponding to the quadratic constraints in (4). The optimal <it>t</it>* equals the <it>Chebyshev </it>or <it>L</it>
<sub>&#8734;</sub>-norm of the vector of quadratic terms, given by:</p>
<p>
<display-formula id="M5">
<graphic file="1471-2105-11-309-i30.gif"/>
</display-formula>
</p>
<p>The <it>L</it>
<sub>&#8734;</sub>-norm is the upper bound with respect to the constraint <inline-formula>
<graphic file="1471-2105-11-309-i31.gif"/>
</inline-formula> because</p>
<p>
<display-formula id="M6">
<graphic file="1471-2105-11-309-i32.gif"/>
</display-formula>
</p>
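<p>The <it>L</it><sub>&#8734;</sub> behaviour described in (5) and (6) can be illustrated for a fixed dual vector. The following Python/NumPy sketch uses a hypothetical vector of quadratic terms (the values and the seed are invented for illustration only) and checks that no coefficient vector on the simplex exceeds their maximum, while a one-hot, i.e. maximally sparse, coefficient vector attains it.</p>
<p>
```python
import numpy as np

# Hypothetical quadratic terms a_j = alpha^T Q_j alpha for a fixed alpha (all >= 0).
a = np.array([0.8, 2.5, 1.1, 1.9])

# Optimal t* in (5): the Chebyshev (L-infinity) norm of the vector of quadratic terms.
t_star = np.max(a)

# Any theta on the simplex (theta_j >= 0, sum_j theta_j = 1) gives a weighted
# average of the terms, which can never exceed the largest term, cf. (6).
rng = np.random.default_rng(1)
thetas = rng.dirichlet(np.ones(len(a)), size=1000)
weighted = thetas @ a
print("largest sampled combination:", weighted.max(), "bound t*:", t_star)

# The bound is attained by a one-hot (sparse) theta selecting the maximal term.
theta_sparse = np.zeros(len(a))
theta_sparse[np.argmax(a)] = 1.0
print("sparse theta:", theta_sparse, "value:", theta_sparse @ a)
```
</p>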
<p>Given the optimal <inline-formula>
<graphic file="1471-2105-11-309-i33.gif"/>
</inline-formula>, optimizing the <it>L</it>
<sub>&#8734;</sub>-norm in (5) picks the single term with the maximal value, so the optimal coefficients are likely to be sparse. An alternative to (3) is to impose a different constraint on the coefficients, for example, ||<it>&#952;</it>
<sub>
<it>j</it>
</sub>||<sub>2 </sub>= 1. We thus propose a new extension of the problem in (1), given by</p>
<p>
<display-formula id="M7">
<graphic file="1471-2105-11-309-i34.gif"/>
</display-formula>
</p>
<p>This new extension can be solved analogously as a QCLP with modified constraints, given by</p>
<p>
<display-formula id="M8">
<graphic file="1471-2105-11-309-i35.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1471-2105-11-309-i3.gif"/>
</inline-formula>. The following theorem proves that (8) solves (7).</p>
<p>
<b>Theorem 0.1 </b>
<it>The QCLP problem in (8) equivalently solves the problem in (7)</it>.</p>
<p>
<b>Proof </b>Given two vectors {<it>x</it>
<sub>1</sub>, ..., <it>x</it>
<sub>
<it>p</it>
</sub>}, {<it>y</it>
<sub>1</sub>, ..., <it>y</it>
<sub>
<it>p</it>
</sub>}, <it>x</it>
<sub>
<it>j</it>
</sub>, <it>y</it>
<sub>
<it>j </it>
</sub>&#8712; &#8477;, <it>j </it>= 1, ..., <it>p</it>, the Cauchy-Schwarz inequality states that</p>
<p>
<display-formula id="M9">
<graphic file="1471-2105-11-309-i36.gif"/>
</display-formula>
</p>
<p>or, equivalently,</p>
<p>
<display-formula id="M10">
<graphic file="1471-2105-11-309-i37.gif"/>
</display-formula>
</p>
<p>Setting <it>x</it>
<sub>
<it>j </it>
</sub>= <it>&#952;</it>
<sub>
<it>j </it>
</sub>and <inline-formula>
<graphic file="1471-2105-11-309-i38.gif"/>
</inline-formula>, (10) becomes</p>
<p>
<display-formula id="M11">
<graphic file="1471-2105-11-309-i39.gif"/>
</display-formula>
</p>
<p>Since ||<it>&#952;</it>
<sub>
<it>j</it>
</sub>||<sub>2 </sub>= 1, (11) is equivalent to</p>
<p>
<display-formula id="M12">
<graphic file="1471-2105-11-309-i40.gif"/>
</display-formula>
</p>
<p>Therefore, given <inline-formula>
<graphic file="1471-2105-11-309-i3.gif"/>
</inline-formula>, the additive term <inline-formula>
<graphic file="1471-2105-11-309-i41.gif"/>
</inline-formula> is bounded by the <it>L</it>
<sub>2</sub>-norm ||<inline-formula>
<graphic file="1471-2105-11-309-i2.gif"/>
</inline-formula>||<sub>2</sub>.</p>
<p>Moreover, it is easy to prove that when <inline-formula>
<graphic file="1471-2105-11-309-i42.gif"/>
</inline-formula>, the parametric combination reaches the upper bound and equality holds. Optimizing this <it>L</it>
<sub>2</sub>-norm yields a non-sparse solution in <it>&#952;</it>
<sub>
<it>j</it>
</sub>. To distinguish it from the solution obtained by (3) and (4), we refer to it as the <it>L</it>
<sub>2</sub>-norm approach. It can also easily be seen (not shown here) that the <it>L</it>
<sub>1</sub>-norm approach simply averages the quadratic terms with uniform coefficients.</p>
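<p>The contrast between the <it>L</it><sub>2</sub> solution and the averaging <it>L</it><sub>1</sub> approach can be sketched numerically. The following Python/NumPy example uses a hypothetical vector of quadratic terms <it>&#947;</it><sub><it>j</it></sub>; the values, the seed, and the 1/<it>p</it> normalization of the uniform coefficients are our assumptions for illustration. It checks the Cauchy-Schwarz bound and shows that the maximizer is non-sparse.</p>
<p>
```python
import numpy as np

# Hypothetical quadratic terms gamma_j = alpha^T Q_j alpha for a fixed alpha.
gamma = np.array([0.8, 2.5, 1.1, 1.9])

# L2 approach: maximize sum_j theta_j gamma_j subject to ||theta||_2 = 1.
# By Cauchy-Schwarz the maximum is ||gamma||_2, attained at theta = gamma / ||gamma||_2.
bound_l2 = np.linalg.norm(gamma)
theta_l2 = gamma / bound_l2
print("theta (L2):", np.round(theta_l2, 3))   # every entry is non-zero
print("value:", theta_l2 @ gamma, "bound:", bound_l2)

# Random unit vectors never exceed the bound.
rng = np.random.default_rng(2)
samples = rng.standard_normal((1000, len(gamma)))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
print("largest sampled value:", (samples @ gamma).max())

# Averaging (L1) approach, assuming uniform coefficients theta_j = 1/p.
theta_avg = np.full(len(gamma), 1.0 / len(gamma))
print("averaged value:", theta_avg @ gamma)
```
</p>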
<p>The <it>L</it>
<sub>2</sub>-norm bound also generalizes to any positive real number <it>n </it>&#8805; 1, which we refer to as <it>L</it>
<sub>
<it>n</it>
</sub>-norm MKL. A similar topic has recently been investigated in <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>, where a method for solving the primal MKL problem is proposed. Here we show that our primal-dual interpretation of MKL also extends to the <it>n</it>-norm. Assume that <inline-formula>
<graphic file="1471-2105-11-309-i43.gif"/>
</inline-formula> is regularized by the <it>L</it>
<sub>
<it>m</it>
</sub>-norm as ||<inline-formula>
<graphic file="1471-2105-11-309-i43.gif"/>
</inline-formula>||<sub>
<it>m </it>
</sub>= 1; the <it>L</it>
<sub>
<it>m</it>
</sub>-norm extension of (7) is then given by</p>
<p>
<display-formula id="M13">
<graphic file="1471-2105-11-309-i44.gif"/>
</display-formula>
</p>
<p>In the following theorem, we prove that (13) can be equivalently solved as a QCLP problem, given by</p>
<p>
<display-formula id="M14">
<graphic file="1471-2105-11-309-i45.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1471-2105-11-309-i3.gif"/>
</inline-formula>, the constraint is expressed in the <it>L</it>
<sub>
<it>n</it>
</sub>-norm, and <inline-formula>
<graphic file="1471-2105-11-309-i46.gif"/>
</inline-formula>. The problem in (14) is convex and can be solved with the cvx toolbox <abbrgrp>
<abbr bid="B10">10</abbr>
<abbr bid="B11">11</abbr>
</abbrgrp>.</p>
<p>
<b>Theorem 0.2 </b>
<it>If the coefficient vector </it>
<inline-formula>
<graphic file="1471-2105-11-309-i43.gif"/>
</inline-formula>
<it>is regularized by an L</it>
<sub>
<it>m</it>
</sub>
<it>-norm in (13), the problem can be solved as a convex programming problem in (14) with an L</it>
<sub>
<it>n</it>
</sub>
<it>-norm constraint. Moreover</it>, <inline-formula>
<graphic file="1471-2105-11-309-i46.gif"/>
</inline-formula>.</p>
<p>
<b>Proof </b>We generalize the Cauchy-Schwarz inequality to H&#246;lder's inequality. Let <it>m</it>, <it>n </it>&gt; 1 be two numbers that satisfy <inline-formula>
<graphic file="1471-2105-11-309-i47.gif"/>
</inline-formula>. Then</p>
<p>
<display-formula id="M15">
<graphic file="1471-2105-11-309-i48.gif"/>
</display-formula>
</p>
<p>Let us denote <it>x</it>
<sub>
<it>j </it>
</sub>= <it>&#952;</it>
<sub>
<it>j </it>
</sub>and <inline-formula>
<graphic file="1471-2105-11-309-i38.gif"/>
</inline-formula>; then (2) becomes</p>
<p>
<display-formula id="M16">
<graphic file="1471-2105-11-309-i49.gif"/>
</display-formula>
</p>
<p>Since ||<inline-formula>
<graphic file="1471-2105-11-309-i43.gif"/>
</inline-formula>||<sub>
<it>m </it>
</sub>= 1, the term <inline-formula>
<graphic file="1471-2105-11-309-i50.gif"/>
</inline-formula> can be omitted, so (3) is equivalent to</p>
<p>
<display-formula id="M17">
<graphic file="1471-2105-11-309-i51.gif"/>
</display-formula>
</p>
<p>Since <inline-formula>
<graphic file="1471-2105-11-309-i47.gif"/>
</inline-formula>, it follows that <inline-formula>
<graphic file="1471-2105-11-309-i46.gif"/>
</inline-formula>. Hence, with the <it>L</it>
<sub>
<it>m</it>
</sub>-norm constraint imposed on <inline-formula>
<graphic file="1471-2105-11-309-i43.gif"/>
</inline-formula>, the additive multiple kernel term <inline-formula>
<graphic file="1471-2105-11-309-i41.gif"/>
</inline-formula> is bounded by the <it>L</it>
<sub>
<it>n</it>
</sub>-norm of the vector <inline-formula>
<graphic file="1471-2105-11-309-i52.gif"/>
</inline-formula>, which completes the proof.</p>
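<p>The bound used in the proof can be checked numerically. The Python sketch below is illustrative only, and its random vectors are stand-ins for the quadratic terms. It draws random nonnegative <it>&#952;</it> on the unit <it>L</it><sub><it>m</it></sub> sphere and confirms two things:</p>
<p>First, &#8721;<sub><it>j</it></sub><it>&#952;</it><sub><it>j</it></sub><it>s</it><sub><it>j</it></sub> never exceeds ||<it>s</it>||<sub><it>n</it></sub> with 1/<it>m</it> + 1/<it>n</it> = 1. Second, the bound is attained by <it>&#952;</it><sub><it>j</it></sub> &#8733; <it>s</it><sub><it>j</it></sub><sup><it>n</it>-1</sup>, the equality case of H&#246;lder's inequality. For <it>m</it> = 2 this reduces to the Cauchy-Schwarz case.</p>

```python
import numpy as np


def conjugate_exponent(m):
    """Hoelder conjugate n of m, from 1/m + 1/n = 1, i.e. n = m / (m - 1)."""
    return m / (m - 1.0)


def optimal_theta(s, m):
    """Nonnegative theta with ||theta||_m = 1 attaining sum(theta * s) = ||s||_n.
    Equality in Hoelder's inequality requires theta_j**m proportional to
    s_j**n, i.e. theta_j proportional to s_j**(n - 1)."""
    n = conjugate_exponent(m)
    v = np.asarray(s, dtype=float) ** (n - 1.0)
    return v / np.linalg.norm(v, ord=m)


rng = np.random.default_rng(0)
s = rng.uniform(0.1, 2.0, size=5)          # stand-ins for the quadratic terms
for m in (2.0, 3.0, 4.0):
    n = conjugate_exponent(m)
    bound = np.linalg.norm(s, ord=n)       # ||s||_n
    for _ in range(1000):
        theta = rng.uniform(0.0, 1.0, size=5)
        theta /= np.linalg.norm(theta, ord=m)      # feasible: ||theta||_m = 1
        assert bound + 1e-12 >= theta @ s          # Hoelder bound holds
    attained = np.isclose(optimal_theta(s, m) @ s, bound)
    print(f"m = {m}, n = {n:.4f}, bound attained: {attained}")
```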
<p>In this section, we have explained the <it>L</it>
<sub>&#8734;</sub>, <it>L</it>
<sub>1</sub>, <it>L</it>
<sub>2</sub>, and <it>L</it>
<sub>
<it>n</it>
</sub>-norm approaches to extend the basic problem in (1) to multiple matrices <it>Q</it>
<sub>
<it>j</it>
</sub>. These approaches differ mainly in the constraints imposed on the coefficients. To clarify how the notation used in this paper relates to the common interpretation of <it>L</it>
<sub>1 </sub>and <it>L</it>
<sub>2 </sub>regularization on <inline-formula>
<graphic file="1471-2105-11-309-i43.gif"/>
</inline-formula>, we map our <it>L</it>
<sub>&#8734;</sub>, <it>L</it>
<sub>1</sub>, <it>L</it>
<sub>2</sub>, and <it>L</it>
<sub>
<it>n </it>
</sub>notations onto the common interpretations of coefficient regularization. As shown in Table <tblr tid="T2">2</tblr>, the notation used in this paper refers to the dual space and is equivalent to regularization of the kernel coefficients in the primal space. The advantage of the dual-space interpretation is that the analogous solution extends easily to various machine learning algorithms. To keep the discussion concise, from now on we mainly compare the <it>L</it>
<sub>&#8734;</sub>, <it>L</it>
<sub>1 </sub>and <it>L</it>
<sub>2 </sub>approaches in the dual problems and present the solutions in the dual space.</p>
<tbl id="T2"><title><p>Table 2</p></title><caption><p>The notation used in this paper is based on the dual problem and can be linked to an equivalent notation in the primal problem</p></caption><tblbdy cols="3">
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>primal problem</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>dual problem</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>variable</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>
                  <it>&#952;</it>
               </b>
               <sub>
                  <b>
                     <it>j</it>
                  </b>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i53.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>L</b>
               <sub>&#8734;</sub>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i54.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i55.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>L</b>
               <sub>
                  <b>1</b>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i56.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i57.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>L</b>
               <sub>
                  <b>2</b>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i58.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i59.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>L</b>
               <sub>
                  <b>1.5</b>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i60.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i61.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>L</b>
               <sub>
                  <b>1.3333</b>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i62.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i63.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>L</b>
               <sub>
                  <b>1.25</b>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i64.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i65.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>L</b>
               <sub>
                  <b>1.2</b>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i66.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i67.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>L</b>
               <sub>
                  <b>1.1667</b>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i68.gif"/>
               </inline-formula>
            </p>
         </c>
         <c ca="left">
            <p>
               <inline-formula>
                  <graphic file="1471-2105-11-309-i69.gif"/>
               </inline-formula>
            </p>
         </c>
      </r>
   </tblbdy></tbl>
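<p>The non-integer subscripts in Table <tblr tid="T2">2</tblr> follow from the relation 1/<it>m</it> + 1/<it>n</it> = 1 of Theorem 0.2. The short Python snippet below assumes that each row label is the dual exponent <it>n</it> and that the primal column bounds ||<it>&#952;</it>||<sub><it>m</it></sub>. Under that assumption, it recovers the labels from <it>m</it> = 2, ..., 7. The <it>L</it><sub>1</sub> and <it>L</it><sub>&#8734;</sub> rows are the limiting cases <it>m</it> &#8594; &#8734; and <it>m</it> &#8594; 1.</p>

```python
from fractions import Fraction


def dual_exponent(m):
    """Exact exponent n paired with m through 1/m + 1/n = 1."""
    return Fraction(m, m - 1)


for m in range(2, 8):
    n = dual_exponent(m)
    print(f"m = {m}: n = {n} = {float(n):.4f}")
# n = 2.0000, 1.5000, 1.3333, 1.2500, 1.2000, 1.1667 for m = 2, ..., 7
```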
<p>Next, we investigate several concrete kernel fusion algorithms and propose the corresponding <it>L</it>
<sub>2 </sub>solutions.</p>
</sec>
<sec>
<st>
<p>One class SVM kernel fusion for ranking</p>
</st>
<p>The primal problem of the one-class SVM (1-SVM) is defined by Tax and Duin <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp> and Sch&#246;lkopf <it>et al</it>. <abbrgrp>
<abbr bid="B13">13</abbr>
</abbrgrp> as</p>
<p>
<display-formula id="M18">
<graphic file="1471-2105-11-309-i70.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1471-2105-11-309-i6.gif"/>
</inline-formula> is the normal vector of the separating hyperplane, <inline-formula>
<graphic file="1471-2105-11-309-i7.gif"/>
</inline-formula> are the training samples, <it>&#957; </it>is the regularization constant penalizing outliers in the training samples, <it>&#981;</it>(&#183;) denotes the feature map, <it>&#961; </it>is a bias term, <it>&#958;</it>
<sub>
<it>i </it>
</sub>are slack variables, and <it>N </it>is the number of training samples. Taking the conditions for optimality from the Lagrangian, one obtains the dual problem, given by:</p>
<p>
<display-formula id="M19">
<graphic file="1471-2105-11-309-i71.gif"/>
</display-formula>
</p>
<p>where <it>&#945;</it>
<sub>
<it>i </it>
</sub>are dual variables, and <it>K </it>is the kernel matrix whose entries are the inner products between pairs of samples, specified by a kernel function <inline-formula>
<graphic file="1471-2105-11-309-i72.gif"/>
</inline-formula>. To incorporate multiple kernels in (19), De Bie <it>et al</it>. proposed a solution <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp> with the dual problem formulated as</p>
<p>
<display-formula id="M20">
<graphic file="1471-2105-11-309-i73.gif"/>
</display-formula>
</p>
<p>where <it>p </it>is the number of data sources and <it>K</it>
<sub>
<it>j </it>
</sub>is the <it>j</it>-th kernel matrix. The formulation exactly corresponds to the <it>L</it>
<sub>&#8734; </sub>solution of the problem defined in the previous section (the PSD constraint is implied in the kernel matrix) with additional constraints imposed on <inline-formula>
<graphic file="1471-2105-11-309-i1.gif"/>
</inline-formula>. The optimal coefficients <it>&#952;</it>
<sub>
<it>j </it>
</sub>are used to combine multiple kernels as</p>
<p>
<display-formula id="M21">
<graphic file="1471-2105-11-309-i74.gif"/>
</display-formula>
</p>
<p>and the ranking function is given by</p>
<p>
<display-formula id="M22">
<graphic file="1471-2105-11-309-i75.gif"/>
</display-formula>
</p>
<p>where &#937;<sub>
<it>N </it>
</sub>is the combined kernel of training data <inline-formula>
<graphic file="1471-2105-11-309-i7.gif"/>
</inline-formula>, <it>i </it>= 1, ..., <it>N</it>, <inline-formula>
<graphic file="1471-2105-11-309-i10.gif"/>
</inline-formula> is the test data point to be ranked, <inline-formula>
<graphic file="1471-2105-11-309-i76.gif"/>
</inline-formula> is the kernel function evaluated between the test data and the training data, and <inline-formula>
<graphic file="1471-2105-11-309-i1.gif"/>
</inline-formula> is the dual variable obtained from (20). De Bie <it>et al</it>. applied the method to disease gene prioritization. There, multiple genomic data sources are combined to rank a large set of test genes, using a 1-SVM model trained on a small set of training genes known to be relevant to certain diseases. The <it>L</it>
<sub>&#8734; </sub>formulation in their approach yields a sparse solution when integrating genomic data sources (see Figure 2 of <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>). To avoid this disadvantage, they proposed a regularization method that bounds the kernel coefficients from below through a parameter <it>&#952;</it>
<sub>
<it>min</it>
</sub>, ensuring that each genomic data source contributes at least <it>&#952;</it>
<sub>
<it>min</it>
</sub>/<it>p</it>. According to their experiments, the regularized solution performed best and was significantly better than both the sparse integration and the average combination of kernels.</p>
<p>Instead of setting the ad hoc parameter <it>&#952;</it>
<sub>
<it>min</it>
</sub>, one can straightforwardly formulate an <it>L</it>
<sub>2</sub>-norm approach to the same problem, given by</p>
<p>
<display-formula id="M23">
<graphic file="1471-2105-11-309-i77.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1471-2105-11-309-i78.gif"/>
</inline-formula>. The problem above is a QCLP problem and can be solved by conic optimization solvers such as SeDuMi <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>. In (23), the first constraint represents a Lorentz cone and the second constraint corresponds to <it>p </it>rotated Lorentz cones (R cones). The optimal kernel coefficients <it>&#952;</it>
<sub>
<it>j </it>
</sub>correspond to the dual variables of the R cones, with ||<it>&#952;</it>||<sub>2 </sub>= 1. In this <it>L</it>
<sub>2</sub>-norm approach, the integrated kernel &#937; is combined using the coefficients <inline-formula>
<graphic file="1471-2105-11-309-i79.gif"/>
</inline-formula>, and the same scoring function as in (22) is applied to the resulting solutions <inline-formula>
<graphic file="1471-2105-11-309-i1.gif"/>
</inline-formula> and &#937;.</p>
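<p>The combination step (21) and the scoring step (22) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of De Bie <it>et al</it>. It assumes that the score of a test point <it>x</it> is &#8721;<sub><it>i</it></sub><it>&#945;</it><sub><it>i</it></sub>&#937;(<it>x</it>, <it>x</it><sub><it>i</it></sub>) and that higher scores are ranked first. The Gaussian kernels, the toy data, the coefficients <it>&#952;</it>, and the uniform placeholder <it>&#945;</it> (which sums to one) are all hypothetical. They stand in for an actual solution of (20) or (23).</p>

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a - b||^2) between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def combine(kernels, theta):
    """Combined kernel Omega = sum_j theta_j K_j, as in (21)."""
    return sum(t * K for t, K in zip(theta, kernels))


def rank_test_points(test_kernels, theta, alpha):
    """Score each test point by sum_i alpha_i Omega(x, x_i) and return the
    test indices sorted from highest to lowest score, together with the scores."""
    scores = combine(test_kernels, theta) @ alpha
    return np.argsort(-scores), scores


# Two hypothetical data sources describing the same 5 training samples.
rng = np.random.default_rng(1)
train_views = [rng.normal(0.0, 0.3, (5, 3)), rng.normal(0.0, 0.3, (5, 2))]
# Three test samples per source, increasingly far from the training samples.
test_views = [np.array([[0.0] * 3, [2.0] * 3, [4.0] * 3]),
              np.array([[0.0] * 2, [2.0] * 2, [4.0] * 2])]

# Test-by-training kernel matrices, one per data source.
test_kernels = [rbf_kernel(T, X, gamma=0.5) for T, X in zip(test_views, train_views)]
theta = np.array([0.6, 0.8])        # hypothetical coefficients with ||theta||_2 = 1
alpha = np.full(5, 1.0 / 5)         # placeholder dual variables summing to one
order, scores = rank_test_points(test_kernels, theta, alpha)
print(order, np.round(scores, 4))
```

<p>An offset shared by all test points, such as the bias <it>&#961;</it>, does not change the ranking and is therefore omitted.</p>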
</sec>
<sec>
<st>
<p>Support vector machine MKL for classification</p>
</st>
<p>MKL was originally proposed for binary SVM classification, where the primal problem is given by</p>
<p>
<display-formula id="M24">
<graphic file="1471-2105-11-309-i80.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1471-2105-11-309-i7.gif"/>
</inline-formula> are data samples, <it>&#981;</it>(&#183;) is the feature map, <it>y</it>
<sub>
<it>i </it>
</sub>are class labels, <it>C </it>&gt; 0 is a positive regularization parameter, <it>&#958;</it>
<sub>
<it>i </it>
</sub>are slack variables, <inline-formula>
<graphic file="1471-2105-11-309-i6.gif"/>
</inline-formula> is the normal vector of the separating hyperplane, and <it>b </it>is the bias. This problem is convex and can be solved via its dual problem, given by</p>
<p>
<display-formula id="M25">
<graphic file="1471-2105-11-309-i81.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1471-2105-11-309-i1.gif"/>
</inline-formula> are the dual variables, <it>Y </it>= <it>diag</it>(<it>y</it>
<sub>1</sub>, ..., <it>y</it>
<sub>
<it>N</it>
</sub>), <it>K </it>is the kernel matrix, and <it>C </it>is the upper bound of the box constraint on the dual variables. To incorporate multiple kernels in (25), Lanckriet <it>et al</it>. <abbrgrp>
<abbr bid="B6">6</abbr>
<abbr bid="B4">4</abbr>
</abbrgrp> and Bach <it>et al</it>. <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp> proposed a multiple kernel learning (MKL) problem as follows:</p>
<p>
<display-formula id="M26">
<graphic file="1471-2105-11-309-i82.gif"/>
</display-formula>
</p>
<p>where <it>p </it>is the number of kernels. Problem (26) optimizes the <it>L</it>
<sub>&#8734;</sub>-norm of the set of kernel quadratic terms. Following the previous discussion, the <it>L</it>
<sub>2</sub>-norm solution is analogously given by</p>
<p>
<display-formula id="M27">
<graphic file="1471-2105-11-309-i83.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1471-2105-11-309-i84.gif"/>
</inline-formula>. Both formulations (26) and (27) can be efficiently solved as second order cone programming (SOCP) problems by a conic optimization solver (e.g., SeDuMi <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>) or as QCQP problems by a general QP solver (e.g., MOSEK <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>). A binary MKL problem can also be formulated as a semi-definite programming (SDP) problem, as proposed by Lanckriet <it>et al</it>. <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp> and Kim <it>et al</it>. <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>. However, in a multi-class problem, SDP problems are computationally prohibitive due to the presence of PSD constraints and can only be solved approximately by relaxation <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>. In contrast, the QCLP and QCQP formulations of binary classification problems can easily be extended to a multi-class setting using one-versus-all (1vsA) coding, i.e., by solving a <it>k</it>-class problem as <it>k </it>binary problems. Therefore, the <it>L</it>
<sub>&#8734; </sub>multi-class SVM MKL is formulated as</p>
<p>
<display-formula id="M28">
<graphic file="1471-2105-11-309-i85.gif"/>
</display-formula>
</p>
<p>The <it>L</it>
<sub>2 </sub>multi-class SVM MKL is given by</p>
<p>
<display-formula id="M29">
<graphic file="1471-2105-11-309-i86.gif"/>
</display-formula>
</p>
<p>where</p>
<p>
<display-formula>
<graphic file="1471-2105-11-309-i87.gif"/>
</display-formula>
</p>
<sec>
<st>
<p>SIP formulation for SVM MKL on larger scale data</p>
</st>
<p>Unfortunately, the kernel fusion problem becomes challenging on large-scale data because it may scale up in three dimensions: the number of data points, the number of classes, and the number of kernels. When these dimensions are all large, memory issues may arise because the kernel matrices need to be stored in memory. The kernel matrices can be approximated by a low-rank decomposition (e.g., incomplete Cholesky decomposition), which reduces the computational burden of conic optimization. Even so, conic problems involve a large number of variables and constraints and are usually less efficient than QCQP. Moreover, the precision of the low-rank approximation relies on the assumption that the eigenvalues of the kernel matrices decay rapidly, which may not hold when the intrinsic dimensions of the kernels are large. To tackle the computational burden of MKL, Sonnenburg <it>et al</it>. reformulated the QP problem as semi-infinite programming (SIP) and approximated the QP solution using a bi-level strategy (wrapper method) <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp>. The standard form of SIP is given by</p>
<p>
<display-formula id="M30">
<graphic file="1471-2105-11-309-i88.gif"/>
</display-formula>
</p>
<p>where the constraint functions in <inline-formula>
<graphic file="1471-2105-11-309-i89.gif"/>
</inline-formula> can be either linear or quadratic, and there are infinitely many of them, one for each <it>t </it>&#8712; &#978;. To solve it, a <it>discretization </it>method is usually applied, which is briefly summarized as follows <abbrgrp>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
<abbr bid="B21">21</abbr>
</abbrgrp>:</p>
<p indent="1">1. Choose a finite subset <inline-formula>
<graphic file="1471-2105-11-309-i90.gif"/>
</inline-formula> &#8834; &#978;.</p>
<p indent="1">2. Solve the convex programming problem</p>
<p>
<display-formula id="M31">
<graphic file="1471-2105-11-309-i91.gif"/>
</display-formula>
</p>
<p>
<display-formula id="M32">
<graphic file="1471-2105-11-309-i92.gif"/>
</display-formula>
</p>
<p indent="1">3. If the solution of Step 2 is not sufficiently close to that of the original problem, choose a larger, but still finite, subset <inline-formula>
<graphic file="1471-2105-11-309-i90.gif"/>
</inline-formula> and repeat from Step 2.</p>
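<p>The three steps of the discretization method can be illustrated on a deliberately simple toy SIP that is not an MKL problem. The toy problem is to minimize <it>u</it> subject to <it>u</it> &#8805; <it>g</it>(<it>t</it>) for every <it>t</it> in [0, 1], with the hypothetical constraint function <it>g</it>(<it>t</it>) = <it>t</it> sin(10<it>t</it>).</p>
<p>In this Python sketch, the restricted problem of Step 2 has a closed-form solution. The stopping rule of Step 3 uses the Lipschitz bound |<it>g</it>&#8242;(<it>t</it>)| &#8804; 11 on [0, 1] to bound the distance to the semi-infinite optimum. Algorithm 1, by contrast, solves an LP or QCLP for the restricted problem and adds the violated constraints generated by a single-kernel SVM.</p>

```python
import math

LIPSCHITZ = 11.0   # |g'(t)| = |sin(10t) + 10t cos(10t)| is at most 1 + 10 on [0, 1]


def g(t):
    """Hypothetical constraint function: the toy SIP is
    minimize u subject to u >= g(t) for every t in [0, 1]."""
    return t * math.sin(10.0 * t)


def solve_restricted(T):
    """Step 2: minimize u subject to u >= g(t) for the finite set T.
    In this toy problem the restricted optimum is the max of g over T."""
    return max(g(t) for t in T)


def discretize_sip(tol=1e-3):
    """Discretization method: refine a uniform grid until the restricted
    solution is provably within tol of the semi-infinite optimum."""
    points = 3                                   # Step 1: small finite subset
    while True:
        T = [i / (points - 1) for i in range(points)]
        u = solve_restricted(T)                  # Step 2
        h = 1.0 / (points - 1)
        # Every t lies within h/2 of a grid point, so max g exceeds u by at most L*h/2.
        gap = LIPSCHITZ * h / 2.0
        if tol >= gap:                           # Step 3: close enough, stop
            return u, gap, points
        points = 2 * points - 1                  # otherwise enlarge the subset


u, gap, points = discretize_sip()
print(f"u = {u:.4f} (within {gap:.1e} of the optimum) using {points} grid points")
```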
<p>The convergence of SIP and the accuracy of the discretization method have been extensively described (see <abbrgrp>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
<abbr bid="B21">21</abbr>
</abbrgrp>). As proposed by Sonnenburg <it>et al</it>. <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp>, the multi-class SVM MKL objective in (26) can be formulated as a SIP problem, given by</p>
<p>
<display-formula id="M33">
<graphic file="1471-2105-11-309-i93.gif"/>
</display-formula>
</p>
<p>The SIP problem above is solved by a bi-level algorithm whose pseudocode is given as Algorithm 1 in the Appendix. In each loop <it>&#964;</it>, Step 1 optimizes <inline-formula>
<graphic file="1471-2105-11-309-i94.gif"/>
</inline-formula> and <it>u</it>
<sup>(<it>&#964;</it>) </sup>for a restricted subset of constraints as a linear program. Step 3 is an SVM problem with a single kernel and generates a new <inline-formula>
<graphic file="1471-2105-11-309-i95.gif"/>
</inline-formula>. If the constraint generated by <inline-formula>
<graphic file="1471-2105-11-309-i95.gif"/>
</inline-formula> is not satisfied by the current <inline-formula>
<graphic file="1471-2105-11-309-i94.gif"/>
</inline-formula> and <it>u</it>
<sup>(<it>&#964;</it>)</sup>, it is added to the constraint set of Step 1, and the loop is repeated until all constraints are satisfied. The starting points <inline-formula>
<graphic file="1471-2105-11-309-i96.gif"/>
</inline-formula> are randomly initialized, and SIP always converges to the same result.</p>
<p>Algorithm 1 is also applicable to the <it>L</it>
<sub>2</sub>-norm case of SVM MKL, but the non-convex constraint <inline-formula>
<graphic file="1471-2105-11-309-i58.gif"/>
</inline-formula> in Step 1 needs to be relaxed to <inline-formula>
<graphic file="1471-2105-11-309-i97.gif"/>
</inline-formula>, and the <it>f</it>
<sub>
<it>j</it>
</sub>(<inline-formula>
<graphic file="1471-2105-11-309-i1.gif"/>
</inline-formula>) term in (32) is modified to contain only the quadratic term. The SIP formulation for <it>L</it>
<sub>2</sub>-norm SVM MKL is given by</p>
<p>
<display-formula id="M34">
<graphic file="1471-2105-11-309-i98.gif"/>
</display-formula>
</p>
<p>With these modifications, Step 1 of Algorithm 1 becomes a QCLP problem given by</p>
<p>
<display-formula id="M35">
<graphic file="1471-2105-11-309-i99.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1471-2105-11-309-i100.gif"/>
</inline-formula> and <inline-formula>
<graphic file="1471-2105-11-309-i1.gif"/>
</inline-formula> is fixed. Moreover, because the kernel matrices are PSD, <it>A</it>
<sub>
<it>j </it>
</sub>&#8805; 0, so the optimal solution of the relaxed problem always satisfies <inline-formula>
<graphic file="1471-2105-11-309-i58.gif"/>
</inline-formula>.</p>
<p>In the SIP formulation, SVM MKL is solved iteratively as two components. The first component is a single-kernel SVM. For data sets larger than a few thousand (and smaller than about ten thousand) data points, this SVM is solved more efficiently and requires much less memory than the QP formulation. The second component is a small-scale problem, which is a linear program in the <it>L</it>
<sub>&#8734; </sub>case and a QCLP problem in the <it>L</it>
<sub>2 </sub>approach. Thus, the complexity of SIP-based SVM MKL is mainly determined by the cost of a single-kernel SVM multiplied by the number of iterations. This has inspired us to adopt more efficient single-kernel SVM learning algorithms to further improve efficiency. The least squares support vector machine (LSSVM) <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp> is known for its simple differentiable cost function, its equality constraints, and its solution based on linear equations, which is preferable for large-scale problems. Next, we investigate MKL solutions based on LSSVM formulations.</p>
</sec>
</sec>
<sec>
<st>
<p>Least squares SVM MKL for classification</p>
</st>
<p>In LSSVM, the primal problem is given by <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp>
</p>
<p>
<display-formula id="M36">
<graphic file="1471-2105-11-309-i101.gif"/>
</display-formula>
</p>
<p>where most of the variables are defined as in (24). The main difference is that the nonnegative slack variable <it>&#958; </it>is replaced by a squared error term <inline-formula>
<graphic file="1471-2105-11-309-i102.gif"/>
</inline-formula> and the inequality constraints are replaced by equality constraints. Taking the conditions for optimality from the Lagrangian, eliminating <inline-formula>
<graphic file="1471-2105-11-309-i103.gif"/>
</inline-formula>, defining <inline-formula>
<graphic file="1471-2105-11-309-i104.gif"/>
</inline-formula> = [<it>y</it>
<sub>1</sub>, ..., <it>y</it>
<sub>
<it>N</it>
</sub>] and <it>Y </it>= <it>diag</it>(<it>y</it>
<sub>1</sub>, ..., <it>y</it>
<sub>
<it>N</it>
</sub>), one obtains the following linear system <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp>:</p>
<p>
<display-formula id="M37">
<graphic file="1471-2105-11-309-i105.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1471-2105-11-309-i1.gif"/>
</inline-formula> are unconstrained dual variables. Without loss of generality, we denote <inline-formula>
<graphic file="1471-2105-11-309-i18.gif"/>
</inline-formula> and rewrite (37) as</p>
<p>
<display-formula id="M38">
<graphic file="1471-2105-11-309-i106.gif"/>
</display-formula>
</p>
<p>In (38), we impose the additional constraint <it>Y</it>
<sup>-2 </sup>= <it>I</it>, so that the coefficient matrix no longer depends on the labels in the multi-class case. In 1vsA coding, (37) requires solving <it>k </it>linear systems. In (38), by contrast, the coefficient matrix is factorized only once, so the solutions <inline-formula>
<graphic file="1471-2105-11-309-i107.gif"/>
</inline-formula> for the multi-class label vectors <inline-formula>
<graphic file="1471-2105-11-309-i108.gif"/>
</inline-formula> are obtained very efficiently. The constraint <it>Y</it>
<sup>-2 </sup>= <it>I </it>is simply satisfied by taking the class labels to be -1 and +1. Thus, from now on, we assume <it>Y</it>
<sup>-2 </sup>= <it>I</it>.</p>
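<p>The advantage of a label-independent coefficient matrix can be illustrated with a short Python sketch. It assumes that (38) takes the standard form [0, 1<sup>T</sup>; 1, <it>K</it> + <it>I</it>/&#955;][<it>b</it>; <it>&#946;</it>] = [0; <it>y</it>]. This form is obtained from (37) by substituting <it>&#946;</it> = <it>Y&#945;</it> when <it>Y</it><sup>-2</sup> = <it>I</it>. The variable name <it>&#946;</it>, the Gaussian kernel, the toy data, and the decoding rule (assign each sample to the class with the largest output) are illustrative assumptions. All <it>k</it> 1vsA label vectors are passed as the columns of one right-hand side, so the matrix is factorized once.</p>

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix exp(-gamma * ||a - b||^2) between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def one_vs_all(labels, k):
    """1vsA coding: column q is +1 for samples of class q and -1 otherwise."""
    Y = -np.ones((len(labels), k))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y


def lssvm_multiclass(K, Y, lam):
    """Solve [[0, 1^T], [1, K + I/lam]] [b; beta] = [0; y_q] for all k columns.
    The coefficient matrix does not depend on the labels, so np.linalg.solve
    (LAPACK gesv) computes a single LU factorization for the whole right-hand side."""
    N, k = Y.shape
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(N) / lam
    rhs = np.vstack([np.zeros((1, k)), Y])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # biases b with shape (k,), beta with shape (N, k)


# Three well-separated toy classes with three samples each.
X = np.array([[0, 0], [0.2, 0], [0, 0.2],
              [5, 0], [5.2, 0], [5, 0.2],
              [0, 5], [0.2, 5], [0, 5.2]], dtype=float)
labels = np.repeat([0, 1, 2], 3)
K = rbf_kernel(X, X)
b, beta = lssvm_multiclass(K, one_vs_all(labels, 3), lam=10.0)
pred = np.argmax(K @ beta + b, axis=1)   # latent outputs, one column per class
print(pred)
```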
<p>To incorporate multiple kernels in LSSVM classification, the <it>L</it>
<sub>&#8734;</sub>-norm approach is a QP problem, given by (assuming <it>Y</it>
<sup>-2 </sup>= <it>I</it>)</p>
<p>
<display-formula id="M39">
<graphic file="1471-2105-11-309-i109.gif"/>
</display-formula>
</p>
<p>The <it>L</it>
<sub>2</sub>-norm approach is analogously formulated as</p>
<p>
<display-formula id="M40">
<graphic file="1471-2105-11-309-i110.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1471-2105-11-309-i111.gif"/>
</inline-formula>. The &#955; parameter regularizes the squared error term in the primal objective in (36) and the quadratic term <inline-formula>
<graphic file="1471-2105-11-309-i112.gif"/>
</inline-formula> in the dual problem. Usually, the optimal &#955; needs to be selected empirically by cross-validation. In the kernel fusion of LSSVM, we can alternatively absorb the effect of regularization into an identity kernel matrix in <inline-formula>
<graphic file="1471-2105-11-309-i113.gif"/>
</inline-formula>, where <it>&#952;</it>
<sub>
<it>p </it>+ 1 </sub>= 1/&#955;. Then the MKL problem of combining <it>p </it>kernels is equivalent to combining <it>p </it>+ 1 kernels, where the last kernel is the identity matrix and its optimal coefficient determines the &#955; value. This approach was used by Lanckriet <it>et al</it>. to estimate the regularization parameter of the soft margin SVM <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp>. It has also been used by Ye <it>et al</it>. to jointly estimate the optimal kernel for discriminant analysis <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>. Avoiding the validation of &#955; may significantly reduce the model selection cost in complicated learning problems. With this transformation, the objective of LSSVM MKL becomes similar to that of SVM MKL, the main difference being that the dual variables are unconstrained. In principle, (39) and (40) can both be solved as QP problems by a conic solver or a QP solver, but doing so sacrifices the efficiency of the LSSVM's linear-system solution. Fortunately, in a SIP formulation, LSSVM MKL can be decomposed into iterations of two parts. The first is a master problem of single-kernel LSSVM learning, which is an unconstrained QP problem. The second is a very small-scale coefficient optimization problem.</p>
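<p>The identity-kernel reparameterization can be verified directly. The Python sketch below uses arbitrary random PSD kernels and hypothetical coefficients. Appending the identity matrix as the (<it>p</it> + 1)-th kernel with <it>&#952;</it><sub><it>p</it>+1</sub> = 1/&#955; reproduces the regularized matrix &#8721;<sub><it>j</it></sub><it>&#952;</it><sub><it>j</it></sub><it>K</it><sub><it>j</it></sub> + <it>I</it>/&#955; that a single-kernel LSSVM with a fixed &#955; would use. Estimating <it>&#952;</it><sub><it>p</it>+1</sub> is therefore equivalent to estimating &#955;.</p>

```python
import numpy as np


def combined_kernel(kernels, theta):
    """Omega = sum_j theta_j K_j over all p + 1 kernels (the last one is I)."""
    return sum(t * K for t, K in zip(theta, kernels))


rng = np.random.default_rng(0)
N, lam = 4, 5.0
F1, F2 = rng.normal(size=(N, 3)), rng.normal(size=(N, 2))
K1, K2 = F1 @ F1.T, F2 @ F2.T             # two linear kernels (PSD by construction)
theta_data = np.array([0.7, 0.3])         # hypothetical data-kernel coefficients

# p + 1 kernels: the data kernels plus the identity, with theta_{p+1} = 1/lam.
kernels = [K1, K2, np.eye(N)]
theta = np.append(theta_data, 1.0 / lam)
Omega = combined_kernel(kernels, theta)

# Matrix used by a fixed-lambda LSSVM on the combined data kernels.
reference = theta_data[0] * K1 + theta_data[1] * K2 + np.eye(N) / lam
print(np.allclose(Omega, reference), "lambda recovered:", 1.0 / theta[-1])
```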
<sec>
<st>
<p>SIP formulation for LSSVM MKL on larger scale data</p>
</st>
<p>The <it>L</it>
<sub>&#8734;</sub>-norm approach of multi-class LSSVM MKL is formulated as</p>
<p>
<display-formula id="M41">
<graphic file="1471-2105-11-309-i114.gif"/>
</display-formula>
</p>
<p>In the formulation above, <it>K</it>
<sub>
<it>j </it>
</sub>is the <it>j</it>-th kernel matrix in a set of <it>p </it>+ 1 kernels, the (<it>p </it>+ 1)-th of which is the identity matrix. The <it>L</it>
<sub>2</sub>-norm LSSVM MKL is formulated as</p>
<p>
<display-formula id="M42">
<graphic file="1471-2105-11-309-i115.gif"/>
</display-formula>
</p>
<p>The pseudocode of <it>L</it>
<sub>&#8734;</sub>-norm and <it>L</it>
<sub>2</sub>-norm LSSVM MKL is presented as Algorithm 2 in the Appendix. In the <it>L</it>
<sub>&#8734; </sub>approach, Step 1 optimizes <inline-formula>
<graphic file="1471-2105-11-309-i43.gif"/>
</inline-formula> as a linear program. In the <it>L</it>
<sub>2 </sub>approach, Step 1 optimizes <inline-formula>
<graphic file="1471-2105-11-309-i43.gif"/>
</inline-formula> as a QCLP problem. Since the regularization coefficient is automatically estimated as <it>&#952;</it>
<sub>
<it>p </it>+ 1</sub>, Step 3 reduces to the linear system</p>
<p>
<display-formula id="M43">
<graphic file="1471-2105-11-309-i116.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1471-2105-11-309-i117.gif"/>
</inline-formula>.</p>
</sec>
</sec>
<sec>
<st>
<p>Summary of algorithms</p>
</st>
<p>As discussed, the dual <it>L</it>
<sub>2 </sub>MKL solution can be extended to many machine learning problems. In principle, all MKL algorithms can be formulated in <it>L</it>
<sub>&#8734;</sub>, <it>L</it>
<sub>1</sub>, and <it>L</it>
<sub>2 </sub>forms and lead to different solutions. To validate the proposed approach, we implemented and compared 20 algorithms on various data sets. The summary of all implemented algorithms is presented in Table <tblr tid="T3">3</tblr>. These algorithms combine <it>L</it>
<sub>
<it>&#8734;</it>
</sub>, <it>L</it>
<sub>1</sub>, and <it>L</it>
<sub>2 </sub>MKL with 1-SVM, SVM, and LSSVM. Moreover, to cope with imbalanced data in classification, we also extended Weighted SVM <abbrgrp>
<abbr bid="B23">23</abbr>
<abbr bid="B24">24</abbr>
</abbrgrp> and Weighted LSSVM <abbrgrp>
<abbr bid="B25">25</abbr>
<abbr bid="B26">26</abbr>
</abbrgrp> to their MKL formulations (presented in Additional file <supplr sid="S1">1</supplr>). Although we mainly focus on <it>L</it>
<sub>&#8734;</sub>, <it>L</it>
<sub>1</sub>, and <it>L</it>
<sub>2 </sub>MKL methods, we also implemented the <it>L</it>
<sub>
<it>n</it>
</sub>-norm MKL for 1-SVM, SVM, LSSVM and Weighted SVM. These algorithms were applied to the four biomedical experimental data sets, and their performance is reported in section 8 of Additional file <supplr sid="S1">1</supplr>. Moreover, the <it>L</it>
<sub>
<it>n</it>
</sub>-norm algorithms are also available on the website of this paper.</p>
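<p>The difference between the norms can be illustrated on the per-kernel quadratic terms <it>s</it><sub><it>j</it></sub> = &#945;<sup>T</sup><it>K</it><sub><it>j</it></sub>&#945; that appear in the dual problems. This minimal NumPy sketch assumes that the <it>L</it><sub>&#8734;</sub>, <it>L</it><sub>1</sub> and <it>L</it><sub>2</sub> labels denote the norm taken over the vector of these terms, corresponding respectively to kernel weights on the simplex, fixed uniform weights, and weights on the unit <it>L</it><sub>2</sub> ball. The function names are hypothetical and the scaling of the feasible sets is an illustrative choice.</p>

```python
import numpy as np


def kernel_terms(alpha, kernels):
    """Per-kernel quadratic terms s_j = alpha^T K_j alpha."""
    return np.array([alpha @ K @ alpha for K in kernels])


def norm_bound_and_weights(s, norm):
    """Maximize sum_j theta_j * s_j over an assumed feasible set of weights.

    Returns the maximal value and the maximizing weights theta.
    """
    s = np.asarray(s, dtype=float)
    if norm == "inf":
        # Simplex (sum theta = 1, theta >= 0): all weight goes to the
        # largest term, so the value is max_j s_j and theta is sparse.
        theta = np.zeros(s.size)
        theta[np.argmax(s)] = 1.0
    elif norm == "1":
        # Fixed uniform weights: the value is sum_j s_j (plain averaging
        # of the kernels up to a constant factor).
        theta = np.ones(s.size)
    elif norm == "2":
        # Unit L2 ball: by the Cauchy-Schwarz inequality the optimum is
        # s / ||s||_2 and the value is ||s||_2; the weights are non-sparse.
        theta = s / np.linalg.norm(s)
    else:
        raise ValueError("norm must be 'inf', '1' or '2'")
    return float(theta @ s), theta
```

<p>With <it>s</it> = (3, 4, 0), the three variants give the values 4, 7 and 5; the <it>L</it><sub>&#8734;</sub> weights are (0, 1, 0), whereas the <it>L</it><sub>2</sub> weights are (0.6, 0.8, 0), so every kernel with a nonzero term keeps a nonzero weight.</p>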
<tbl id="T3"><title><p>Table 3</p></title><caption><p>Summary of algorithms implemented in the paper</p></caption><tblbdy cols="6">
      <r>
         <c ca="left">
            <p>
               <b>Algorithm Nr.</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Formulation Nr.</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Name</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>References</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Formulation</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Equations</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>1</p>
         </c>
         <c ca="left">
            <p>1-A</p>
         </c>
         <c ca="left">
            <p>1-SVM <it>L</it><sub>&#8734; </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B7">7</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>SOCP</p>
         </c>
         <c ca="left">
            <p>(20)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>1</p>
         </c>
         <c ca="left">
            <p>1-B</p>
         </c>
         <c ca="left">
            <p>1-SVM <it>L</it><sub>&#8734; </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B7">7</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>QCQP</p>
         </c>
         <c ca="left">
            <p>(20)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>2</p>
         </c>
         <c ca="left">
            <p>2-A</p>
         </c>
         <c ca="left">
            <p>1-SVM <it>L</it><sub>&#8734; </sub>(0.5) MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B7">7</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>SOCP</p>
         </c>
         <c ca="left">
            <p>(20)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>2</p>
         </c>
         <c ca="left">
            <p>2-B</p>
         </c>
         <c ca="left">
            <p>1-SVM <it>L</it><sub>&#8734; </sub>(0.5) MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B7">7</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>QCQP</p>
         </c>
         <c ca="left">
            <p>(20)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>3</p>
         </c>
         <c ca="left">
            <p>3-A</p>
         </c>
         <c ca="left">
            <p>1-SVM <it>L</it><sub>1 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B12">12</abbr>
                  <abbr bid="B13">13</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>SOCP</p>
         </c>
         <c ca="left">
            <p>(19)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>3</p>
         </c>
         <c ca="left">
            <p>3-B</p>
         </c>
         <c ca="left">
            <p>1-SVM <it>L</it><sub>1 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B12">12</abbr>
                  <abbr bid="B13">13</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>QCQP</p>
         </c>
         <c ca="left">
            <p>(19)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>4</p>
         </c>
         <c ca="left">
            <p>4-A</p>
         </c>
         <c ca="left">
            <p>1-SVM <it>L</it><sub>2 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>novel</p>
         </c>
         <c ca="left">
            <p>SOCP</p>
         </c>
         <c ca="left">
            <p>(23)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>5</p>
         </c>
         <c ca="left">
            <p>5-B</p>
         </c>
         <c ca="left">
            <p>SVM <it>L</it><sub>&#8734; </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B4">4</abbr>
                  <abbr bid="B6">6</abbr>
                  <abbr bid="B5">5</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>QCQP</p>
         </c>
         <c ca="left">
            <p>(26)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>5</p>
         </c>
         <c ca="left">
            <p>5-C</p>
         </c>
         <c ca="left">
            <p>SVM <it>L</it><sub>&#8734; </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B18">18</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>SIP</p>
         </c>
         <c ca="left">
            <p>(33)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>6</p>
         </c>
         <c ca="left">
            <p>6-B</p>
         </c>
         <c ca="left">
            <p>SVM <it>L</it><sub>&#8734; </sub>(0.5) MKL</p>
         </c>
         <c ca="left">
            <p>novel</p>
         </c>
         <c ca="left">
            <p>QCQP</p>
         </c>
         <c ca="left">
            <p>(26)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>7</p>
         </c>
         <c ca="left">
            <p>7-A</p>
         </c>
         <c ca="left">
            <p>SVM <it>L</it><sub>1 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B2">2</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>SOCP</p>
         </c>
         <c ca="left">
            <p>(25)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>7</p>
         </c>
         <c ca="left">
            <p>7-B</p>
         </c>
         <c ca="left">
            <p>SVM <it>L</it><sub>1 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B4">4</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>QCQP</p>
         </c>
         <c ca="left">
            <p>(25)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>8</p>
         </c>
         <c ca="left">
            <p>8-A</p>
         </c>
         <c ca="left">
            <p>SVM <it>L</it><sub>2 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>novel</p>
         </c>
         <c ca="left">
            <p>SOCP</p>
         </c>
         <c ca="left">
            <p>(27)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>8</p>
         </c>
         <c ca="left">
            <p>8-C</p>
         </c>
         <c ca="left">
            <p>SVM <it>L</it><sub>2 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B40">40</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>SIP</p>
         </c>
         <c ca="left">
            <p>(34)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>9</p>
         </c>
         <c ca="left">
            <p>9-B</p>
         </c>
         <c ca="left">
            <p>Weighted SVM <it>L</it><sub>&#8734; </sub>MKL</p>
         </c>
         <c ca="left">
            <p>novel</p>
         </c>
         <c ca="left">
            <p>QCQP</p>
         </c>
         <c ca="left">
            <p>Suppl. (3)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>10</p>
         </c>
         <c ca="left">
            <p>10-B</p>
         </c>
         <c ca="left">
            <p>Weighted SVM <it>L</it><sub>&#8734; </sub>(0.5) MKL</p>
         </c>
         <c ca="left">
            <p>novel</p>
         </c>
         <c ca="left">
            <p>QCQP</p>
         </c>
         <c ca="left">
            <p>Suppl. (3)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>11</p>
         </c>
         <c ca="left">
            <p>11-B</p>
         </c>
         <c ca="left">
            <p>Weighted SVM <it>L</it><sub>1 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B25">25</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>QCQP</p>
         </c>
         <c ca="left">
            <p>Suppl. (2)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>12</p>
         </c>
         <c ca="left">
            <p>12-A</p>
         </c>
         <c ca="left">
            <p>Weighted SVM <it>L</it><sub>2 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>novel</p>
         </c>
         <c ca="left">
            <p>SOCP</p>
         </c>
         <c ca="left">
            <p>Suppl. (4)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>13</p>
         </c>
         <c ca="left">
            <p>13-B</p>
         </c>
         <c ca="left">
            <p>LSSVM <it>L</it><sub>&#8734; </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B17">17</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>QCQP</p>
         </c>
         <c ca="left">
            <p>(39)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>13</p>
         </c>
         <c ca="left">
            <p>13-C</p>
         </c>
         <c ca="left">
            <p>LSSVM <it>L</it><sub>&#8734; </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B17">17</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>SIP</p>
         </c>
         <c ca="left">
            <p>(41)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>14</p>
         </c>
         <c ca="left">
            <p>14-B</p>
         </c>
         <c ca="left">
            <p>LSSVM <it>L</it><sub>&#8734; </sub>(0.5) MKL</p>
         </c>
         <c ca="left">
            <p>novel</p>
         </c>
         <c ca="left">
            <p>QCQP</p>
         </c>
         <c ca="left">
            <p>(39)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>15</p>
         </c>
         <c ca="left">
            <p>15-D</p>
         </c>
         <c ca="left">
            <p>LSSVM <it>L</it><sub>1 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B22">22</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>linear</p>
         </c>
         <c ca="left">
            <p>(38)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>16</p>
         </c>
         <c ca="left">
            <p>16-B</p>
         </c>
         <c ca="left">
            <p>LSSVM <it>L</it><sub>2 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>novel</p>
         </c>
         <c ca="left">
            <p>SOCP</p>
         </c>
         <c ca="left">
            <p>(40)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>16</p>
         </c>
         <c ca="left">
            <p>16-C</p>
         </c>
         <c ca="left">
            <p>LSSVM <it>L</it><sub>2 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>novel</p>
         </c>
         <c ca="left">
            <p>SIP</p>
         </c>
         <c ca="left">
            <p>(42)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>17</p>
         </c>
         <c ca="left">
            <p>17-B</p>
         </c>
         <c ca="left">
            <p>Weighted LSSVM <it>L</it><sub>&#8734; </sub>MKL</p>
         </c>
         <c ca="left">
            <p>novel</p>
         </c>
         <c ca="left">
            <p>QCQP</p>
         </c>
         <c ca="left">
            <p>Suppl. (8)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>18</p>
         </c>
         <c ca="left">
            <p>18-B</p>
         </c>
         <c ca="left">
            <p>Weighted LSSVM <it>L</it><sub>&#8734; </sub>(0.5) MKL</p>
         </c>
         <c ca="left">
            <p>novel</p>
         </c>
         <c ca="left">
            <p>QCQP</p>
         </c>
         <c ca="left">
            <p>Suppl. (8)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>19</p>
         </c>
         <c ca="left">
            <p>19-D</p>
         </c>
         <c ca="left">
            <p>Weighted LSSVM <it>L</it><sub>1 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>
               <abbrgrp>
                  <abbr bid="B25">25</abbr>
               </abbrgrp>
            </p>
         </c>
         <c ca="left">
            <p>linear</p>
         </c>
         <c ca="left">
            <p>Suppl. (6)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>20</p>
         </c>
         <c ca="left">
            <p>20-A</p>
         </c>
         <c ca="left">
            <p>Weighted LSSVM <it>L</it><sub>2 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>novel</p>
         </c>
         <c ca="left">
            <p>SOCP</p>
         </c>
         <c ca="left">
            <p>Suppl. (9)</p>
         </c>
      </r>
   </tblbdy><tblfn>
<p>Summary of algorithms implemented in the paper. Because the same algorithm can be solved via different formulations, several formulation numbers may correspond to the same algorithm number. In total, 20 different algorithms were implemented and solved through 28 different formulations. For an algorithm with several formulations, the solutions are identical and differ only in computational efficiency. Algorithms already proposed in the literature are indicated in the References column; the novel algorithms and formulations proposed in this paper are labeled "novel".</p>
   </tblfn></tbl>
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>The supplementary material contains (1) Genomic data sources used in experiments 1 and 2; (2) MKL extensions for Weighted SVM and Weighted LSSVM; (3) Kernel functions used in the paper; (4) Optimal kernel coefficients and performance of individual data sources in prostate cancer gene prioritization; (5) Performance of individual kernels in experiment 4; (6) Optimal weights assigned to the individual kernels in experiment 4; (7) The effect of the LSSVM cost-function regularization parameter &#955; in experiment 4; (8) Experimental results using MKL algorithms based on other norms.</p>
</text>
<file name="1471-2105-11-309-S1.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Experimental setup and data sets</p>
</st>
<p>The performance of the proposed <it>L</it>
<sub>2 </sub>MKL method was systematically evaluated and compared on six real benchmark data sets. The computational efficiency was compared on two UCI data sets. On each data set, we compared the <it>L</it>
<sub>2 </sub>method with the <it>L</it>
<sub>&#8734;</sub>, <it>L</it>
<sub>1 </sub>and regularized <it>L</it>
<sub>&#8734; </sub>MKL methods. In the regularized <it>L</it>
<sub>&#8734;</sub> approach, we set the lower bound of the kernel coefficients <it>&#952;</it>
<sub>
<it>min </it>
</sub>to 0.5, denoted as <it>L</it>
<sub>&#8734; </sub>(0.5). We also compared the three optimization formulations SOCP, QCQP and SIP on the UCI data sets. The experiments were categorized into five groups, as summarized in Table <tblr tid="T4">4</tblr>.</p>
<tbl id="T4"><title><p>Table 4</p></title><caption><p>Summary of data sets and algorithms used in five experiments</p></caption><tblbdy cols="7">
      <r>
         <c ca="left">
            <p>
               <b>Nr.</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Data Set</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Problem</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Samples</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Classes</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Algorithms</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Evaluation</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>1</p>
         </c>
         <c ca="left">
            <p>disease relevant genes</p>
         </c>
         <c ca="left">
            <p>ranking</p>
         </c>
         <c ca="left">
            <p>620</p>
         </c>
         <c ca="left">
            <p>1</p>
         </c>
         <c ca="left">
            <p>1-4</p>
         </c>
         <c ca="left">
            <p>LOO AUC</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>2</p>
         </c>
         <c ca="left">
            <p>prostate cancer genes</p>
         </c>
         <c ca="left">
            <p>ranking</p>
         </c>
         <c ca="left">
            <p>9</p>
         </c>
         <c ca="left">
            <p>1</p>
         </c>
         <c ca="left">
            <p>1-4</p>
         </c>
         <c ca="left">
            <p>AUC</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>3</p>
         </c>
         <c ca="left">
            <p>rectal cancer patients</p>
         </c>
         <c ca="left">
            <p>classification</p>
         </c>
         <c ca="left">
            <p>36</p>
         </c>
         <c ca="left">
            <p>2</p>
         </c>
         <c ca="left">
            <p>5-8,13-16</p>
         </c>
         <c ca="left">
            <p>LOO AUC</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>4</p>
         </c>
         <c ca="left">
            <p>endometrial disease</p>
         </c>
         <c ca="left">
            <p>classification</p>
         </c>
         <c ca="left">
            <p>339</p>
         </c>
         <c ca="left">
            <p>2</p>
         </c>
         <c ca="left">
            <p>5-8,13-16</p>
         </c>
         <c ca="left">
            <p>3-fold AUC</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>miscarriage</p>
         </c>
         <c ca="left">
            <p>classification</p>
         </c>
         <c ca="left">
            <p>2356</p>
         </c>
         <c ca="left">
            <p>2</p>
         </c>
         <c ca="left">
            <p>5-8,13-16</p>
         </c>
         <c ca="left">
            <p>3-fold AUC</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>pregnancy</p>
         </c>
         <c ca="left">
            <p>classification</p>
         </c>
         <c ca="left">
            <p>856</p>
         </c>
         <c ca="left">
            <p>2</p>
         </c>
         <c ca="left">
            <p>9-12,17-20</p>
         </c>
         <c ca="left">
            <p>3-fold AUC</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>5</p>
         </c>
         <c ca="left">
            <p>UCI pen digit and optical digit</p>
         </c>
         <c ca="left">
            <p>classification</p>
         </c>
         <c ca="left">
            <p>1000-3000</p>
         </c>
         <c ca="left">
            <p>10</p>
         </c>
         <c ca="left">
            <p>1A,1B,5B,5C,13B,13C</p>
         </c>
         <c ca="left">
            <p>CPU time</p>
         </c>
      </r>
   </tblbdy></tbl>
<sec>
<st>
<p>Experiment 1</p>
</st>
<p>In the first experiment, we used a disease gene prioritization application to compare the performance of optimizing different norms in MKL. The computational definition of gene prioritization is given in our earlier work <abbrgrp>
<abbr bid="B7">7</abbr>
<abbr bid="B27">27</abbr>
<abbr bid="B28">28</abbr>
</abbrgrp>. In this paper, we applied four 1-SVM MKL algorithms to combine kernels derived from 9 heterogeneous genomic sources (shown in section 1 of Additional file <supplr sid="S1">1</supplr>) to prioritize 620 genes that are annotated as relevant for 29 diseases in OMIM. The performance was evaluated by leave-one-out (LOO) validation: for each disease with <it>K </it>relevant genes, one gene, termed the "defector" gene, was removed from the set of training genes and added to 99 randomly selected test genes (test set). We used the remaining <it>K </it>- 1 genes (training set) to build the prioritization model. We then prioritized the test set of 100 genes with the trained model and determined the rank of the defector gene in the test data. Because the prioritization function in (22) scores relevant genes higher than the others, we labeled the "defector" gene as class "+1" and the random candidate genes as class "-1", plotted the Receiver Operating Characteristic (ROC) curves, and compared the models using the error of AUC (one minus the area under the ROC curve).</p>
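<p>Because each LOO test set contains exactly one positive (the defector gene) among 100 candidates, the area under the ROC curve reduces to the fraction of the 99 random genes scored below the defector gene. A minimal Python sketch of the resulting error of AUC follows; the function name is hypothetical, and counting ties as one half is a common convention rather than something stated in the text.</p>

```python
def error_of_auc(scores, positive_index):
    """Error of AUC (1 - AUC) when exactly one candidate is positive.

    scores: prioritization scores of all test genes (higher = more relevant).
    positive_index: position of the defector gene in scores.
    With a single positive, AUC is the fraction of negatives scored below
    it; a tie with a negative counts as one half.
    """
    pos = scores[positive_index]
    negatives = [s for i, s in enumerate(scores) if i != positive_index]
    wins = sum(1.0 if pos > s else (0.5 if pos == s else 0.0) for s in negatives)
    return 1.0 - wins / len(negatives)
```

<p>For example, if the defector gene is ranked fifth among the 100 test genes, four random genes outrank it and the error of AUC is 4/99 &#8776; 0.0404.</p>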
<p>All data-source kernels were constructed with linear functions, except for the sequence data, which was transformed into a kernel using a 2-mer string kernel function <abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp> (details in section 1 of Additional file <supplr sid="S1">1</supplr>). In total, 9 kernels were combined in this experiment. The regularization parameter <it>&#957; </it>of the 1-SVM was set to 0.5 for all compared algorithms. Since no hyper-parameter needed to be tuned in the LOO validation, we reported the LOO results as the generalization performance. For each disease relevant gene, the 99 test genes were randomly selected from the whole human protein-coding genome in each LOO validation run. We repeated the experiment 20 times and used the mean and standard deviation for comparison.</p>
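<p>A 2-mer string kernel compares sequences through their counts of overlapping length-2 substrings. The sketch below assumes the spectrum-kernel form, i.e. the inner product of 2-mer count vectors, with optional cosine normalization; whether the kernel used in this experiment was normalized is not stated here, and the function names are hypothetical.</p>

```python
import math
from collections import Counter


def kmer_counts(seq, k=2):
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seqs, k=2, normalize=False):
    """Kernel matrix of inner products between k-mer count vectors."""
    counts = [kmer_counts(s, k) for s in seqs]
    n = len(seqs)
    K = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            # Counter returns 0 for k-mers absent from sequence b.
            val = float(sum(c * counts[b][m] for m, c in counts[a].items()))
            K[a][b] = K[b][a] = val
    if normalize:
        # Cosine normalization: K_ab / sqrt(K_aa * K_bb).
        diag = [K[i][i] for i in range(n)]
        K = [[K[a][b] / math.sqrt(diag[a] * diag[b]) for b in range(n)]
             for a in range(n)]
    return K
```

<p>For the toy sequences "ACGT" and "ACAC", the 2-mer counts are {AC, CG, GT} and {AC: 2, CA: 1}, giving the kernel matrix [[3, 2], [2, 5]].</p>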
</sec>
<sec>
<st>
<p>Experiment 2</p>
</st>
<p>In the second experiment we used the same data sources and kernel matrices as in the previous experiment to prioritize 9 prostate cancer genes recently discovered by Eeles <it>et al</it>. <abbrgrp>
<abbr bid="B30">30</abbr>
</abbrgrp>, Thomas <it>et al</it>. <abbrgrp>
<abbr bid="B31">31</abbr>
</abbrgrp> and Gudmundsson <it>et al</it>. <abbrgrp>
<abbr bid="B32">32</abbr>
</abbrgrp>. A training set of 14 known prostate cancer genes was compiled from the reference database OMIM, including only the discoveries prior to January 2008. This training set was then used to train the prioritization model. For each novel prostate cancer gene, the test set contained the newly discovered gene plus its 99 closest neighbors on the chromosome. Besides the error of AUC, we also compared the ranking position of each novel prostate cancer gene among its 99 closest neighboring genes. Moreover, we compared the MKL results with those obtained via the Endeavour application.</p>
</sec>
<sec>
<st>
<p>Experiment 3</p>
</st>
<p>The third experiment is taken from the work of Daemen <it>et al</it>. about the kernel-based integration of genome-wide data for clinical decision support in cancer diagnosis <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp>. Thirty-six patients with rectal cancer were treated with a combination of cetuximab, capecitabine and external beam radiotherapy, and their tissue and plasma samples were gathered at three time points: before treatment (<it>T</it>
<sub>0</sub>), early during therapy (<it>T</it>
<sub>1</sub>) and at the moment of surgery (<it>T</it>
<sub>2</sub>). The tissue samples were hybridized to gene chip arrays and, after processing, the expression data were reduced to 6,913 genes. Ninety-six proteins known to be involved in cancer were measured in the plasma samples; for each time point separately, proteins whose absolute values were above the detection limit in fewer than 20% of the samples were excluded. This resulted in the exclusion of six proteins at <it>T</it>
<sub>0 </sub>and four at <it>T</it>
<sub>1</sub>. "Responders" were distinguished from "non-responders" according to the pathologic lymph node stage at surgery (pN-STAGE). The "responder" class contains 22 patients with no lymph node found at surgery whereas the "non-responder" class contains 14 patients with at least 1 regional lymph node. Only the two array-expression data sets (MA) measured at <it>T</it>
<sub>0 </sub>and <it>T</it>
<sub>1 </sub>and the two proteomics data sets (PT) measured at <it>T</it>
<sub>0 </sub>and <it>T</it>
<sub>1 </sub>were used to predict the outcome of cancer at surgery.</p>
<p>Similar to the original method applied on the data <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp>, we used the R/Bioconductor package DEDS for feature selection on the microarray data and the Wilcoxon rank sum test for the proteomics data. The statistical feature selection procedure was independent of the classification procedure; however, the performance varied widely with the number of selected genes and proteins. We considered the relevance of features (genes and proteins) as prior knowledge and systematically evaluated the performance using multiple numbers of genes and proteins. Following the ranking from statistical feature selection, we gradually increased the number of genes and proteins from 11 to 36 and combined the linear kernels constructed from these features. The performance was evaluated by LOO for two reasons: firstly, the number of samples was small (36 patients); secondly, the kernels were all constructed with a linear function. Moreover, in LSSVM classification we used the proposed strategy to estimate the regularization parameter &#955; in kernel fusion. Therefore, no hyper-parameter needed to be tuned, and we reported the LOO validation result as the generalization performance.</p>
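<p>The incremental construction of linear kernels from ranked features can be sketched as follows. For brevity, this NumPy example ranks every feature by the normal-approximation <it>z</it>-score of the Wilcoxon rank sum statistic (without tie correction), whereas the experiment used DEDS for the microarray data; the function names are hypothetical.</p>

```python
import numpy as np


def average_ranks(x):
    """Ranks 1..n of the values in x, with tied values sharing their mean rank."""
    x = np.asarray(x, dtype=float)
    n = x.size
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(n)
    i = 0
    while n > i:
        # Extend j over the block of values tied with sorted_x[i].
        j = i
        while n > j + 1 and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0  # mean of ranks i+1..j+1
        i = j + 1
    return ranks


def ranksum_z(feature, labels):
    """Normal-approximation z-score of the Wilcoxon rank sum statistic
    for class +1 versus class -1 (no tie correction)."""
    ranks = average_ranks(feature)
    pos = np.asarray(labels) == 1
    n1, n2 = int(pos.sum()), int((~pos).sum())
    w = ranks[pos].sum()
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (w - mean) / sd


def top_k_linear_kernel(X, labels, k):
    """Linear kernel on the k features with the largest |z|."""
    z = np.array([ranksum_z(X[:, f], labels) for f in range(X.shape[1])])
    top = np.argsort(-np.abs(z), kind="mergesort")[:k]
    Xk = X[:, top]
    return Xk @ Xk.T, top
```

<p>Applied to each of the four data sets separately for <it>k </it>= 11, 12, ..., 36, such a function yields the linear kernels that are then combined by the MKL algorithms.</p>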
</sec>
<sec>
<st>
<p>Experiment 4</p>
</st>
<p>Our fourth experiment considered three clinical data sets. These three data sets were derived from different clinical studies and were used by Daemen and De Moor <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp> as validation data for clinical kernel function development. Data set I contains clinical information on 402 patients with an endometrial disease who underwent an echographic examination and color Doppler <abbrgrp>
<abbr bid="B35">35</abbr>
</abbrgrp>. The patients are divided into two groups according to their histology: malignant (hyperplasia, polyp, myoma, and carcinoma) versus benign (proliferative endometrium, secretory endometrium, atrophia). After excluding patients with incomplete data, the data set contains 339 patients, of which 163 are malignant and 176 benign. Data set II comes from a prospective observational study of 1828 women undergoing transvaginal sonography before 12 weeks of gestation, resulting in data for 2356 pregnancies, of which 1458 were normal at week 12 and 898 ended in miscarriage during the first trimester <abbrgrp>
<abbr bid="B36">36</abbr>
</abbrgrp>. Data set III contains data on 1003 pregnancies of unknown location (PUL) <abbrgrp>
<abbr bid="B37">37</abbr>
</abbrgrp>. Within the PUL group, there are four clinical outcomes: a failing PUL, an intrauterine pregnancy (IUP), an ectopic pregnancy (EP) or a persisting PUL. Because persisting PULs are rare (18 cases in the data set), they were excluded, as were pregnancies with missing data. The final data set consists of 856 PULs, comprising 460 failing PULs, 330 IUPs, and 66 EPs. As the most important diagnostic problem is the correct classification of EPs versus non-EPs <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>, the data were divided into 790 non-EPs and 66 EPs. To simulate a problem of combining multiple sources, for each data set we created eight kernels and combined them using MKL algorithms for classification. The eight kernels included one linear kernel, three RBF kernels, three polynomial kernels and a clinical kernel. The kernel width of the first RBF kernel was selected by an empirical rule as four times the average covariance of all the samples; the second and third kernel widths were six and eight times the average covariance, respectively. The degrees of the three polynomial kernels were set to 2, 3, and 4, respectively, and the bias term of the polynomial kernels was set to 1. The clinical kernels were constructed as proposed by Daemen and De Moor <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp>. All the kernel functions are explained in section 3 of Additional file <supplr sid="S1">1</supplr>. The class labels of the pregnancy data were quite imbalanced (790 non-EPs and 66 EPs). In the literature, the class imbalance problem can be tackled by modifying the costs of the different classes in the SVM objective function. Therefore, we applied weighted SVM MKL and weighted LSSVM MKL to the imbalanced pregnancy data. For the other two data sets, we compared the performance of SVM MKL and LSSVM MKL with different norms.</p>
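<p>The eight kernels can be sketched in NumPy as below. Several details are assumptions: the "average covariance" is taken as the mean of the per-feature sample variances, the RBF kernel is written as exp(-||<it>x</it> - <it>z</it>||<sup>2</sup>/<it>&#963;</it><sup>2</sup>) with <it>&#963;</it><sup>2</sup> equal to the stated multiple, and the clinical kernel follows our reading of Daemen and De Moor (a range-scaled similarity for continuous variables and an equality indicator for nominal ones, averaged over variables). The function names are hypothetical.</p>

```python
import numpy as np


def squared_distances(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def clinical_kernel(X, nominal):
    """Average of per-variable similarities (assumed reading of the clinical kernel).

    Continuous variables: (r - |x - z|) / r with r the variable's range.
    Nominal variables: 1 if the values are equal, else 0.
    """
    n, d = X.shape
    K = np.zeros((n, n))
    for f in range(d):
        col = X[:, f]
        diff = np.abs(col[:, None] - col[None, :])
        if nominal[f]:
            K += (diff == 0).astype(float)
        else:
            r = col.max() - col.min()
            K += (r - diff) / r if r > 0 else np.ones((n, n))
    return K / d


def eight_kernels(X, nominal):
    """Linear, three RBF, three polynomial and one clinical kernel."""
    # Assumed meaning of "average covariance": mean per-feature sample variance.
    avg_cov = float(np.mean(np.var(X, axis=0, ddof=1)))
    G = X @ X.T
    D = squared_distances(X)
    kernels = [G]
    # RBF widths: 4, 6 and 8 times the average covariance.
    kernels += [np.exp(-D / (c * avg_cov)) for c in (4.0, 6.0, 8.0)]
    # Polynomial kernels of degree 2, 3 and 4 with bias term 1.
    kernels += [(G + 1.0) ** deg for deg in (2, 3, 4)]
    kernels.append(clinical_kernel(X, nominal))
    return kernels
```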
<p>The classification performance was benchmarked using 3-fold cross validation: each data set was randomly divided into 3 equal parts. As introduced in the Methods section, when combining multiple pre-constructed kernels in LSSVM-based algorithms, the regularization parameter &#955; can be jointly estimated as the coefficient of the identity matrix, so no hyper-parameter needs to be optimized in the LSSVM. In the estimation approach of LSSVM and in all SVM approaches, we could therefore use both the training and validation data to train the classifier and the test data to evaluate the performance. The evaluation was repeated three times, so that each part was used once as test data, and the average performance was reported as the result of one repetition. In the standard validation approach of LSSVM, each data set was partitioned randomly into three parts for training, validation and testing. The classifier was trained on the training data and the hyper-parameter &#955; was tuned on the validation data, with candidate values sampled uniformly on the log scale from 2<sup>-10 </sup>to 2<sup>10</sup>. At the optimal &#955;, the classifier was then retrained on the combined training and validation set, and the resulting model was tested on the test set. The estimation approach is clearly more efficient than the validation approach, because it requires only one training run, whereas the validation approach requires 22 (21 &#955; values plus the final retraining). The performance of these two approaches was also investigated in this experiment.</p>
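<p>The bookkeeping behind this protocol can be made explicit with a small Python sketch: the grid of candidate &#955; values, a random split into three folds where each fold serves once as test data, and the number of training runs per split. The fold construction and the function names are illustrative assumptions; the separate training/validation partition of the validation approach is not shown.</p>

```python
import random


def lambda_grid():
    """Candidate lambda values 2^-10, 2^-9, ..., 2^10 (uniform on the log scale)."""
    return [2.0 ** e for e in range(-10, 11)]


def three_fold_splits(n, seed=0):
    """Yield (train, test) index lists; each fold is used once as test data."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::3] for f in range(3)]
    for t in range(3):
        train = [i for f in range(3) if f != t for i in folds[f]]
        yield train, folds[t]


def training_runs(approach):
    """Training runs per split: estimation trains once; validation trains
    once per candidate lambda and once more to retrain at the optimum."""
    if approach == "estimation":
        return 1
    return len(lambda_grid()) + 1
```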
</sec>
<sec>
<st>
<p>Experiment 5</p>
</st>
<p>As introduced in the Methods section, the same MKL problem can be formulated as different optimization problems, such as SOCP, QCQP, and SIP. The accuracy of the discretization method for solving SIP is mainly determined by the tolerance value <it>&#949; </it>predefined in the stopping criterion; in our implementation, <it>&#949; </it>was set to 5 &#215; 10<sup>-4</sup>. These formulations yield the same result and differ mainly in computational efficiency. In the fifth experiment we compared the efficiency of these optimization techniques on two large-scale UCI data sets: pen-based handwritten digit recognition and optical digit recognition. Both data sets contain more than 6000 samples, so they were used as real large-scale data sets to evaluate computational efficiency. In our implementation, the optimization problems were solved by Sedumi <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>, MOSEK <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp> and the Matlab Optimization Toolbox. All numerical experiments were carried out on a dual Opteron 250 Unix system with 16 GB of memory, and computational efficiency was evaluated by CPU time (in seconds).</p>
</sec>
</sec>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<sec>
<st>
<p>Experiment 1: disease relevant gene prioritization by genomic data fusion</p>
</st>
<p>In the first experiment, the <it>L</it><sub>2 </sub>1-SVM MKL algorithm performed best (error of AUC 0.0780). As shown in Table <tblr tid="T5">5</tblr>, the <it>L</it><sub>&#8734; </sub>and <it>L</it><sub>1 </sub>approaches both performed significantly worse than the <it>L</it><sub>2 </sub>approach. Setting the minimal boundary of the kernel coefficients to 0.5 ensured that each data source made a minimal contribution to the integration, which reduced the <it>L</it><sub>&#8734; </sub>error from 0.0923 to 0.0806, although this was still worse than <it>L</it><sub>2</sub>. Figure <figr fid="F1">1</figr> illustrates the optimal kernel coefficients of the different approaches. The <it>L</it><sub>&#8734; </sub>method assigned dominant coefficients to the Text mining and Gene Ontology data, whereas the other data sources were almost discarded from the integration. In contrast, the <it>L</it><sub>2 </sub>approach distributed the coefficients more evenly over all data sources and thus combined all of them in the integration. When multiple kernels are combined, sparse coefficients restrict the model to only one or two kernels, making the combined model fragile with respect to uncertainty and novelty. In real problems, the relevance of a new gene to a certain disease may not yet have been investigated, so a model based solely on Text and GO annotations is less reliable. In this experiment, <it>L</it><sub>2</sub>-based integration, which evenly combines multiple genomic data sources, had the same effect as regularizing <it>L</it><sub>&#8734; </sub>by setting minimal boundaries on the kernel coefficients. However, in the regularized <it>L</it><sub>&#8734;</sub>, the minimal boundary <it>&#952;</it><sub><it>min</it></sub> is usually predefined by a "rule of thumb". The main advantage of the <it>L</it><sub>2 </sub>approach is that the <it>&#952;</it><sub><it>min</it></sub> values are determined automatically for the different kernels, and the performance was shown to be better than with the manually selected values.</p>
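<p>Why <it>L</it><sub>&#8734;</sub> produces sparse coefficients while <it>L</it><sub>2</sub> spreads them can be illustrated with a generic dual-norm argument, which we assume here rather than take from this section: for fixed per-kernel scores s<sub>j</sub> (for example &#945;<sup>T</sup>K<sub>j</sub>&#945; with &#945; held fixed), maximizing &#931;<sub>j</sub> &#952;<sub>j</sub>s<sub>j</sub> over nonnegative &#952; on the simplex puts all weight on the largest score (optimum = L<sub>&#8734;</sub> norm of s), over the unit L<sub>2</sub> ball gives &#952; proportional to s (optimum = L<sub>2</sub> norm of s), and with all &#952;<sub>j</sub> = 1 gives the uniform sum (optimum = L<sub>1</sub> norm of s). This is only the inner step with &#945; fixed, not the authors' solver, and the scores below are hypothetical.</p>
```python
import math


def optimal_coefficients(scores, method):
    """
    Maximize sum_j theta_j * s_j over theta_j >= 0 for fixed per-kernel scores s_j.
    method "L_inf": theta on the simplex (sum = 1); all weight goes to the largest s_j,
                    and the optimum equals the L_inf norm of s.
    method "L2":    ||theta||_2 = 1; theta = s / ||s||_2, optimum equals the L2 norm of s.
    method "L1":    theta_j = 1 for every kernel (uniform sum); optimum equals the L1 norm of s.
    """
    n = len(scores)
    if method == "L_inf":
        best = max(range(n), key=lambda j: scores[j])  # first maximum wins on ties
        return [1.0 if j == best else 0.0 for j in range(n)]
    if method == "L2":
        norm = math.sqrt(sum(s * s for s in scores))
        return [s / norm for s in scores]
    if method == "L1":
        return [1.0] * n
    raise ValueError(f"unknown method: {method}")


# Hypothetical scores for four data sources (not values from the paper).
scores = [4.0, 2.0, 1.0, 2.0]
for method in ("L_inf", "L2", "L1"):
    theta = optimal_coefficients(scores, method)
    objective = sum(t * s for t, s in zip(theta, scores))
    print(method, theta, objective)
```
<p>With these scores, the L_inf case keeps only the first source, whereas the L2 case returns nonzero weights for every source with a nonzero score, matching the qualitative pattern described for the genomic data sources.</p>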
<tbl id="T5"><title><p>Table 5</p></title><caption><p>Results of experiment 1: prioritization of 620 disease relevant genes by genomic data fusion</p></caption><tblbdy cols="8">
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>Error of AUC (mean)</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Error of AUC (std.)</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>p-value</b>
            </p>
         </c>
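         <c ca="left">
            <p>
               <b>Spearman corr. with <it>L</it><sub><it>&#8734;</it></sub></b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Spearman corr. with <it>L</it><sub><it>&#8734;</it></sub>(0.5)</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Spearman corr. with <it>L</it><sub>1</sub></b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Spearman corr. with <it>L</it><sub>2</sub></b>
            </p>
         </c>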
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>L</it>
               <sub>
                  <it>&#8734;</it>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>0.0923</p>
         </c>
         <c ca="left">
            <p>0.0035</p>
         </c>
         <c ca="left">
            <p>2.98 &#183; 10<sup>-17</sup></p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
         <c ca="left">
            <p>0.94</p>
         </c>
         <c ca="left">
            <p>0.66</p>
         </c>
         <c ca="left">
            <p>0.82</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>L</it><sub><it>&#8734;</it></sub>(0.5)</p>
         </c>
         <c ca="left">
            <p>0.0806</p>
         </c>
         <c ca="left">
            <p>0.0033</p>
         </c>
         <c ca="left">
            <p>2.66 &#183; 10<sup>-06</sup></p>
         </c>
         <c ca="left">
            <p>0.94</p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
         <c ca="left">
            <p>0.82</p>
         </c>
         <c ca="left">
            <p>0.92</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>L</it>
               <sub>1</sub>
            </p>
         </c>
         <c ca="left">
            <p>0.0908</p>
         </c>
         <c ca="left">
            <p>0.0042</p>
         </c>
         <c ca="left">
            <p>1.92 &#183; 10<sup>-16</sup></p>
         </c>
         <c ca="left">
            <p>0.66</p>
         </c>
         <c ca="left">
            <p>0.82</p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
         <c ca="left">
            <p>0.90</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>L</it>
               <sub>2</sub>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.0780</b>
            </p>
         </c>
         <c ca="left">
            <p>0.0034</p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
         <c ca="left">
            <p>0.82</p>
         </c>
         <c ca="left">
            <p>0.92</p>
         </c>
         <c ca="left">
            <p>0.90</p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Results of experiment 1: disease relevant gene prioritization by genomic data fusion. The error of AUC values was evaluated by LOO validation in 20 random repetitions. The best performance (<it>L</it><sub>2</sub>) is shown in bold. The p-values were obtained with a paired t-test against the best performance. As shown, the <it>L</it><sub>2 </sub>method is significantly better than the other methods. The Spearman correlation scores measure the similarity between the rankings obtained by each pair of approaches; the comparison of an approach with itself is denoted by -. Higher Spearman correlation values mean that the two rankings are more similar.</p>
   </tblfn></tbl>
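<p>The two statistics in the footnote of Table 5 can be sketched in plain Python. The helpers below are hypothetical: spearman_no_ties uses the closed form 1 - 6&#931;d<sup>2</sup>/(n(n<sup>2</sup> - 1)), which is valid only when there are no tied ranks, and paired_t_statistic returns the t value only (the p-value additionally needs the t distribution with n - 1 degrees of freedom, which scipy.stats.ttest_rel provides; scipy.stats.spearmanr also handles ties). The input numbers are made up for illustration.</p>
```python
import math
import statistics


def ranks(values):
    """Rank positions 1..n (1 = smallest value); assumes no ties, for brevity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


def spearman_no_ties(x, y):
    """Spearman rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def paired_t_statistic(a, b):
    """t = mean(d) / (stdev(d) / sqrt(n)) for paired differences d = a - b."""
    d = [u - v for u, v in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Hypothetical gene scores from two methods (not data from the paper).
x = [0.1, 0.4, 0.2, 0.9, 0.5]
y = [0.2, 0.3, 0.1, 0.8, 0.6]
print(spearman_no_ties(x, y))  # 0.9
print(paired_t_statistic([2, 3, 4], [1, 1, 1]))  # about 3.464 (= 2 * sqrt(3))
```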
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Optimal kernel coefficients for disease gene prioritization</p></caption><text>
   <p><b>Optimal kernel coefficients for disease gene prioritization</b>. Optimal kernel coefficients assigned to the genomic data sources in disease gene prioritization. For each method, the average coefficients over 20 repetitions are shown. The three most important data sources ranked by <it>L</it><sub>&#8734; </sub>are Text, GO, and Motif; the coefficients of the other six sources are almost zero. The <it>L</it><sub>2 </sub>method ranks these three best data sources in the same order as <it>L</it><sub>&#8734;</sub> and, in addition, also ranks the other six sources. Thus, another advantage of the <it>L</it><sub>2 </sub>method is that it provides a more refined ranking of the data sources in data integration than the <it>L</it><sub>&#8734; </sub>method.</p>
</text><graphic file="1471-2105-11-309-1"/></fig>
</sec>
<sec>
<st>
<p>Experiment 2: Prioritization of recently discovered prostate cancer genes by genomic data fusion</p>
</st>
<p>In the second experiment, recently discovered prostate cancer genes were prioritized using the same data sources and algorithms as in the first experiment. As shown in Table <tblr tid="T6">6</tblr>, the <it>L</it><sub>2 </sub>method significantly outperformed the other methods on the prioritization of CDH23, and it obtained the best result on JAZF1 (tied with <it>L</it><sub>&#8734; </sub>(0.5) and <it>L</it><sub>1</sub>). For five other genes (CPNE, EHBP1, MSMB, KLK3, IL16), the performance of the <it>L</it><sub>2 </sub>method was comparable to the best result. In section 4 of Additional file <supplr sid="S1">1</supplr>, we also present the optimal kernel coefficients and the prioritization results for individual sources. As shown in Additional file <supplr sid="S1">1</supplr>, the <it>L</it><sub>&#8734; </sub>algorithm assigned most of the weight to the Text and Microarray data. Text data perform well in the prioritization of known disease genes but do not always work best for newly discovered genes. This experiment demonstrates that, when prioritizing novel prostate cancer relevant genes, the <it>L</it><sub>2 </sub>MKL approach distributed the kernel coefficients more evenly over the heterogeneous genomic sources, and its performance was significantly better than that of the <it>L</it><sub>&#8734; </sub>method. Moreover, we also compared the kernel-based data fusion approach with the Endeavour gene prioritization software: for 6 genes, the MKL approach performed significantly better than Endeavour.</p>
<tbl id="T6"><title><p>Table 6</p></title><caption><p>Results of experiment 2: prioritization of prostate cancer genes by genomic data fusion</p></caption><tblbdy cols="8">
      <r>
         <c ca="left">
            <p>
               <b>Name</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Ensembl ID</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>References</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>
                  <it>L</it>
               </b>
               <sub>&#8734;</sub>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>
                  <it>L</it>
               </b>
               <sub>&#8734;</sub>
               <b>(0.5)</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>
                  <it>L</it>
               </b>
               <sub>
                  <b>1</b>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>
                  <it>L</it>
               </b>
               <sub>
                  <b>2</b>
               </sub>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Endeavour</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CPNE</p>
         </c>
         <c ca="left">
            <p>ENSG00000085719</p>
         </c>
         <c ca="left">
            <p>Thomas <it>et al</it>.</p>
         </c>
         <c ca="left">
            <p>0.3030</p>
         </c>
         <c ca="left">
            <p>0.2323</p>
         </c>
         <c ca="left">
            <p>
               <b>0.1010</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <it>0.1212</it>
            </p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>31/100</p>
         </c>
         <c ca="left">
            <p>24/100</p>
         </c>
         <c ca="left">
            <p>
               <b>11/100</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <it>13/100</it>
            </p>
         </c>
         <c ca="left">
            <p>70/100</p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CDH23</p>
         </c>
         <c ca="left">
            <p>ENSG00000107736</p>
         </c>
         <c ca="left">
            <p>Thomas <it>et al</it>.</p>
         </c>
         <c ca="left">
            <p>0.0606</p>
         </c>
         <c ca="left">
            <p>0.0303</p>
         </c>
         <c ca="left">
            <p>
               <it>0.0202</it>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.0101</b>
            </p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>7/100</p>
         </c>
         <c ca="left">
            <p>4/100</p>
         </c>
         <c ca="left">
            <p>
               <it>3/100</it>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>2/100</b>
            </p>
         </c>
         <c ca="left">
            <p>78/100</p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>EHBP1</p>
         </c>
         <c ca="left">
            <p>ENSG00000115504</p>
         </c>
         <c ca="left">
            <p>Gudmundsson <it>et al</it>.</p>
         </c>
         <c ca="left">
            <p>0.5354</p>
         </c>
         <c ca="left">
            <p>0.5152</p>
         </c>
         <c ca="left">
            <p>
               <b>0.3434</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <it>0.3939</it>
            </p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>54/100</p>
         </c>
         <c ca="left">
            <p>52/100</p>
         </c>
         <c ca="left">
            <p>
               <b>35/100</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <it>40/100</it>
            </p>
         </c>
         <c ca="left">
            <p>57/100</p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>MSMB</p>
         </c>
         <c ca="left">
            <p>ENSG00000138294</p>
         </c>
         <c ca="left">
            <p>Eeles <it>et al</it>.</p>
         </c>
         <c ca="left">
            <p>
               <b>0.0202</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.0202</b>
            </p>
         </c>
         <c ca="left">
            <p>0.0505</p>
         </c>
         <c ca="left">
            <p>
               <it>0.0303</it>
            </p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Thomas <it>et al</it>.</p>
         </c>
         <c ca="left">
            <p>
               <b>3/100</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>3/100</b>
            </p>
         </c>
         <c ca="left">
            <p>6/100</p>
         </c>
         <c ca="left">
            <p>
               <it>4/100</it>
            </p>
         </c>
         <c ca="left">
            <p>69/100</p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>KLK3</p>
         </c>
         <c ca="left">
            <p>ENSG00000142515</p>
         </c>
         <c ca="left">
            <p>Eeles <it>et al</it>.</p>
         </c>
         <c ca="left">
            <p>0.3434</p>
         </c>
         <c ca="left">
            <p>0.3535</p>
         </c>
         <c ca="left">
            <p>
               <it>0.2929</it>
            </p>
         </c>
         <c ca="left">
            <p>
               <it>0.2929</it>
            </p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>35/100</p>
         </c>
         <c ca="left">
            <p>36/100</p>
         </c>
         <c ca="left">
            <p>
               <it>30/100</it>
            </p>
         </c>
         <c ca="left">
            <p>
               <it>30/100</it>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>28/100</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>JAZF1</p>
         </c>
         <c ca="left">
            <p>ENSG00000153814</p>
         </c>
         <c ca="left">
            <p>Thomas <it>et al</it>.</p>
         </c>
         <c ca="left">
            <p>
               <it>0.0505</it>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.0202</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.0202</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.0202</b>
            </p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <it>6/100</it>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>3/100</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>3/100</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>3/100</b>
            </p>
         </c>
         <c ca="left">
            <p>7/100</p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>LMTK2</p>
         </c>
         <c ca="left">
            <p>ENSG00000164715</p>
         </c>
         <c ca="left">
            <p>Eeles <it>et al</it>.</p>
         </c>
         <c ca="left">
            <p>
               <it>0.3131</it>
            </p>
         </c>
         <c ca="left">
            <p>0.4646</p>
         </c>
         <c ca="left">
            <p>0.8081</p>
         </c>
         <c ca="left">
            <p>0.7677</p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <it>32/100</it>
            </p>
         </c>
         <c ca="left">
            <p>47/100</p>
         </c>
         <c ca="left">
            <p>81/100</p>
         </c>
         <c ca="left">
            <p>77/100</p>
         </c>
         <c ca="left">
            <p>
               <b>31/100</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>IL16</p>
         </c>
         <c ca="left">
            <p>ENSG00000172349</p>
         </c>
         <c ca="left">
            <p>Thomas <it>et al</it>.</p>
         </c>
         <c ca="left">
            <p>
               <b>0</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <it>0.0101</it>
            </p>
         </c>
         <c ca="left">
            <p>0.0303</p>
         </c>
         <c ca="left">
            <p>
               <it>0.0101</it>
            </p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>1/100</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <it>2/100</it>
            </p>
         </c>
         <c ca="left">
            <p>4/100</p>
         </c>
         <c ca="left">
            <p>
               <it>2/100</it>
            </p>
         </c>
         <c ca="left">
            <p>72/100</p>
         </c>
      </r>
      <r>
         <c cspan="8">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CTBP2</p>
         </c>
         <c ca="left">
            <p>ENSG00000175029</p>
         </c>
         <c ca="left">
            <p>Thomas <it>et al</it>.</p>
         </c>
         <c ca="left">
            <p>
               <it>0.8283</it>
            </p>
         </c>
         <c ca="left">
            <p>0.5758</p>
         </c>
         <c ca="left">
            <p>
               <it>0.6364</it>
            </p>
         </c>
         <c ca="left">
            <p>0.6869</p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <it>83/100</it>
            </p>
         </c>
         <c ca="left">
            <p>58/100</p>
         </c>
         <c ca="left">
            <p>
               <it>64/100</it>
            </p>
         </c>
         <c ca="left">
            <p>69/100</p>
         </c>
         <c ca="left">
            <p>
               <b>38/100</b>
            </p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Results of experiment 2: prioritization of prostate cancer genes by genomic data fusion. For each novel prostate cancer gene, the first row shows the error of AUC values and the second row lists the ranking position of the prostate cancer gene among 100 candidate genes (the gene itself and its 99 closest neighboring genes).</p>
   </tblfn></tbl>
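<p>The two rows of Table 6 appear to be linked by a simple relation: with a single held-out gene ranked among 100 candidates, the error of AUC equals (rank - 1)/99, because rank - 1 of the 99 other genes are placed above it. This is our reading of the table rather than a formula stated by the authors; the sketch below reproduces the CPNE row, and the helper name error_of_auc_from_rank is hypothetical.</p>
```python
def error_of_auc_from_rank(rank, n_candidates=100):
    """Error of AUC (1 - AUC) when one held-out gene is ranked at position rank
    (1 = top) among n_candidates genes, i.e. the gene plus n_candidates - 1 others."""
    return (rank - 1) / (n_candidates - 1)


# CPNE row of Table 6: ranking positions 31, 24, 11 and 13 out of 100.
for rank in (31, 24, 11, 13):
    print(rank, f"{error_of_auc_from_rank(rank):.4f}")
# 31 0.3030
# 24 0.2323
# 11 0.1010
# 13 0.1212
```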
</sec>
<sec>
<st>
<p>Experiment 3: Clinical decision support by integrating microarray and proteomics data</p>
</st>
<p>One of the main contributions of this paper is that the <it>L</it><sub>2 </sub>MKL notion can be applied to various machine learning problems. The first two experiments addressed a ranking problem, using 1-SVM MKL to prioritize disease relevant genes. In the third experiment, we optimized the <it>L</it><sub>&#8734;</sub>, <it>L</it><sub>1</sub>, and <it>L</it><sub>2</sub>-norm in SVM MKL and LSSVM MKL classifiers to support the diagnosis of patients according to their lymph node stage in rectal cancer development. Because the performance of the classifiers depended strongly on the selected features, we compared 25 feature selection results for each classifier (a grid of 5 gene-set sizes by 5 protein-set sizes). As shown in Table <tblr tid="T7">7</tblr>, the best performance was obtained with LSSVM <it>L</it><sub>1 </sub>(error of AUC = 0.0325) using 25 genes and 15 proteins. The <it>L</it><sub>2 </sub>LSSVM MKL classifier was also promising because its performance was comparable to the best result. In particular, for both classifiers (LSSVM and SVM), the <it>L</it><sub>1 </sub>and <it>L</it><sub>2 </sub>approaches significantly outperformed the <it>L</it><sub>&#8734; </sub>approach. We also tried to regularize the kernel coefficients in <it>L</it><sub>&#8734; </sub>MKL using different <it>&#952;</it><sub><it>min</it></sub> values. Nine <it>&#952;</it><sub><it>min</it></sub> values, uniformly spaced from 0.1 to 0.9, were tried, and the resulting changes in performance are shown in Figure <figr fid="F2">2</figr>. As shown, increasing the <it>&#952;</it><sub><it>min</it></sub> value steadily improves the performance of LSSVM MKL and SVM MKL on the rectal cancer data sets. However, determining the optimal <it>&#952;</it><sub><it>min</it></sub> was non-trivial: when <it>&#952;</it><sub><it>min</it></sub> was smaller than 0.6, the performance of LSSVM MKL <it>L</it><sub>&#8734; </sub>remained unchanged, meaning that the "rule of thumb" value 0.5 used in experiment 1 is not valid here. In comparison, the <it>L</it><sub>2</sub>-based MKL classifiers require no <it>&#952;</it><sub><it>min</it></sub>, and their performance is still comparable to the best performance obtained with regularized <it>L</it><sub>&#8734; </sub>MKL.</p>
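<p>The two parameter sweeps of experiment 3 can be enumerated as below. The gene and protein ranges are read from the row and column headers of Table 7 (24 to 28 genes, 14 to 18 proteins), and the nine <it>&#952;</it><sub><it>min</it></sub> values follow the text; the variable names are illustrative only, and the sketch shows just the grid, not the classifiers trained on it.</p>
```python
from itertools import product

# Grid read from the Table 7 headers: 24-28 selected genes (g) by 14-18 selected proteins (p).
gene_counts = range(24, 29)
protein_counts = range(14, 19)
feature_grid = list(product(gene_counts, protein_counts))

# Lower bounds tried for the regularized L_inf MKL coefficients: 0.1, 0.2, ..., 0.9.
theta_min_values = [round(0.1 * k, 1) for k in range(1, 10)]

print(len(feature_grid), len(theta_min_values))  # 25 9
print((25, 15) in feature_grid)  # True: the best setting reported in the text lies on this grid
```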
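<tbl id="T7"><title><p>Table 7</p></title><caption><p>Results of experiment 3: classification of patients for rectal cancer clinical decision support using microarray and proteomics data sets. Entries are errors of AUC; rows give the number of selected genes (g) and columns the number of selected proteins (p).</p></caption><tblbdy cols="11">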
      <r>
         <c>
            <p/>
         </c>
         <c cspan="5" ca="center">
            <p>
               <b>LSSVM <it>L</it><sub>&#8734;</sub></b>
            </p>
         </c>
         <c cspan="5" ca="center">
            <p>
               <b>SVM <it>L</it><sub>&#8734;</sub></b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>
               <b>14 p</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>15 p</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>16 p</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>17 p</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>18 p</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>14 p</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>15 p</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>16 p</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>17 p</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>18 p</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>24 g</p>
         </c>
         <c ca="center">
            <p>0.0584</p>
         </c>
         <c ca="center">
            <p>0.0519</p>
         </c>
         <c ca="center">
            <p>
               <it>0.0747 </it>
            </p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.1331</p>
         </c>
         <c ca="center">
            <p>0.1331</p>
         </c>
         <c ca="center">
            <p>0.1331</p>
         </c>
         <c ca="center">
            <p>0.1331</p>
         </c>
         <c ca="center">
            <p>0.1364</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>25 g</p>
         </c>
         <c ca="center">
            <p>
               <it>0.0390</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0390</it>
            </p>
         </c>
         <c ca="center">
            <p>0.0519</p>
         </c>
         <c ca="center">
            <p>0.0617</p>
         </c>
         <c ca="center">
            <p>0.0649</p>
         </c>
         <c ca="center">
            <p>0.1136</p>
         </c>
         <c ca="center">
            <p>0.1104</p>
         </c>
         <c ca="center">
            <p>0.1234</p>
         </c>
         <c ca="center">
            <p>0.1201</p>
         </c>
         <c ca="center">
            <p>0.1234</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>26 g</p>
         </c>
         <c ca="center">
            <p>0.0487</p>
         </c>
         <c ca="center">
            <p>0.0487</p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.0844</p>
         </c>
         <c ca="center">
            <p>0.0877</p>
         </c>
         <c ca="center">
            <p>0.1266</p>
         </c>
         <c ca="center">
            <p>0.1136</p>
         </c>
         <c ca="center">
            <p>0.1234</p>
         </c>
         <c ca="center">
            <p>0.1299</p>
         </c>
         <c ca="center">
            <p>0.1364</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>27 g</p>
         </c>
         <c ca="center">
            <p>0.0617</p>
         </c>
         <c ca="center">
            <p>0.0649</p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.0877</p>
         </c>
         <c ca="center">
            <p>0.0942</p>
         </c>
         <c ca="center">
            <p>0.1429</p>
         </c>
         <c ca="center">
            <p>0.1364</p>
         </c>
         <c ca="center">
            <p>0.1364</p>
         </c>
         <c ca="center">
            <p>0.1331</p>
         </c>
         <c ca="center">
            <p>0.1461</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>28 g</p>
         </c>
         <c ca="center">
            <p>0.0552</p>
         </c>
         <c ca="center">
            <p>0.0487</p>
         </c>
         <c ca="center">
            <p>0.0617</p>
         </c>
         <c ca="center">
            <p>0.0747</p>
         </c>
         <c ca="center">
            <p>0.0714</p>
         </c>
         <c ca="center">
            <p>0.1429</p>
         </c>
         <c ca="center">
            <p>0.1331</p>
         </c>
         <c ca="center">
            <p>0.1331</p>
         </c>
         <c ca="center">
            <p>0.1364</p>
         </c>
         <c ca="center">
            <p>0.1396</p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="5" ca="center">
            <p>LSSVM <it>L</it><sub>&#8734; </sub>(0.5)</p>
         </c>
         <c cspan="5" ca="center">
            <p>SVM <it>L</it><sub>&#8734; </sub>(0.5)</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>14 p</p>
         </c>
         <c ca="center">
            <p>15 p</p>
         </c>
         <c ca="center">
            <p>16 p</p>
         </c>
         <c ca="center">
            <p>17 p</p>
         </c>
         <c ca="center">
            <p>18 p</p>
         </c>
         <c ca="center">
            <p>14 p</p>
         </c>
         <c ca="center">
            <p>15 p</p>
         </c>
         <c ca="center">
            <p>16 p</p>
         </c>
         <c ca="center">
            <p>17 p</p>
         </c>
         <c ca="center">
            <p>18 p</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>24 g</p>
         </c>
         <c ca="center">
            <p>0.0584</p>
         </c>
         <c ca="center">
            <p>0.0519</p>
         </c>
         <c ca="center">
            <p>
               <it>0.0747</it>
            </p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.1266</p>
         </c>
         <c ca="center">
            <p>0.1006</p>
         </c>
         <c ca="center">
            <p>0.1266</p>
         </c>
         <c ca="center">
            <p>0.1299</p>
         </c>
         <c ca="center">
            <p>0.1331</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>25 g</p>
         </c>
         <c ca="center">
            <p>
               <it>0.0390</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0390</it>
            </p>
         </c>
         <c ca="center">
            <p>0.0519</p>
         </c>
         <c ca="center">
            <p>0.0617</p>
         </c>
         <c ca="center">
            <p>0.0649</p>
         </c>
         <c ca="center">
            <p>0.1136</p>
         </c>
         <c ca="center">
            <p>0.1071</p>
         </c>
         <c ca="center">
            <p>0.1234</p>
         </c>
         <c ca="center">
            <p>0.1201</p>
         </c>
         <c ca="center">
            <p>0.1234</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>26 g</p>
         </c>
         <c ca="center">
            <p>0.0487</p>
         </c>
         <c ca="center">
            <p>0.0487</p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.0844</p>
         </c>
         <c ca="center">
            <p>0.0877</p>
         </c>
         <c ca="center">
            <p>0.1136</p>
         </c>
         <c ca="center">
            <p>0.1136</p>
         </c>
         <c ca="center">
            <p>0.1201</p>
         </c>
         <c ca="center">
            <p>0.1266</p>
         </c>
         <c ca="center">
            <p>0.1331</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>27 g</p>
         </c>
         <c ca="center">
            <p>0.0617</p>
         </c>
         <c ca="center">
            <p>0.0649</p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.0877</p>
         </c>
         <c ca="center">
            <p>0.0942</p>
         </c>
         <c ca="center">
            <p>0.1364</p>
         </c>
         <c ca="center">
            <p>0.1364</p>
         </c>
         <c ca="center">
            <p>0.1364</p>
         </c>
         <c ca="center">
            <p>0.1331</p>
         </c>
         <c ca="center">
            <p>0.1461</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>28 g</p>
         </c>
         <c ca="center">
            <p>0.0552</p>
         </c>
         <c ca="center">
            <p>0.0487</p>
         </c>
         <c ca="center">
            <p>0.0617</p>
         </c>
         <c ca="center">
            <p>0.0747</p>
         </c>
         <c ca="center">
            <p>0.0714</p>
         </c>
         <c ca="center">
            <p>0.1299</p>
         </c>
         <c ca="center">
            <p>0.1299</p>
         </c>
         <c ca="center">
            <p>0.1299</p>
         </c>
         <c ca="center">
            <p>0.1331</p>
         </c>
         <c ca="center">
            <p>0.1364</p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="5" ca="center">
            <p>LSSVM <it>L</it><sub>1</sub></p>
         </c>
         <c cspan="5" ca="center">
            <p>SVM <it>L</it><sub>1</sub></p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>14 p</p>
         </c>
         <c ca="center">
            <p>15 p</p>
         </c>
         <c ca="center">
            <p>16 p</p>
         </c>
         <c ca="center">
            <p>17 p</p>
         </c>
         <c ca="center">
            <p>18 p</p>
         </c>
         <c ca="center">
            <p>14 p</p>
         </c>
         <c ca="center">
            <p>15 p</p>
         </c>
         <c ca="center">
            <p>16 p</p>
         </c>
         <c ca="center">
            <p>17 p</p>
         </c>
         <c ca="center">
            <p>18 p</p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>24 g</p>
         </c>
         <c ca="center">
            <p>
               <b>0.0487</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0487</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0682</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0682</b>
            </p>
         </c>
         <c ca="center">
            <p>0.0747</p>
         </c>
         <c ca="center">
            <p>0.0747</p>
         </c>
         <c ca="center">
            <p>0.0584</p>
         </c>
         <c ca="center">
            <p>0.0714</p>
         </c>
         <c ca="center">
            <p>
               <b>0.0682</b>
            </p>
         </c>
         <c ca="center">
            <p>0.0747</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>25 g</p>
         </c>
         <c ca="center">
            <p>
               <b>0.0357</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>
                  <ul>0.0325</ul>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0422</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0455</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0455</b>
            </p>
         </c>
         <c ca="center">
            <p>0.0584</p>
         </c>
         <c ca="center">
            <p>0.0519</p>
         </c>
         <c ca="center">
            <p>0.0649</p>
         </c>
         <c ca="center">
            <p>0.0714</p>
         </c>
         <c ca="center">
            <p>0.0714</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>26 g</p>
         </c>
         <c ca="center">
            <p>
               <b>0.0357</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0357</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0455</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0455</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0455</b>
            </p>
         </c>
         <c ca="center">
            <p>0.0584</p>
         </c>
         <c ca="center">
            <p>0.0519</p>
         </c>
         <c ca="center">
            <p>0.0682</p>
         </c>
         <c ca="center">
            <p>0.0682</p>
         </c>
         <c ca="center">
            <p>0.0682</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>27 g</p>
         </c>
         <c ca="center">
            <p>
               <b>0.0357</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0357</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0455</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0487</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0519</b>
            </p>
         </c>
         <c ca="center">
            <p>0.0617</p>
         </c>
         <c ca="center">
            <p>0.0584</p>
         </c>
         <c ca="center">
            <p>0.0714</p>
         </c>
         <c ca="center">
            <p>0.0682</p>
         </c>
         <c ca="center">
            <p>0.0682</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>28 g</p>
         </c>
         <c ca="center">
            <p>
               <b>0.0422</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>
                  <ul>0.0325</ul>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0487</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0487</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0519</b>
            </p>
         </c>
         <c ca="center">
            <p>0.0584</p>
         </c>
         <c ca="center">
            <p>0.0584</p>
         </c>
         <c ca="center">
            <p>0.0649</p>
         </c>
         <c ca="center">
            <p>0.0649</p>
         </c>
         <c ca="center">
            <p>0.0682</p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="5" ca="center">
            <p>LSSVM <it>L</it><sub>2</sub></p>
         </c>
         <c cspan="5" ca="center">
            <p>SVM <it>L</it><sub>2</sub></p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>14 p</p>
         </c>
         <c ca="center">
            <p>15 p</p>
         </c>
         <c ca="center">
            <p>16 p</p>
         </c>
         <c ca="center">
            <p>17 p</p>
         </c>
         <c ca="center">
            <p>18 p</p>
         </c>
         <c ca="center">
            <p>14 p</p>
         </c>
         <c ca="center">
            <p>15 p</p>
         </c>
         <c ca="center">
            <p>16 p</p>
         </c>
         <c ca="center">
            <p>17 p</p>
         </c>
         <c ca="center">
            <p>18 p</p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>24 g</p>
         </c>
         <c ca="center">
            <p>
               <it>0.0552</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0487</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0747</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0779</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0714</b>
            </p>
         </c>
         <c ca="center">
            <p>0.0909</p>
         </c>
         <c ca="center">
            <p>0.0877</p>
         </c>
         <c ca="center">
            <p>0.0974</p>
         </c>
         <c ca="center">
            <p>0.0942</p>
         </c>
         <c ca="center">
            <p>0.1006</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>25 g</p>
         </c>
         <c ca="center">
            <p>
               <it>0.0390</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0390</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0487</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0552</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0552</it>
            </p>
         </c>
         <c ca="center">
            <p>0.0747</p>
         </c>
         <c ca="center">
            <p>0.0649</p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.0844</p>
         </c>
         <c ca="center">
            <p>0.0844</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>26 g</p>
         </c>
         <c ca="center">
            <p>
               <it>0.0390</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0455</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0552</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0649</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0649</it>
            </p>
         </c>
         <c ca="center">
            <p>0.0747</p>
         </c>
         <c ca="center">
            <p>0.0584</p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.0779</p>
         </c>
         <c ca="center">
            <p>0.0779</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>27 g</p>
         </c>
         <c ca="center">
            <p>
               <it>0.0422</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0487</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0552</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0584</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0649</it>
            </p>
         </c>
         <c ca="center">
            <p>0.0779</p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.0844</p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>28 g</p>
         </c>
         <c ca="center">
            <p>
               <it>0.0455</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>
                  <ul>0.0325</ul>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>0.0487</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0584</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>0.0552</it>
            </p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.0714</p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
         <c ca="center">
            <p>0.0779</p>
         </c>
         <c ca="center">
            <p>0.0812</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>The table shows the error of AUC in patient classification using microarray and proteomics data. In LSSVM <it>L</it><sub>&#8734;</sub>, <it>L</it><sub>&#8734; </sub>(0.5) and <it>L</it><sub>2</sub>, the regularization parameter &#955; was estimated jointly as the kernel coefficient of an identity matrix. In LSSVM <it>L</it><sub>1</sub>, &#955; was set to 1. In all SVM approaches, the <it>C</it> parameter of the box constraint was set to 1. The row and column labels give the numbers of genes (g) and proteins (p) used to construct the kernels; the genes and proteins were ranked by feature selection techniques (see text). The AUC of LOO validation was evaluated without the bias term <it>b</it> (the implicit bias approach) because the value of <it>b</it> varied with each left-out sample; in this problem, including the bias term decreased the AUC. For each number of genes and proteins, the eight algorithms were compared: the best value (the smallest error of AUC) is shown in bold and the second best in italic. The best performance over all feature selection results is underlined. The table presents the 25 best feature selection results of each method. The complete experimental results, covering 26 different numbers of genes and 26 different numbers of proteins, are available at <url>http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html</url>.</p>
   </tblfn></tbl>
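<p>The footnote above reports the error of AUC and explains why the bias term was dropped in LOO validation. The Python sketch below illustrates both points under stated assumptions: it takes the error of AUC to be 1 - AUC, computes the AUC as the Mann-Whitney statistic over the pooled LOO decision values, and uses made-up decision values and per-fold biases (the variables f_no_bias and bias are hypothetical, not data from this study). A bias shared by all samples leaves the AUC unchanged because the AUC depends only on the ranking of the scores, whereas a bias that differs for each left-out sample can reorder the pooled scores; in this example it lowers the AUC.</p>
```python
# Sketch: error of AUC (assumed to be 1 - AUC) from pooled LOO decision values.
# All numbers below are hypothetical and chosen only to illustrate the effect
# of a per-fold bias term b; they are not results from this study.


def auc(scores, labels):
    """Mann-Whitney AUC: fraction of (positive, negative) pairs ranked
    correctly, counting ties as 1/2. Labels are +1 / -1."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def error_of_auc(scores, labels):
    return 1.0 - auc(scores, labels)


labels = [1, 1, 1, -1, -1, -1]
# LOO decision value of each left-out sample, computed without the bias term
f_no_bias = [1.0, 0.5, 0.25, 0.375, -0.25, -0.5]
# bias b fitted in the fold in which each sample was left out (hypothetical)
bias = [-0.875, 0.0, 0.0, 0.0, 0.5, 0.0]

with_common_shift = [f + 0.5 for f in f_no_bias]
with_fold_bias = [f + b for f, b in zip(f_no_bias, bias)]

print(error_of_auc(f_no_bias, labels))          # about 0.111: 1 of 9 pairs misranked
print(error_of_auc(with_common_shift, labels))  # unchanged by a common shift
print(error_of_auc(with_fold_bias, labels))     # about 0.389: per-fold b reorders scores
```
<p>Pooling the bias-free values is one reading of the implicit bias approach mentioned in the footnote: it keeps the ranking of the left-out samples from depending on fold-specific values of <it>b</it>.</p>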
<fig id="F2"><title><p>Figure 2</p></title><caption><p>The effect of <it>&#952;</it><sub><it>min </it></sub>on LSSVM MKL and SVM MKL classifiers in rectal cancer diagnosis</p></caption><text>
   <p><b>The effect of <it>&#952;</it><sub><it>min </it></sub>on LSSVM MKL and SVM MKL classifiers in rectal cancer diagnosis</b>. Top: performance of LSSVM MKL. Bottom: performance of SVM MKL. Each panel compares three feature selection results; the performance of <it>L</it><sub>2 </sub>MKL is shown as dashed lines.</p>
</text><graphic file="1471-2105-11-309-2"/></fig>
<p>In LSSVM kernel fusion, we estimated &#955; jointly as the coefficient assigned to an identity matrix. Because the number of samples in this experiment is small, we did not select &#955; by the standard cross-validation approach on separate validation data. To investigate whether the estimated &#955; is optimal, we set &#955; to 51 values uniformly sampled on the log<sub>2</sub> scale from -10 to 40 and compared the performance of the joint estimate with the best classification performance obtained over these sampled values. For most of the results, the jointly estimated &#955; was found to be optimal. Figure <figr fid="F3">3</figr> illustrates an example for the integration of four kernels constructed from 27 gene features and 17 protein features. The coefficients estimated with the <it>L</it><sub>&#8734;</sub>-norm were almost 0, so the &#955; values were very large. In contrast, the &#955; values estimated by the non-sparse <it>L</it><sub>2</sub> method were of reasonable magnitude.</p>
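<p>The &#955; benchmark described above can be sketched in Python as follows. The grid 2<sup>k</sup> for k = -10, ..., 40 gives exactly the 51 values mentioned in the text. Everything else is an assumption made for illustration: a single RBF kernel on synthetic two-class data stands in for the combined gene and protein kernels, and the model is a bias-free LSSVM (equivalently, kernel ridge regression on the +1/-1 labels), for which the LOO decision values have the closed form f<sub>-i</sub>(x<sub>i</sub>) = y<sub>i</sub> - &#945;<sub>i</sub>/H<sub>ii</sub> with H = (K + &#955;I)<sup>-1</sup> and &#945; = Hy. The helper names are hypothetical.</p>
```python
import numpy as np


def rbf_kernel(X, width=1.0):
    """Gaussian RBF kernel matrix; the width is an arbitrary illustrative choice."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * width ** 2))


def loo_decision_values(K, y, lam):
    """Closed-form LOO outputs of a bias-free LSSVM (kernel ridge regression):
    f_-i(x_i) = y_i - alpha_i / H_ii, with H = (K + lam*I)^-1 and alpha = H y."""
    H = np.linalg.inv(K + lam * np.eye(len(y)))
    alpha = H @ y
    return y - alpha / np.diag(H)


def error_of_auc(scores, y):
    """1 - AUC, with the AUC as the Mann-Whitney statistic (ties count 1/2)."""
    diff = scores[y == 1][:, None] - scores[y == -1][None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
    return 1.0 - auc


# 51 candidate values uniformly spaced on the log2 scale: 2^-10, ..., 2^40
lambdas = 2.0 ** np.arange(-10, 41)

# Synthetic two-class data (hypothetical stand-in for the real kernels)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (15, 2)), rng.normal(1.0, 1.0, (15, 2))])
y = np.array([-1.0] * 15 + [1.0] * 15)
K = rbf_kernel(X)

errors = [error_of_auc(loo_decision_values(K, y, lam), y) for lam in lambdas]
best = int(np.argmin(errors))
print(f"best lambda = 2^{best - 10}, LOO error of AUC = {errors[best]:.4f}")
```
<p>In the study, the grid served only as a reference: &#955; itself was estimated jointly with the kernel coefficients, and the grid was used to check whether that estimate fell in the best-performing region.</p>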
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Benchmark of various &#955; values in LSSVM MKL classifiers in rectal cancer diagnosis</p></caption><text>
   <p><b>Benchmark of various &#955; values in LSSVM MKL classifiers in rectal cancer diagnosis</b>. The four kernels were constructed using 27 gene features and 17 protein features (see text). For each fixed &#955; value, the error of AUC was evaluated by LOO validation. The maximal and minimal &#955; values estimated by <it>L</it><sub>&#8734; </sub>and <it>L</it><sub>2 </sub>MKL are shown.</p>
</text><graphic file="1471-2105-11-309-3"/></fig>
</sec>
<sec>
<st>
<p>Experiment 4: Clinical decision support by integrating multiple kernels</p>
</st>
<p>In the fourth experiment we validated the proposed approach on three clinical data sets containing more samples. On the endometrial and miscarriage data sets, we compared eight MKL algorithms with various norms. For the imbalanced pregnancy data set, we applied eight weighted MKL algorithms. The results are shown in Tables <tblr tid="T8">8</tblr>, <tblr tid="T9">9</tblr> and <tblr tid="T10">10</tblr>. On the endometrial data, the differences in performance were small: although the two <it>L</it><sub>2</sub> methods were not optimal, they were comparable to the best result. On the miscarriage data, the <it>L</it><sub>2</sub> methods performed significantly better than the other algorithms. On the pregnancy data, the weighted <it>L</it><sub>2</sub> LSSVM MKL and the weighted <it>L</it><sub>1</sub> LSSVM MKL performed significantly better than the others. We also regularized the kernel coefficients of the LSSVM <it>L</it><sub>&#8734;</sub> and SVM <it>L</it><sub>&#8734;</sub> MKL classifiers using different <it>&#952;</it><sub><it>min</it></sub> values; the results are presented in Figures <figr fid="F4">4</figr>, <figr fid="F5">5</figr> and <figr fid="F6">6</figr>. The optimal <it>&#952;</it><sub><it>min</it></sub> differs across data sets, so the "rule of thumb" value of 0.5 may not work for all problems. For the endometrial and miscarriage data sets, the optimal <it>&#952;</it><sub><it>min</it></sub> for both MKL classifiers is 0.2; for the pregnancy data set, it is 1 for LSSVM and 0.9 for SVM. On the miscarriage and pregnancy data sets, the performance of the <it>L</it><sub>2</sub> algorithms is comparable to, or even much better than, that of the best regularized <it>L</it><sub>&#8734;</sub> algorithm. On the endometrial data set, although the optimally regularized <it>L</it><sub>&#8734;</sub> LSSVM and SVM MKL classifiers outperform the <it>L</it><sub>2</sub> classifiers, the <it>L</it><sub>2</sub> methods still perform as well as or better than the unregularized <it>L</it><sub>&#8734;</sub> method.</p>
<tbl id="T8"><title><p>Table 8</p></title><caption><p>Results of experiment 4 data set I: classification of endometrial disease patients using multiple kernels derived from clinical data</p></caption><tblbdy cols="4">
      <r>
         <c ca="left">
            <p>
               <b>Classifier</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Mean - error of AUC</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Std. - error of AUC</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>p-value</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>LSSVM </b>
               <it>L</it>
               <sub>&#8734;</sub>
               <b> (0.5) MKL</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.2353</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.0133</b>
            </p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>SVM </b>
               <it>L</it>
               <sub>&#8734;</sub>
               <b> (0.5) MKL</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.2388</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.0178</b>
            </p>
         </c>
         <c ca="left">
            <p>0.4369</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>SVM </b>
               <it>L</it>
               <sub>&#8734;</sub>
               <b> MKL</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.2417</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.0165</b>
            </p>
         </c>
         <c ca="left">
            <p>0.2483</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>LSSVM <it>L</it><sub>2 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>0.2456</p>
         </c>
         <c ca="left">
            <p>0.0124</p>
         </c>
         <c ca="left">
            <p>0.0363</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SVM <it>L</it><sub>2 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>0.2489</p>
         </c>
         <c ca="left">
            <p>0.0178</p>
         </c>
         <c ca="left">
            <p>0.0130</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SVM <it>L</it><sub>1 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>0.2513</p>
         </c>
         <c ca="left">
            <p>0.0144</p>
         </c>
         <c ca="left">
            <p>0.0057</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>LSSVM <it>L</it><sub>1 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>0.2574</p>
         </c>
         <c ca="left">
            <p>0.0189</p>
         </c>
         <c ca="left">
            <p>9.98 &#183; 10<sup>-5</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>LSSVM <it>L</it><sub>&#8734; </sub>MKL</p>
         </c>
         <c ca="left">
            <p>0.2678</p>
         </c>
         <c ca="left">
            <p>0.0130</p>
         </c>
         <c ca="left">
            <p>1.53 &#183; 10<sup>-6</sup></p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Results of experiment 4 data set I: classification of endometrial disease patients using multiple kernels derived from clinical data. Bold rows show the best classifier and the classifiers whose p-value exceeds 0.05. Each p-value compares a classifier with the best-performing one using a paired t-test. Classifiers are sorted by p-value from high to low.</p>
   </tblfn></tbl>
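<p>The footnote of Table <tblr tid="T8">8</tblr> states that each classifier is compared with the best one using a paired t-test. A minimal Python sketch of such a test is given below, assuming a two-sided test on paired per-repetition errors of AUC (the footnotes do not state the sidedness or how the pairs were formed). To stay within the standard library, the two-sided p-value is obtained from the Student t distribution through the regularized incomplete beta function, p = I<sub>&#957;/(&#957;+t<sup>2</sup>)</sub>(&#957;/2, 1/2) with &#957; = n - 1, evaluated by the continued fraction given in Numerical Recipes. The error values in the usage example are hypothetical.</p>
```python
import math
import statistics


def _betacf(a, b, x, max_iter=200, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz
    method, following the 'betacf' routine of Numerical Recipes)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if eps > abs(delta - 1.0):
            break
    return h


def betainc_regularized(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for x in [0, 1]."""
    if x == 0.0 or x == 1.0:
        return x
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # use the continued fraction where it converges quickly
    if (a + 1.0) / (a + b + 2.0) > x:
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of a Student t statistic with df degrees of freedom."""
    return betainc_regularized(df / 2.0, 0.5, df / (df + t * t))


def paired_t_test(errors_a, errors_b):
    """Paired t-test on matched error values; returns (t, two-sided p)."""
    diffs = [x - y for x, y in zip(errors_a, errors_b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (divisor n - 1)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, t_two_sided_p(t, n - 1)


# Hypothetical per-repetition errors of AUC for two classifiers
best = [0.231, 0.242, 0.228, 0.239, 0.236, 0.233, 0.240, 0.229]
other = [0.244, 0.251, 0.240, 0.247, 0.249, 0.238, 0.252, 0.243]
t, p = paired_t_test(other, best)
print(f"t = {t:.3f}, two-sided p = {p:.2e}")
```
<p>The sketch needs at least two pairs, and the t statistic is undefined when all differences are identical. Where SciPy is available, <it>scipy.stats.ttest_rel</it> performs the same two-sided paired test.</p>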
<tbl id="T9"><title><p>Table 9</p></title><caption><p>Results of experiment 4 data set II: classification of miscarriage patients using multiple kernels derived from clinical data</p></caption><tblbdy cols="4">
      <r>
         <c ca="left">
            <p>
               <b>Classifier</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Mean - error of AUC</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Std. - error of AUC</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>p-value</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>SVM </b>
               <it>L</it>
               <sub>2 </sub>
               <b>MKL</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.1975</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.0037</b>
            </p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>LSSVM </b>
               <it>L</it>
               <sub>2 </sub>
               <b>MKL</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.2002</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.0049</b>
            </p>
         </c>
         <c ca="left">
            <p>0.0712</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>LSSVM <it>L</it><sub>&#8734; </sub>(0.5) MKL</p>
         </c>
         <c ca="left">
            <p>0.2027</p>
         </c>
         <c ca="left">
            <p>0.0045</p>
         </c>
         <c ca="left">
            <p>9.77 &#183; 10<sup>-4</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SVM <it>L</it><sub>&#8734; </sub>MKL</p>
         </c>
         <c ca="left">
            <p>0.2109</p>
         </c>
         <c ca="left">
            <p>0.0040</p>
         </c>
         <c ca="left">
            <p>9.55 &#183; 10<sup>-12</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SVM <it>L</it><sub>&#8734; </sub>(0.5) MKL</p>
         </c>
         <c ca="left">
            <p>0.2168</p>
         </c>
         <c ca="left">
            <p>0.0040</p>
         </c>
         <c ca="left">
            <p>1.79 &#183; 10<sup>-12</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>LSSVM <it>L</it><sub>1 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>0.2132</p>
         </c>
         <c ca="left">
            <p>0.0029</p>
         </c>
         <c ca="left">
            <p>1.11 &#183; 10<sup>-13</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SVM <it>L</it><sub>1 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>0.2297</p>
         </c>
         <c ca="left">
            <p>0.0038</p>
         </c>
         <c ca="left">
            <p>1.10 &#183; 10<sup>-15</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>LSSVM <it>L</it><sub>&#8734; </sub>MKL</p>
         </c>
         <c ca="left">
            <p>0.2319</p>
         </c>
         <c ca="left">
            <p>0.0015</p>
         </c>
         <c ca="left">
            <p>3.42 &#183; 10<sup>-21</sup></p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Results of experiment 4 data set II: classification of miscarriage patients using multiple kernels derived from clinical data. The best-performing classifier, together with any classifier not significantly different from it (p &gt; 0.05), is shown in bold. p-values are obtained from paired t-tests comparing each classifier with the best-performing one, and classifiers are sorted by p-value in decreasing order.</p>
   </tblfn></tbl>
<tbl id="T10"><title><p>Table 10</p></title><caption><p>Results of experiment 4 data set III: classification of PUL patients using multiple kernels derived from clinical data</p></caption><tblbdy cols="4">
      <r>
         <c ca="left">
            <p>
               <b>Classifier</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Mean AUC error</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Std. of AUC error</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>p-value</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Weighted LSSVM </b>
               <it>L</it>
               <sub>2 </sub>
               <b>MKL</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.1165</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.0100</b>
            </p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Weighted LSSVM </b>
               <it>L</it>
               <sub>1 </sub>
               <b>MKL</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.1243</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>0.0171</b>
            </p>
         </c>
         <c ca="left">
            <p>0.0519</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Weighted LSSVM <it>L</it><sub>&#8734; </sub>(0.5) MKL</p>
         </c>
         <c ca="left">
            <p>0.1290</p>
         </c>
         <c ca="left">
            <p>0.0206</p>
         </c>
         <c ca="left">
            <p>0.0169</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Weighted SVM <it>L</it><sub>2 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>0.1499</p>
         </c>
         <c ca="left">
            <p>0.0248</p>
         </c>
         <c ca="left">
            <p>4.79 &#183; 10<sup>-5</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Weighted SVM <it>L</it><sub>&#8734; </sub>MKL</p>
         </c>
         <c ca="left">
            <p>0.1552</p>
         </c>
         <c ca="left">
            <p>0.0210</p>
         </c>
         <c ca="left">
            <p>1.02 &#183; 10<sup>-6</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Weighted SVM <it>L</it><sub>&#8734; </sub>(0.5) MKL</p>
         </c>
         <c ca="left">
            <p>0.1551</p>
         </c>
         <c ca="left">
            <p>0.0153</p>
         </c>
         <c ca="left">
            <p>3.87 &#183; 10<sup>-6</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Weighted SVM <it>L</it><sub>1 </sub>MKL</p>
         </c>
         <c ca="left">
            <p>0.1594</p>
         </c>
         <c ca="left">
            <p>0.0162</p>
         </c>
         <c ca="left">
            <p>2.29 &#183; 10<sup>-9</sup></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Weighted LSSVM <it>L</it><sub>&#8734; </sub>MKL</p>
         </c>
         <c ca="left">
            <p>0.1651</p>
         </c>
         <c ca="left">
            <p>0.0174</p>
         </c>
         <c ca="left">
            <p>4.41 &#183; 10<sup>-10</sup></p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Results of experiment 4 data set III: classification of PUL patients using multiple kernels derived from clinical data. The best-performing classifier, together with any classifier not significantly different from it (p &gt; 0.05), is shown in bold. p-values are obtained from paired t-tests comparing each classifier with the best-performing one, and classifiers are sorted by p-value in decreasing order.</p>
   </tblfn></tbl>
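<p>The table footnotes state that each p-value comes from a paired t-test against the best classifier. A minimal sketch of that comparison follows, assuming that the per-repetition AUC errors of two classifiers are paired by repetition (the same random split) and that the test is two-sided; the function name paired_pvalue and all numbers are hypothetical placeholders, not the study's data.</p>

```python
import numpy as np
from scipy import stats


def paired_pvalue(best_errors, other_errors):
    """Two-sided paired t-test between per-repetition AUC errors.

    Entry i of both arrays must come from the same repetition (the same
    random train/test split); that pairing is what makes the test paired.
    """
    best = np.asarray(best_errors, dtype=float)
    other = np.asarray(other_errors, dtype=float)
    if best.shape != other.shape:
        raise ValueError("need one error per repetition for both classifiers")
    return stats.ttest_rel(other, best).pvalue


# Synthetic example with 20 repetitions, as in the experiments (values made up).
rng = np.random.default_rng(0)
best = 0.12 + 0.01 * rng.standard_normal(20)
other = best + 0.03 + 0.01 * rng.standard_normal(20)
print(f"p = {paired_pvalue(best, other):.2e}")
```

<p>Pairing removes the variation shared by both classifiers on the same split, which is why even small but consistent differences can give very small p-values over 20 repetitions.</p>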
<fig id="F4"><title><p>Figure 4</p></title><caption><p>The effect of <it>&#952;</it><sub><it>min </it></sub>in LSSVM MKL and SVM MKL classifiers on the endometrial disease data set</p></caption><text>
   <p><b>The effect of <it>&#952;</it><sub><it>min </it></sub>in LSSVM MKL and SVM MKL classifiers on the endometrial disease data set</b>. Figure on the left: performance of the regularized LSSVM <it>L</it><sub>&#8734; </sub>MKL with various <it>&#952;</it><sub><it>min </it></sub>values. Figure on the right: performance of the regularized SVM <it>L</it><sub>&#8734; </sub>MKL. The black dashed lines represent the performance of the <it>L</it><sub>2 </sub>MKL classifiers. The error bars are standard deviations over 20 repetitions.</p>
</text><graphic file="1471-2105-11-309-4"/></fig>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>The effect of <it>&#952;</it><sub><it>min </it></sub>in LSSVM MKL and SVM MKL classifiers on the miscarriage data set</p></caption><text>
   <p><b>The effect of <it>&#952;</it><sub><it>min </it></sub>in LSSVM MKL and SVM MKL classifiers on the miscarriage data set</b>. Figure on the left: performance of the regularized LSSVM <it>L</it><sub>&#8734; </sub>MKL with various <it>&#952;</it><sub><it>min </it></sub>values. Figure on the right: performance of the regularized SVM <it>L</it><sub>&#8734; </sub>MKL. The black dashed lines represent the performance of the <it>L</it><sub>2 </sub>MKL classifiers. The error bars are standard deviations over 20 repetitions.</p>
</text><graphic file="1471-2105-11-309-5"/></fig>
<fig id="F6"><title><p>Figure 6</p></title><caption><p>The effect of <it>&#952;</it><sub><it>min </it></sub>in weighted LSSVM MKL and weighted SVM MKL classifiers on the pregnancy data set</p></caption><text>
   <p><b>The effect of <it>&#952;</it><sub><it>min </it></sub>in weighted LSSVM MKL and weighted SVM MKL classifiers on the pregnancy data set</b>. Figure on the left: performance of the regularized LSSVM <it>L</it><sub>&#8734; </sub>MKL with various <it>&#952;</it><sub><it>min </it></sub>values. Figure on the right: performance of the regularized SVM <it>L</it><sub>&#8734; </sub>MKL. The black dashed lines represent the performance of the <it>L</it><sub>2 </sub>MKL classifiers. The error bars are standard deviations over 20 repetitions.</p>
</text><graphic file="1471-2105-11-309-6"/></fig>
<p>To investigate whether the combination of multiple kernels performs as well as the best individual kernel, we evaluated the performance of each individual kernel (section 5 of Additional file <supplr sid="S1">1</supplr>). The clinical kernel proposed by Daemen and De Moor <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp> performs better than the linear, RBF and polynomial kernels on the endometrial and pregnancy data sets. For the miscarriage data set, the first RBF kernel performs better than the other seven kernels. Despite these differences among individual kernels, the performance of MKL is comparable to that of the best individual kernel, demonstrating that MKL is also useful for combining candidate kernels derived from a single data set.</p>
<p>The effectiveness of MKL is also supported by the kernel coefficients optimized on all the data sets and classifiers. As shown in section 6 of Additional file <supplr sid="S1">1</supplr>, the kernel coefficients optimized by <it>L</it><sub>&#8734; </sub>MKL algorithms were sparse, whereas those optimized by <it>L</it><sub>2 </sub>MKL were more evenly distributed over the kernels. The best individual kernel of each data set usually received the dominant coefficient, which explains why the performance of MKL algorithms is comparable to that of the best individual kernels.</p>
<p>In this paper, the regularization parameter &#955; in LSSVM classifiers was jointly estimated in MKL. Because the clinical data sets contain enough samples to select &#955; by cross-validation, we systematically compared this estimation approach with the standard validation approach for determining &#955;. As shown in Table <tblr tid="T11">11</tblr>, the estimation approach based on <it>L</it><sub>&#8734; </sub>MKL performed worse than the validation approach, probably because the estimated &#955; values are either very large or very small when the kernel coefficients are sparse. In contrast, the <it>L</it><sub>2</sub>-based estimation approach yielded performance comparable to that of the validation approach. We also benchmarked the performance of LSSVM MKL classifiers using 21 different fixed &#955; values on the data sets; the results are shown in section 7 of Additional file <supplr sid="S1">1</supplr>. In real problems, selecting the optimal &#955; value in LSSVM is non-trivial, and &#955; is often optimized as a hyperparameter on validation data. The main advantage of <it>L</it><sub>2 </sub>MKL is that the estimation approach is computationally more efficient than cross-validation while yielding comparable performance.</p>
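<p>For comparison with the joint estimation, the standard validation approach can be sketched as a grid search over &#955; with k-fold cross-validation for a fixed combined kernel. This minimal sketch assumes the LSSVM system [[0, 1<sup>T</sup>], [1, K + I/&#955;]][b; &#945;] = [0; y], scores folds by misclassification rate rather than by the AUC error reported in the paper, and uses 21 log-spaced &#955; values whose range is an arbitrary choice; the function names are hypothetical. With 21 values and 5 folds it already needs 105 linear solves, which illustrates why a single joint estimation is cheaper.</p>

```python
import numpy as np


def lssvm_fit(K, y, lam):
    """Solve [[0, 1^T], [1, K + I/lam]] [b; alpha] = [0; y] for a single kernel."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / lam
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[1:], sol[0]  # alpha, bias


def select_lambda_cv(K, y, lambdas, n_folds=5, seed=0):
    """Return the lambda with the lowest mean validation error, plus all errors."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    mean_errors = []
    for lam in lambdas:
        fold_errors = []
        for val in folds:
            train = np.setdiff1d(np.arange(len(y)), val)
            alpha, b = lssvm_fit(K[np.ix_(train, train)], y[train], lam)
            scores = K[np.ix_(val, train)] @ alpha + b  # decision values on held-out fold
            fold_errors.append(np.mean(np.sign(scores) != y[val]))
        mean_errors.append(np.mean(fold_errors))
    best = int(np.argmin(mean_errors))
    return lambdas[best], mean_errors


# Usage on toy data: RBF kernel (width chosen arbitrarily) and 21 log-spaced lambdas.
rng = np.random.default_rng(2)
X = rng.standard_normal((60, 2))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dist / 2.0)
lambdas = np.logspace(-3, 3, 21)
best_lam, errors = select_lambda_cv(K, y, lambdas)
print(best_lam, min(errors))
```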
<tbl id="T11"><title><p>Table 11</p></title><caption><p>Comparison of the performance obtained by joint estimation of &#955; and standard cross-validation in LSSVM MKL</p></caption><tblbdy cols="4">
      <r>
         <c ca="left">
            <p>
               <b>Data Set</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Norm</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Validation Approach</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Estimation Approach</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>endometrial disease</p>
         </c>
         <c ca="left">
            <p>
               <it>L</it>
               <sub>&#8734;</sub>
            </p>
         </c>
         <c ca="left">
            <p>0.2625 &#177; 0.0146</p>
         </c>
         <c ca="left">
            <p>0.2678 &#177; 0.0130</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p/>
         </c>
         <c ca="left">
            <p>
               <it>L</it>
               <sub>2</sub>
            </p>
         </c>
         <c ca="left">
            <p>0.2584 &#177; 0.0188</p>
         </c>
         <c ca="left">
            <p>0.2456 &#177; 0.0124</p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>miscarriage</p>
         </c>
         <c ca="left">
            <p>
               <it>L</it>
               <sub>&#8734;</sub>
            </p>
         </c>
         <c ca="left">
            <p>0.1873 &#177; 0.0100</p>
         </c>
         <c ca="left">
            <p>0.2319 &#177; 0.0015</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p/>
         </c>
         <c ca="left">
            <p>
               <it>L</it>
               <sub>2</sub>
            </p>
         </c>
         <c ca="left">
            <p>0.1912 &#177; 0.0089</p>
         </c>
         <c ca="left">
            <p>0.2002 &#177; 0.0049</p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>pregnancy</p>
         </c>
         <c ca="left">
            <p>
               <it>L</it>
               <sub>&#8734;</sub>
            </p>
         </c>
         <c ca="left">
            <p>0.1321 &#177; 0.0243</p>
         </c>
         <c ca="left">
            <p>0.1651 &#177; 0.0173</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p/>
         </c>
         <c ca="left">
            <p>
               <it>L</it>
               <sub>2</sub>
            </p>
         </c>
         <c ca="left">
            <p>0.1299 &#177; 0.0172</p>
         </c>
         <c ca="left">
            <p>0.1165 &#177; 0.0100</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Comparison of the performance obtained by joint estimation of &#955; and by standard cross-validation using LSSVM MKL. Values are the mean &#177; standard deviation of the AUC error. The estimation approach based on <it>L</it><sub>2 </sub>MKL performs better than that based on <it>L</it><sub>&#8734; </sub>MKL. This is probably because, when the kernel coefficients are sparse, the estimated regularization parameters &#955; are either very large or very small, which are usually not optimal values for LSSVM. In contrast, the &#955; values estimated by the <it>L</it><sub>2 </sub>method are on a moderate scale and often close to the optimal values.</p>
   </tblfn></tbl>
</sec>
<sec>
<st>
<p>Experiment 5: Computational complexity and numerical experiments on large scale problems</p>
</st>
<sec>
<st>
<p>Overview of the convexity and complexity</p>
</st>
<p>We summarize the convexity and time complexity of all proposed methods in Table <tblr tid="T12">12</tblr>. All problems proposed in this paper are convex or can be transformed into a convex formulation by relaxation. The LSSVM SIP formulation has the lowest time complexity and is therefore preferable for large-scale problems.</p>
<tbl id="T12"><title><p>Table 12</p></title><caption><p>Convexity and complexity of all methods</p></caption><tblbdy cols="3">
      <r>
         <c ca="left">
            <p>
               <b>Method</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Convexity</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Complexity</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>1-SVM SOCP <it>L</it><sub>&#8734;</sub>, <it>L</it><sub>2</sub></p>
         </c>
         <c ca="left">
            <p>convex</p>
         </c>
         <c ca="left">
            <p><it>O</it>((<it>p </it>+ <it>n</it>)<sup>2</sup><it>n</it><sup>2.5</sup>)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>1-SVM QCQP <it>L</it><sub>&#8734;</sub></p>
         </c>
         <c ca="left">
            <p>convex</p>
         </c>
         <c ca="left">
            <p><it>O</it>(<it>pn</it><sup>3</sup>)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SVM SOCP <it>L</it><sub>&#8734;</sub>, <it>L</it><sub>2</sub></p>
         </c>
         <c ca="left">
            <p>convex</p>
         </c>
         <c ca="left">
            <p><it>O</it>((<it>p </it>+ <it>n</it>)<sup>2</sup>(<it>k + n</it>)<sup>2.5</sup>)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SVM QCQP <it>L</it><sub>&#8734;</sub></p>
         </c>
         <c ca="left">
            <p>convex</p>
         </c>
         <c ca="left">
            <p><it>O</it>(<it>pk</it><sup>2</sup><it>n</it><sup>2 </sup>+ <it>k</it><sup>3</sup><it>n</it><sup>3</sup>)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SVM SIP <it>L</it><sub>&#8734;</sub></p>
         </c>
         <c ca="left">
            <p>convex</p>
         </c>
         <c ca="left">
            <p><it>O</it>(&#964;(<it>kn</it><sup>3 </sup>+ <it>p</it><sup>3</sup>))</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SVM SIP <it>L</it><sub>2</sub></p>
         </c>
         <c ca="left">
            <p>relaxation</p>
         </c>
         <c ca="left">
            <p><it>O</it>(&#964;(<it>kn</it><sup>3 </sup>+ <it>p</it><sup>3</sup>))</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>LSSVM SOCP <it>L</it><sub>&#8734;</sub>, <it>L</it><sub>2</sub></p>
         </c>
         <c ca="left">
            <p>convex</p>
         </c>
         <c ca="left">
            <p><it>O</it>((<it>p </it>+ <it>n</it>)<sup>2</sup>(<it>k + n</it>)<sup>2.5</sup>)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>LSSVM QCQP <it>L</it><sub>&#8734;</sub>, <it>L</it><sub>2</sub></p>
         </c>
         <c ca="left">
            <p>convex</p>
         </c>
         <c ca="left">
            <p><it>O</it>(<it>pk</it><sup>2</sup><it>n</it><sup>2 </sup>+ <it>k</it><sup>3</sup><it>n</it><sup>3</sup>)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>LSSVM SIP <it>L</it><sub>&#8734;</sub></p>
         </c>
         <c ca="left">
            <p>convex</p>
         </c>
         <c ca="left">
            <p><it>O</it>(&#964;(<it>n</it><sup>2 </sup>+ <it>p</it><sup>3</sup>))</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>LSSVM SIP <it>L</it><sub>2</sub></p>
         </c>
         <c ca="left">
            <p>relaxation</p>
         </c>
         <c ca="left">
            <p><it>O</it>(&#964;(<it>n</it><sup>2 </sup>+ <it>p</it><sup>3</sup>))</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Convexity and complexity of all methods. <it>n </it>is the number of samples, <it>p </it>is the number of kernels, <it>k </it>is the number of classes, and <it>&#964; </it>is the number of iterations in SIP. The complexity of LSSVM SIP depends on the algorithm used to solve the linear system; for the conjugate gradient method, it is between <it>O</it>(<it>n</it><sup>1.5</sup>) and <it>O</it>(<it>n</it><sup>2</sup>) <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>.</p>
   </tblfn></tbl>
<p>We verified the efficiency in numerical experiments that compare the computational time of the proposed algorithms on two UCI digit recognition data sets (pen-digit and optical digit).</p>
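<p>The footnote of Table 12 notes that the cost of LSSVM SIP depends on the solver used for the linear system, for example the conjugate gradient (CG) method. CG requires a symmetric positive definite matrix, whereas the bordered LSSVM system with a bias term is indefinite. A common workaround, assumed in this minimal sketch, is to solve two positive definite systems with H = K + I/&#955; and then recover the bias; it assumes the system [[0, 1<sup>T</sup>], [1, K + I/&#955;]][b; &#945;] = [0; y], and the function names are hypothetical rather than the authors' implementation.</p>

```python
import numpy as np


def conjugate_gradient(H, r, tol=1e-10, max_iter=None):
    """Plain CG for H x = r with H symmetric positive definite."""
    x = np.zeros_like(r, dtype=float)
    residual = r - H @ x
    direction = residual.copy()
    rs_old = residual @ residual
    for _ in range(max_iter or 10 * len(r)):
        Hd = H @ direction
        step = rs_old / (direction @ Hd)
        x += step * direction
        residual -= step * Hd
        rs_new = residual @ residual
        if tol * np.linalg.norm(r) >= np.sqrt(rs_new):  # relative residual small enough
            break
        direction = residual + (rs_new / rs_old) * direction
        rs_old = rs_new
    return x


def lssvm_cg(K, y, lam):
    """Solve [[0, 1^T], [1, K + I/lam]] [b; alpha] = [0; y] with two CG solves.

    With H = K + I/lam positive definite: H eta = 1, H nu = y,
    b = sum(nu) / sum(eta) and alpha = nu - b * eta, so sum(alpha) = 0.
    """
    n = len(y)
    H = K + np.eye(n) / lam
    eta = conjugate_gradient(H, np.ones(n))
    nu = conjugate_gradient(H, np.asarray(y, dtype=float))
    b = nu.sum() / eta.sum()
    return nu - b * eta, b


# Usage: compare with a dense solve of the bordered system on toy data.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1) / 2.0)
alpha, b = lssvm_cg(K, y, lam=10.0)
A = np.block([[np.zeros((1, 1)), np.ones((1, 50))],
              [np.ones((50, 1)), K + np.eye(50) / 10.0]])
reference = np.linalg.solve(A, np.concatenate([[0.0], y]))
print(np.allclose(reference[0], b), np.allclose(reference[1:], alpha))
```

<p>Each CG iteration costs one matrix-vector product, O(<it>n</it><sup>2</sup>) for a dense kernel, so the total cost depends on how many iterations the conditioning of H requires.</p>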
</sec>
<sec>
<st>
<p>QP formulation is more efficient than SOCP</p>
</st>
<p>We investigated the efficiency of various formulations for solving the 1-SVM MKL. As mentioned, the problems presented in (15) can be solved either as QCLPs or as SOCPs. We applied Sedumi <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp> to solve them as SOCPs, and MOSEK to solve them both as QCLPs and as SOCPs. Solving the QP formulation with MOSEK was the most efficient (142 seconds), whereas the MOSEK-SOCP method took 2608 seconds and the Sedumi-SOCP method took 4500 seconds. This is probably because transforming a QP into an SOCP introduces a large number of additional variables and constraints, which makes the problem more expensive to solve.</p>
</sec>
<sec>
<st>
<p>SIP formulation is more efficient than QCQP</p>
</st>
<p>To compare the computational time of solving MKL classifiers based on QP and SIP formulations, we scaled up the kernel fusion problem in three dimensions: the number of kernels, the number of classes and the number of samples. As shown in Figure <figr fid="F7">7</figr>, the computational time of the SIP formulation of LSSVM MKL increases linearly with the number of samples and kernels, and is barely influenced by the number of classes. Solving the SIP-based LSSVM MKL is significantly faster than solving SVM MKL, because the former iterates over linear systems whereas the latter iterates over quadratic problems. For LSSVM MKL, the SIP formulation is also preferable to the quadratic formulation. The quadratic formulation is memory intensive, and its complexity increases rapidly (polynomially, see Table <tblr tid="T12">12</tblr>) with the number of kernels and the number of samples. In contrast, the SIP formulation decomposes the problem into a series of linear systems whose complexity is determined mainly by the number of samples and is less affected by the number of kernels or classes. As shown in step 3 of Algorithm 5.2, the coefficient matrix of the linear system is a single combined kernel matrix that is the same for all classes, so the system can be solved very efficiently. We also compared the CPU time of <it>L</it><sub>&#8734; </sub>and <it>L</it><sub>2 </sub>LSSVM MKL on large data sets and found their efficiency to be very similar.</p>
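<p>The point that the coefficient matrix is a single combined kernel matrix shared by all classes can be illustrated with a minimal sketch. It assumes the LSSVM system [[0, 1<sup>T</sup>], [1, K + I/&#955;]][b<sub><it>q</it></sub>; &#945;<sub><it>q</it></sub>] = [0; y<sub><it>q</it></sub>] for each class <it>q</it>, one-vs-rest &#177;1 label coding and a fixed &#955;. The combined kernel is factorized once (Cholesky) and reused for every class; the bias is recovered by the substitution H&#951; = 1, H&#957; = y<sub><it>q</it></sub>. The function name is hypothetical.</p>

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def multiclass_lssvm_step(kernels, theta, Y, lam):
    """Solve the LSSVM system for all k classes with one factorization.

    kernels: list of p symmetric (n x n) kernel matrices
    theta:   p nonnegative kernel coefficients
    Y:       (n x k) matrix of one-vs-rest +/-1 codes (assumed coding)
    """
    n = Y.shape[0]
    K = sum(t * Kj for t, Kj in zip(theta, kernels))  # single combined kernel
    H = K + np.eye(n) / lam                            # identical for all classes
    factor = cho_factor(H)                             # O(n^3), done once
    eta = cho_solve(factor, np.ones(n))                # shared by every class
    nu = cho_solve(factor, Y)                          # k right-hand sides at once
    b = nu.sum(axis=0) / eta.sum()                     # one bias per class
    alpha = nu - np.outer(eta, b)                      # columns sum to zero
    return alpha, b


# Usage: three RBF kernels of different widths, four classes.
rng = np.random.default_rng(3)
X = rng.standard_normal((40, 2))
labels = rng.integers(0, 4, size=40)
Y = np.where(labels[:, None] == np.arange(4)[None, :], 1.0, -1.0)
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
kernels = [np.exp(-sq_dist / w) for w in (0.5, 2.0, 8.0)]
alpha, b = multiclass_lssvm_step(kernels, [0.2, 0.5, 0.3], Y, lam=10.0)
print(alpha.shape, b.shape)  # -> (40, 4) (4,)
```

<p>Adding classes only adds right-hand sides to triangular solves, which is consistent with the observation that the number of classes barely affects the SIP running time.</p>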
<fig id="F7"><title><p>Figure 7</p></title><caption><p>Comparison of QP formulation and SIP formulation on large scale data</p></caption><text>
   <p><b>Comparison of QP formulation and SIP formulation on large scale data</b>. Figure on the top left: comparison of SOCP and QCQP formulations to solve 1-SVM MKL using two kernels. To simulate the ranking problem in 1-SVM, 3000 digit samples were retrieved as training data. Two RBF kernels were constructed, one for each data source, so the computational time was evaluated by combining two 3000 &#215; 3000 kernel matrices. Figure on the top right: comparison of SVM and LSSVM MKL on problems with an increasing number of samples. The benchmark data set consisted of two linear kernels and labels in 10 digit classes; the number of data points was increased from 1000 to 3000. Figure on the bottom left: comparison of SVM and LSSVM MKL on problems with an increasing number of kernels. The benchmark data set consisted of 2000 samples labeled in 2 classes. RBF kernel matrices were constructed with different kernel widths, and their number was increased from 2 to 200. The QCQP formulations ran into memory issues when the number of kernels exceeded 60. Figure on the bottom right: comparison of SVM and LSSVM MKL on problems with an increasing number of classes. The benchmark data consisted of two linear kernel matrices and 2000 samples, which were randomly divided into classes of equal size; the number of classes was increased from 2 to 20.</p>
</text><graphic file="1471-2105-11-309-7"/></fig>
</sec>
</sec>
</sec>
<sec>
<st>
<p>Discussion</p>
</st>
<p>In this paper we propose a new <it>L</it><sub>2 </sub>MKL framework as a complement to the existing <it>L</it><sub>&#8734; </sub>MKL method proposed by Lanckriet <it>et al</it>. <it>L</it><sub>2 </sub>MKL is characterized by the non-sparse integration of multiple kernels to optimize the objective function of machine learning problems. On four real bioinformatics and biomedical applications, we systematically validated its performance through extensive analysis. The motivation for <it>L</it><sub>2 </sub>MKL is as follows. In real biomedical applications with a small number of sources that are believed to be truly informative, we would usually prefer a non-sparse set of coefficients, because we want to prevent the dominant source (such as text mining or Gene Ontology) from receiving a coefficient close to 1. Sparse coefficients should be avoided because there is a discrepancy between the experimental setup for performance evaluation and "real world" performance. The dominant source will work well on a benchmark because a benchmark is a controlled situation with known outcomes. For example, we set up a set of genes already known to be associated with a given disease and want to demonstrate that our model can capture the available information to discriminate between a gene from this set and randomly selected genes (for example, in a cross-validation setup). Because these genes are already known to be associated with the disease, this information will be present in sources such as text mining or Gene Ontology in the gene prioritization problem. These sources can then identify the known genes with high confidence and are therefore expected to receive a high weight. However, when trying to identify truly novel genes for the same disease, the relevance of the information available through such data sources will be much lower, and we would like to prevent any single data source from completely dominating the others. Given that setting up a benchmark requires knowledge of the association between a gene and a disease, this effect is hard to avoid. We can therefore expect that a smoother solution that performs as well as the sparse solution on benchmark data is likely to perform better on real discoveries.</p>
<p>For the specific problem of gene prioritization, an effective way to address this issue is to set up a benchmark in which information is "rolled back" a number of years (e.g., two years) prior to the discovery of the association between a gene and a disease (i.e., older information is used so that the association between the gene and the disease is not yet contained in data sources such as text mining or Gene Ontology). Because the date at which the association was discovered differs for each gene, setting up such benchmarks is notoriously difficult. In future work, we plan to address this problem by freezing the available knowledge at a given date, then collecting novel discoveries and benchmarking against them in a fashion reminiscent of CASP (Critical Assessment of protein Structure Prediction) <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>.</p>
<p>The technical merit of the proposed <it>L</it><sub>2 </sub>MKL lies in the dual form of the learning problems. Although the use of different norms in MKL has recently been investigated by Kloft <it>et al</it>. <abbrgrp>
<abbr bid="B40">40</abbr>
<abbr bid="B9">9</abbr>
</abbrgrp> and Kowalski <it>et al</it>. <abbrgrp>
<abbr bid="B41">41</abbr>
</abbrgrp>, their formulations are based on the primal problems. In our paper, the proposed <it>L</it><sub>2 </sub>method is formulated in the dual space, which differs from regularizing the norm of the coefficients in the primal space. We have theoretically proven that optimizing the <it>L</it><sub>2 </sub>regularization of kernel coefficients in the primal problem corresponds to solving the <it>L</it><sub>2</sub>-norm of kernel components in the dual problem. Clarifying this dual solution enabled us to solve the <it>L</it><sub>2 </sub>problem directly as a convex SOCP. Moreover, the dual solution can be extended to various other machine learning problems. In this paper we have shown the extensions to 1-SVM, SVM and LSSVM. In fact, the <it>L</it><sub>2 </sub>dual solution can also be applied to kernel-based clustering and regression analysis for a wide range of applications. Another main contribution of our paper is the novel LSSVM <it>L</it><sub>2 </sub>MKL proposed for classification problems. It is well known that when machine learning techniques are applied to real computational biology problems, the performance may depend on the data set and the experimental settings. When several methods perform comparably but one of them is significantly more efficient computationally, this efficiency is a "solid" advantage of that method. In this paper, we have shown that the LSSVM MKL classifier based on the SIP formulation can be solved more efficiently than SVM MKL. Moreover, the performance of LSSVM <it>L</it><sub>2 </sub>MKL was consistently comparable to the best performance in our experiments. The SIP-based LSSVM <it>L</it><sub>2 </sub>MKL classifier has two main "solid advantages": its inherent time complexity is small, and the regularization parameter &#955; can be jointly estimated in the experimental setup. Owing to these merits, LSSVM <it>L</it><sub>2 </sub>MKL is a promising technique for large-scale data fusion problems.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>This paper systematically compared the effect of optimizing different norms in multiple kernel learning. The results extend and enrich the statistical framework of genomic data fusion proposed by Lanckriet <it>et al</it>. <abbrgrp>
<abbr bid="B4">4</abbr>
<abbr bid="B6">6</abbr>
</abbrgrp> and Bach <it>et al</it>. <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>. By optimizing different norms in the dual problem of SVM, we proposed <it>L</it><sub>&#8734;</sub>, <it>L</it><sub>1</sub>, and <it>L</it><sub>2 </sub>MKL, which correspond respectively to <it>L</it><sub>1 </sub>regularization, average combination, and <it>L</it><sub>2 </sub>regularization of the kernel coefficients in the primal problem.</p>
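<p>One way to read this correspondence is through norm duality (H&#246;lder's inequality). Writing <it>s</it><sub><it>j</it></sub>(&#945;) for the kernel-wise quadratic term of the dual, which is nonnegative for positive semidefinite kernels, the three constraints on the coefficients &#952; select three different norms of the vector <it>s</it>. This is an illustrative summary under that assumption (with labels absorbed into &#945;), not the paper's full derivation.</p>

```latex
s_j(\alpha) = \alpha^\top K_j \alpha \ge 0, \qquad j = 1, \dots, p

\max_{\theta \ge 0,\; \sum_j \theta_j = 1} \; \sum_{j=1}^{p} \theta_j s_j = \|s\|_\infty
\qquad (L_\infty \text{ MKL: } L_1\text{-constrained } \theta)

\max_{\theta \ge 0,\; \|\theta\|_2 \le 1} \; \sum_{j=1}^{p} \theta_j s_j = \|s\|_2,
\qquad \theta^\ast = s / \|s\|_2 \qquad (L_2 \text{ MKL})

\max_{0 \le \theta_j \le 1} \; \sum_{j=1}^{p} \theta_j s_j = \|s\|_1,
\qquad \theta^\ast = \mathbf{1} \qquad (L_1 \text{ MKL: average combination})
```

<p>The <it>L</it><sub>&#8734;</sub> maximum is attained at a vertex of the simplex (the kernel with the largest <it>s</it><sub><it>j</it></sub>), which is consistent with the sparse coefficients reported above, whereas the <it>L</it><sub>2</sub> maximizer is proportional to <it>s</it> and therefore spreads weight over all informative kernels.</p>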
<p>Six real biomedical data sets were investigated in this paper, on which the <it>L</it><sub>2 </sub>MKL approach was shown to be advantageous over the <it>L</it><sub>&#8734; </sub>method. We also proposed a novel and efficient LSSVM <it>L</it><sub>2 </sub>MKL classifier to learn the optimal combination of multiple large-scale data sets. All the algorithms implemented in this paper are freely available at <url>http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html</url>.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>All authors conceived and designed the project. SY performed the theoretical analysis, programmed the algorithms, analyzed the data and wrote the paper. TF investigated SIP and implemented the SIP formulations for SVM and LSSVM. AD preprocessed the rectal cancer, endometrial, miscarriage and pregnancy data sets and provided the code for clinical kernel construction. LCT provided the data sources, disease-relevant benchmark genes and prostate cancer genes for the gene prioritization application, and compared the prioritization performance on the Endeavour system. JS is the promoter of TF. BDM is the promoter of AD and SY. YM is the promoter of SY and LCT. AD is a research assistant of the Fund for Scientific Research - Flanders (FWO-Vlaanderen). JS and YM are professors and BDM is a full professor at the Katholieke Universiteit Leuven, Belgium. All authors read and approved the manuscript.</p>
</sec>
<sec>
<st>
<p>Appendix</p>
</st>
<p>
<b>Algorithm 0.1</b>: SIP-SVM-MKL(<it>K</it>
<sub>
<it>j</it>
</sub>, <it>Y</it>
<sub>
<it>q</it>
</sub>, <it>C</it>, <it>&#949;</it>)</p>
<p>Obtain the initial guess <inline-formula>
<graphic file="1471-2105-11-309-i118.gif"/>
</inline-formula>
</p>
<p>
<b>while </b>(&#916;<it>u </it>&gt; <it>&#949;</it>)</p>
<p>
<inline-formula>
<graphic file="1471-2105-11-309-i119.gif"/>
</inline-formula>
</p>
<p>
<b>comment: </b>
<it>&#964; </it>is the index of the current iteration</p>
<p>
<b>return </b>
<inline-formula>
<graphic file="1471-2105-11-309-i120.gif"/>
</inline-formula>
</p>
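<p>For orientation, the master problem that a SIP wrapper of this kind solves at iteration <it>&#964;</it> can be written as follows for the SVM case. This is a sketch of the standard cutting-plane form (Sonnenburg et al. for simplex-constrained weights, Kloft et al. for norm-constrained weights), assumed here for illustration and not a transcription of the algorithm above. <it>Y</it> = diag(<it>y</it>), and <it>&#945;</it><sup>(<it>t</it>)</sup> denotes the SVM dual solution (0 &#8804; <it>&#945;</it><sub><it>i</it></sub> &#8804; <it>C</it>, <it>y</it><sup>T</sup><it>&#945;</it> = 0) computed with the combined kernel &#931;<sub><it>j</it></sub><it>&#952;</it><sub><it>j</it></sub><it>K</it><sub><it>j</it></sub> of iteration <it>t</it>:</p>
<p>
```latex
\max_{\theta,\,u}\; u
\quad \text{s.t.} \quad
\tfrac{1}{2}\sum_{j=1}^{p} \theta_j\, \alpha^{(t)\top} Y K_j Y \alpha^{(t)} - \alpha^{(t)\top}\mathbf{1} \;\ge\; u,
\qquad t = 0, \dots, \tau,
\qquad \theta_j \ge 0, \quad \theta \in \Theta,

\Theta_{L_1} = \{\theta : \textstyle\sum_{j} \theta_j = 1\},
\qquad
\Theta_{L_2} = \{\theta : \|\theta\|_2 \le 1\}.
```
</p>
<p>The solution obtained with the updated weights is then added as one more constraint. Because the master problem only sees finitely many constraints, its value <it>u</it><sup>(<it>&#964;</it>)</sup> over-estimates the optimum, while the value attained by the new SVM solution at <it>&#952;</it><sup>(<it>&#964;</it>)</sup> under-estimates it; the gap between the two is therefore a natural stopping quantity, although the exact definition of &#916;<it>u</it> used in the algorithms is not reproduced here. With the simplex set the master problem is a linear program; with the <it>L</it><sub>2</sub> ball it becomes a quadratically constrained linear program.</p>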
<p>
<b>Algorithm 0.2</b>: SIP-LSSVM-MKL(<it>K</it>
<sub>
<it>j</it>
</sub>, <it>Y</it>
<sub>
<it>q</it>
</sub>, <it>&#949;</it>)</p>
<p>Obtain the initial guess <inline-formula>
<graphic file="1471-2105-11-309-i121.gif"/>
</inline-formula>
</p>
<p>
<b>while </b>(&#916;<it>u </it>&gt; <it>&#949;</it>)</p>
<p>
<inline-formula>
<graphic file="1471-2105-11-309-i122.gif"/>
</inline-formula>
</p>
<p>
<b>comment: </b><it>&#964; </it>is the index of the current iteration</p>
<p>
<b>return </b>
<inline-formula>
<graphic file="1471-2105-11-309-i123.gif"/>
</inline-formula>
</p>
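<p>As an illustration only, the following Python sketch (assuming NumPy and SciPy are available) applies the same cutting-plane idea to the LSSVM classifier. The inner step solves the standard LSSVM dual linear system for the current kernel weights, and the master step is the simplex-constrained linear program, solved with scipy.optimize.linprog. The function names lssvm_dual and sip_lssvm_mkl, the regularization constant gamma, the relative-gap rule standing in for &#916;<it>u</it>, and the toy data are hypothetical choices made for this sketch. It keeps <it>&#952;</it> on the simplex; an <it>L</it><sub>2</sub> version would replace that constraint by ||<it>&#952;</it>||<sub>2</sub> &#8804; 1, which requires a solver for quadratic constraints instead of linprog. It is not the implementation distributed with the paper.</p>
<p>
```python
import numpy as np
from scipy.optimize import linprog


def lssvm_dual(K, y, gamma):
    """LSSVM classifier dual for a fixed kernel matrix K.

    Solves [[0, y'], [y, Omega + I/gamma]] [b; alpha] = [0; 1]
    with Omega = diag(y) K diag(y); returns (alpha, b).
    """
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.r_[0.0, np.ones(n)])
    return sol[1:], sol[0]


def sip_lssvm_mkl(kernels, y, gamma=1.0, eps=1e-4, max_iter=100):
    """Hypothetical cutting-plane (SIP) kernel weighting with theta on the simplex."""
    p = len(kernels)
    theta = np.full(p, 1.0 / p)   # initial guess: uniform kernel weights
    Q, R, u = [], [], None        # stored cuts and current master-problem value
    for it in range(max_iter):
        K = sum(t * Kj for t, Kj in zip(theta, kernels))
        alpha, b = lssvm_dual(K, y, gamma)
        ay = alpha * y
        q = np.array([0.5 * ay @ Kj @ ay for Kj in kernels])  # per-kernel terms
        r = alpha @ alpha / (2.0 * gamma) - alpha.sum()        # theta-independent term
        value = theta @ q + r     # negated LSSVM dual objective at the current theta
        # Stop when the master value u and the attained value agree (relative gap).
        if u is not None and eps > abs(1.0 - value / u):
            return theta, alpha, b, it, True
        Q.append(q)
        R.append(r)
        # Master LP over x = [theta, u]: maximize u such that
        # theta @ q_t + r_t >= u for every stored cut t, sum(theta) == 1, theta >= 0.
        res = linprog(
            c=np.r_[np.zeros(p), -1.0],
            A_ub=np.hstack([-np.array(Q), np.ones((len(Q), 1))]),
            b_ub=np.array(R),
            A_eq=np.r_[np.ones(p), 0.0][None, :],
            b_eq=[1.0],
            bounds=[(0, None)] * p + [(None, None)],
        )
        theta, u = res.x[:p], res.x[-1]
    return theta, alpha, b, max_iter, False


# Toy usage (hypothetical data): one informative kernel and one pure-noise kernel.
rng = np.random.default_rng(0)
y = np.repeat([1.0, -1.0], 20)
x1 = 2.0 * y + 0.3 * rng.standard_normal(40)
Z = rng.standard_normal((40, 2))
kernels = [np.outer(x1, x1), Z @ Z.T]
theta, alpha, b, n_iter, converged = sip_lssvm_mkl(kernels, y)
K = sum(t * Kj for t, Kj in zip(theta, kernels))
accuracy = np.mean(np.sign(K @ (alpha * y) + b) == y)
print(theta, n_iter, accuracy)
```
</p>
<p>In the toy example the first kernel is built from a feature that separates the two classes and the second from pure noise, so the learned weights would be expected to favour the first kernel. Keeping the <it>&#952;</it>-independent term separate from the per-kernel terms in each cut mirrors the master problem of the SIP formulation, so that only the constraint set on <it>&#952;</it> changes between the simplex and <it>L</it><sub>2</sub> variants.</p>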
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>The work was supported by Research Council KUL: GOA AMBioRICS, CoE EF/05/007 SymBioSys, PROMETA, several PhD/postdoc and Fellow Grants; FWO: PhD/postdoc Grants, Projects G.0241.04 (Functional Genomics), G.0499.04 (Statistics), G.0232.05 (Cardiovascular), G.0318.05 (subfunctionalization), G.0553.06 (VitamineD), G.0302.07 (SVM/Kernel), research communities (ICCoS, ANMMM, MLDM); IWT: PhD Grants, GBOU-McKnow-E (Knowledge management algorithms), GBOU-ANA (biosensors), TADBioScope-IT, Silicos; SBO-BioFrame, SBO-MoKa, TBMEndometriosis, TBM-IOTA3, O&amp;O-Dsquare; Belgian Federal Science Policy Office: IUAP P6/25 (BioMaGNet, Bioinformatics and Modeling: from Genomes to Networks, 2007-2011); EU-RTD: ERNSI: European Research Network on System Identification; FP6-NoE Biopattern; FP6-IP e-Tumours, FP6-MC-EST Bioptrain, FP6-STREP Strokemap.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Methods of genomic data fusion: An overview</p></title><aug><au><snm>Tretyakov</snm><fnm>K</fnm></au></aug><pubdate>2006</pubdate><url>http://ats.cs.ut.ee/u/kt/hw/fusion/fusion.pdf</url></bibl><bibl id="B2"><title><p>The Nature of Statistical Learning Theory</p></title><aug><au><snm>Vapnik</snm><fnm>V</fnm></au></aug><publisher>Springer-Verlag, New York</publisher><pubdate>1995</pubdate></bibl><bibl id="B3"><title><p>Kernel methods for pattern analysis</p></title><aug><au><snm>Shawe-Taylor</snm><fnm>J</fnm></au><au><snm>Cristianini</snm><fnm>N</fnm></au></aug><publisher>Cambridge: Cambridge University Press</publisher><pubdate>2004</pubdate></bibl><bibl id="B4"><title><p>Learning the Kernel Matrix with Semidefinite Programming</p></title><aug><au><snm>Lanckriet</snm><fnm>GRG</fnm></au><au><snm>Cristianini</snm><fnm>N</fnm></au><au><snm>Bartlett</snm><fnm>P</fnm></au><au><snm>Ghaoui</snm><fnm>LE</fnm></au><au><snm>Jordan</snm><fnm>MI</fnm></au></aug><source>Journal of Machine Learning Research</source><pubdate>2005</pubdate><volume>5</volume><fpage>27</fpage><lpage>72</lpage></bibl><bibl id="B5"><title><p>Multiple kernel learning, conic duality, and the SMO algorithm</p></title><aug><au><snm>Bach</snm><fnm>FR</fnm></au><au><snm>Lanckriet</snm><fnm>GRG</fnm></au><au><snm>Jordan</snm><fnm>MI</fnm></au></aug><source>Proceedings of 21st International Conference of Machine Learning</source><pubdate>2004</pubdate></bibl><bibl id="B6"><title><p>A statistical framework for genomic data fusion</p></title><aug><au><snm>Lanckriet</snm><fnm>GRG</fnm></au><au><snm>De Bie</snm><fnm>T</fnm></au><au><snm>Cristianini</snm><fnm>N</fnm></au><au><snm>Jordan</snm><fnm>MI</fnm></au><au><snm>Noble</snm><fnm>WS</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><fpage>2626</fpage><lpage>2635</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth294</pubid><pubid idtype="pmpid" 
link="fulltext">15130933</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Kernel-based data fusion for gene prioritization</p></title><aug><au><snm>De Bie</snm><fnm>T</fnm></au><au><snm>Tranchevent</snm><fnm>LC</fnm></au><au><snm>Van Oeffelen</snm><fnm>L</fnm></au><au><snm>Moreau</snm><fnm>Y</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><fpage>i125</fpage><lpage>i132</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btm187</pubid><pubid idtype="pmpid" link="fulltext">17646288</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Feature selection, L1 vs. L2 regularization, and rotational invariance</p></title><aug><au><snm>Ng</snm><fnm>AY</fnm></au></aug><source>Proceedings of 21st International Conference of Machine Learning</source><pubdate>2004</pubdate></bibl><bibl id="B9"><title><p>Efficient and Accurate Lp-norm Multiple Kernel Learning</p></title><aug><au><snm>Kloft</snm><fnm>M</fnm></au><au><snm>Brefeld</snm><fnm>U</fnm></au><au><snm>Sonnenburg</snm><fnm>S</fnm></au><au><snm>Laskov</snm><fnm>P</fnm></au><au><snm>M&#252;ller</snm><fnm>K</fnm></au><au><snm>Zien</snm><fnm>A</fnm></au></aug><source>Advances in Neural Information Processing Systems 22</source><pubdate>2009</pubdate></bibl><bibl id="B10"><title><p>CVX: Matlab Software for Disciplined Convex Programming, version 1.21</p></title><aug><au><snm>Grant</snm><fnm>M</fnm></au><au><snm>Boyd</snm><fnm>S</fnm></au></aug><pubdate>2010</pubdate><url>http://cvxr.com/cvx</url></bibl><bibl id="B11"><title><p>Graph implementations for nonsmooth convex programs</p></title><aug><au><snm>Grant</snm><fnm>M</fnm></au><au><snm>Boyd</snm><fnm>S</fnm></au></aug><source>Recent Advances in Learning and Control Lecture Notes in Control and Information Sciences</source><publisher>Springer-Verlag Limited</publisher><editor>Blondel V, Boyd S, Kimura 
H</editor><pubdate>2008</pubdate><fpage>95</fpage><lpage>110</lpage><url>http://stanford.edu/~boyd/graph_dcp.html</url><xrefbib><pubid idtype="doi">full_text</pubid></xrefbib></bibl><bibl id="B12"><title><p>Support vector domain description</p></title><aug><au><snm>Tax</snm><fnm>DMJ</fnm></au><au><snm>Duin</snm><fnm>RPW</fnm></au></aug><source>Pattern Recognition Letters</source><pubdate>1999</pubdate><volume>20</volume><fpage>1191</fpage><lpage>1199</lpage><xrefbib><pubid idtype="doi">10.1016/S0167-8655(99)00087-2</pubid></xrefbib></bibl><bibl id="B13"><title><p>Estimating the support of a high-dimensional distribution</p></title><aug><au><snm>Sch&#246;lkopf</snm><fnm>B</fnm></au><au><snm>Platt</snm><fnm>JC</fnm></au><au><snm>Shawe-Taylor</snm><fnm>J</fnm></au><au><snm>Smola</snm><fnm>AJ</fnm></au><au><snm>Williamson</snm><fnm>RC</fnm></au></aug><source>Neural Computation</source><pubdate>2001</pubdate><volume>13</volume><fpage>1443</fpage><lpage>1471</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1162/089976601750264965</pubid><pubid idtype="pmpid" link="fulltext">11440593</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><aug><au><cnm>Sedumi</cnm></au></aug><url>http://sedumi.ie.lehigh.edu/</url></bibl><bibl id="B15"><title><p>The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm</p></title><aug><au><snm>Andersen</snm><fnm>ED</fnm></au><au><snm>Andersen</snm><fnm>KD</fnm></au></aug><source>High Perf Optimization</source><pubdate>2000</pubdate><fpage>197</fpage><lpage>232</lpage></bibl><bibl id="B16"><title><p>Optimal kernel selection in kernel Fisher discriminant analysis</p></title><aug><au><snm>Kim</snm><fnm>SJ</fnm></au><au><snm>Magnani</snm><fnm>A</fnm></au><au><snm>Boyd</snm><fnm>S</fnm></au></aug><source>Proceedings of 23rd International Conference of Machine Learning</source><pubdate>2006</pubdate></bibl><bibl id="B17"><title><p>Multi-class discriminant kernel learning via convex 
programming</p></title><aug><au><snm>Ye</snm><fnm>JP</fnm></au><au><snm>Ji</snm><fnm>SH</fnm></au><au><snm>Chen</snm><fnm>JH</fnm></au></aug><source>Journal of Machine Learning Research</source><pubdate>2008</pubdate><volume>9</volume><fpage>719</fpage><lpage>758</lpage></bibl><bibl id="B18"><title><p>Large scale multiple kernel learning</p></title><aug><au><snm>Sonnenburg</snm><fnm>S</fnm></au><au><snm>R&#228;tsch</snm><fnm>G</fnm></au><au><snm>Sch&#228;fer</snm><fnm>C</fnm></au><au><snm>Sch&#246;lkopf</snm><fnm>B</fnm></au></aug><source>Journal of Machine Learning Research</source><pubdate>2006</pubdate><volume>7</volume><fpage>1531</fpage><lpage>1565</lpage></bibl><bibl id="B19"><title><p>Semi-infinite programming: theory, methods, and applications</p></title><aug><au><snm>Hettich</snm><fnm>R</fnm></au><au><snm>Kortanek</snm><fnm>KO</fnm></au></aug><source>SIAM Review</source><pubdate>1993</pubdate><volume>35</volume><issue>3</issue><fpage>380</fpage><lpage>429</lpage><xrefbib><pubid idtype="doi">10.1137/1035089</pubid></xrefbib></bibl><bibl id="B20"><title><p>Logarithmic barrier decomposition methods for semi-infinite programming</p></title><aug><au><snm>Kaliski</snm><fnm>J</fnm></au><au><snm>Haglin</snm><fnm>D</fnm></au><au><snm>Roos</snm><fnm>C</fnm></au><au><snm>Terlaky</snm><fnm>T</fnm></au></aug><source>International Transactions in Operational Research</source><volume>4</volume><issue>4</issue></bibl><bibl id="B21"><title><p>Some other approximation methods for semi-infinite optimization problems</p></title><aug><au><snm>Reemtsen</snm><fnm>R</fnm></au></aug><source>Journal of Computational and Applied Mathematics</source><pubdate>1994</pubdate><volume>53</volume><fpage>87</fpage><lpage>108</lpage><xrefbib><pubid idtype="doi">10.1016/0377-0427(92)00122-P</pubid></xrefbib></bibl><bibl id="B22"><title><p>Least Squares Support Vector Machines</p></title><aug><au><snm>Suykens</snm><fnm>JAK</fnm></au><au><snm>Van 
Gestel</snm><fnm>T</fnm></au><au><snm>De Brabanter</snm><fnm>J</fnm></au><au><snm>De Moor</snm><fnm>B</fnm></au><au><snm>Vandewalle</snm><fnm>J</fnm></au></aug><publisher>World Scientific Publishing, Singapore</publisher><pubdate>2002</pubdate></bibl><bibl id="B23"><title><p>Controlling the sensitivity of support vector machines</p></title><aug><au><snm>Veropoulos</snm><fnm>K</fnm></au><au><snm>Campbell</snm><fnm>C</fnm></au><au><snm>Cristianini</snm><fnm>N</fnm></au></aug><source>Proc of the IJCAI 99</source><pubdate>1999</pubdate><fpage>55</fpage><lpage>60</lpage></bibl><bibl id="B24"><title><p>Reduction of False Positives in Polyp Detection Using Weighted Support Vector Machines</p></title><aug><au><snm>Zheng</snm><fnm>Y</fnm></au><au><snm>Yang</snm><fnm>X</fnm></au><au><snm>Beddoe</snm><fnm>G</fnm></au></aug><source>Proc. of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)</source><pubdate>2007</pubdate><fpage>4433</fpage><lpage>4436</lpage><xrefbib><pubid idtype="doi">full_text</pubid></xrefbib></bibl><bibl id="B25"><title><p>Weighted least squares support vector machines: robustness and sparse approximation</p></title><aug><au><snm>Suykens</snm><fnm>JAK</fnm></au><au><snm>De Brabanter</snm><fnm>J</fnm></au><au><snm>Lukas</snm><fnm>L</fnm></au><au><snm>Vandewalle</snm><fnm>J</fnm></au></aug><source>Neurocomputing, Special issue on fundamental and information processing aspects of neurocomputing</source><pubdate>2002</pubdate><volume>48</volume><issue>1-4</issue><fpage>85</fpage><lpage>105</lpage></bibl><bibl id="B26"><title><p>Leave-One-Out Cross-Validation Based Model Selection Criteria for Weighted LS-SVMs</p></title><aug><au><snm>Cawley</snm><fnm>GC</fnm></au></aug><source>Proc. 
of 2006 International Joint Conference on Neural Networks</source><pubdate>2006</pubdate><fpage>1661</fpage><lpage>1668</lpage><xrefbib><pubid idtype="doi">full_text</pubid></xrefbib></bibl><bibl id="B27"><title><p>Gene prioritization through genomic data fusion</p></title><aug><au><snm>Aerts</snm><fnm>S</fnm></au><au><snm>Lambrechts</snm><fnm>D</fnm></au><au><snm>Maity</snm><fnm>S</fnm></au><au><snm>Van Loo</snm><fnm>P</fnm></au><au><snm>Coessens</snm><fnm>B</fnm></au><au><snm>De Smet</snm><fnm>F</fnm></au><au><snm>Tranchevent</snm><fnm>LC</fnm></au><au><snm>De Moor</snm><fnm>B</fnm></au><au><snm>Marynen</snm><fnm>P</fnm></au><au><snm>Hassan</snm><fnm>B</fnm></au><au><snm>Carmeliet</snm><fnm>P</fnm></au><au><snm>Moreau</snm><fnm>Y</fnm></au></aug><source>Nature Biotechnology</source><pubdate>2006</pubdate><volume>24</volume><fpage>537</fpage><lpage>544</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt1203</pubid><pubid idtype="pmpid" link="fulltext">16680138</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Comparison of vocabularies, representations and ranking algorithms for gene prioritization by text mining</p></title><aug><au><snm>Yu</snm><fnm>S</fnm></au><au><snm>Van Vooren</snm><fnm>S</fnm></au><au><snm>Tranchevent</snm><fnm>LC</fnm></au><au><snm>De Moor</snm><fnm>B</fnm></au><au><snm>Moreau</snm><fnm>Y</fnm></au></aug><source>Bioinformatics</source><pubdate>2008</pubdate><volume>24</volume><fpage>i119</fpage><lpage>i125</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btn291</pubid><pubid idtype="pmpid" link="fulltext">18689812</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>The spectrum kernel: a string kernel for SVM protein classification</p></title><aug><au><snm>Leslie</snm><fnm>C</fnm></au><au><snm>Eskin</snm><fnm>E</fnm></au><au><snm>Weston</snm><fnm>J</fnm></au><au><snm>Noble</snm><fnm>WS</fnm></au></aug><source>Proc. 
of the Pacific Symposium on Biocomputing 2002</source><pubdate>2002</pubdate></bibl><bibl id="B30"><title><p>Multiple newly identified loci associated with prostate cancer susceptibility</p></title><aug><au><snm>Eeles</snm><fnm>RA</fnm></au><au><snm>Kote-Jarai</snm><fnm>Z</fnm></au><au><snm>Giles</snm><fnm>GG</fnm></au><au><snm>Olama</snm><fnm>AAA</fnm></au><au><snm>Guy</snm><fnm>M</fnm></au><au><snm>Jugurnauth</snm><fnm>SK</fnm></au><au><snm>Mulholland</snm><fnm>S</fnm></au><au><snm>Leongamornlert</snm><fnm>DA</fnm></au><au><snm>Edwards</snm><fnm>SM</fnm></au><au><snm>Morrison</snm><fnm>Jea</fnm></au></aug><source>Nat Genet</source><pubdate>2008</pubdate><volume>40</volume><fpage>316</fpage><lpage>321</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.90</pubid><pubid idtype="pmpid" link="fulltext">18264097</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Multiple loci identified in a genome-wide association study of prostate cancer</p></title><aug><au><snm>Thomas</snm><fnm>G</fnm></au><au><snm>Jacobs</snm><fnm>KB</fnm></au><au><snm>Yeager</snm><fnm>M</fnm></au><au><snm>Kraft</snm><fnm>P</fnm></au><au><snm>Wacholder</snm><fnm>S</fnm></au><au><snm>Orr</snm><fnm>N</fnm></au><au><snm>Yu</snm><fnm>K</fnm></au><au><snm>Chatterjee</snm><fnm>N</fnm></au><au><snm>Welch</snm><fnm>R</fnm></au><au><snm>Hutchinson</snm><fnm>Aea</fnm></au></aug><source>Nat Genet</source><pubdate>2008</pubdate><volume>40</volume><fpage>310</fpage><lpage>315</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.91</pubid><pubid idtype="pmpid" link="fulltext">18264096</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>Common sequence variants on 2p15 and Xp11.22 confer susceptibility to prostate 
cancer</p></title><aug><au><snm>Gudmundsson</snm><fnm>J</fnm></au><au><snm>Sulem</snm><fnm>P</fnm></au><au><snm>Rafnar</snm><fnm>T</fnm></au><au><snm>Bergthorsson</snm><fnm>JT</fnm></au><au><snm>Manolescu</snm><fnm>A</fnm></au><au><snm>Gudbjartsson</snm><fnm>D</fnm></au><au><snm>Agnarsson</snm><fnm>BA</fnm></au><au><snm>Sigurdsson</snm><fnm>A</fnm></au><au><snm>Benediktsdottir</snm><fnm>KR</fnm></au><au><snm>Blondal</snm><fnm>Tea</fnm></au></aug><source>Nat Genet</source><pubdate>2008</pubdate><volume>40</volume><fpage>281</fpage><lpage>283</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.89</pubid><pubid idtype="pmpid" link="fulltext">18264098</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>A kernel-based integration of genome-wide data for clinical decision support</p></title><aug><au><snm>Daemen</snm><fnm>A</fnm></au><au><snm>Gevaert</snm><fnm>O</fnm></au><au><snm>Ojeda</snm><fnm>F</fnm></au><au><snm>Debucquoy</snm><fnm>A</fnm></au><au><snm>Suykens</snm><fnm>JAK</fnm></au><au><snm>Sempous</snm><fnm>C</fnm></au><au><snm>Machiels</snm><fnm>JP</fnm></au><au><snm>Haustermans</snm><fnm>K</fnm></au><au><snm>De Moor</snm><fnm>B</fnm></au></aug><source>Genome Medicine</source><pubdate>2009</pubdate><volume>1</volume><fpage>39</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gm39</pubid><pubid idtype="pmcid">2684660</pubid><pubid idtype="pmpid">19356222</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>Development of a kernel function for clinical data</p></title><aug><au><snm>Daemen</snm><fnm>A</fnm></au><au><snm>De Moor</snm><fnm>B</fnm></au></aug><source>Proc. 
of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)</source><pubdate>2009</pubdate><fpage>5913</fpage><lpage>5917</lpage></bibl><bibl id="B35"><title><p>Mathematical decision trees versus clinician based algorithms in the diagnosis of endometrial disease</p></title><aug><au><snm>van den Bosch</snm><fnm>T</fnm></au><au><snm>Daemen</snm><fnm>A</fnm></au><au><snm>Gevaert</snm><fnm>O</fnm></au><au><snm>Timmerman</snm><fnm>D</fnm></au></aug><source>Proc. of the 17th World Congress on Ultrasound in Obstetrics and Gynecology (ISUOG)</source><pubdate>2007</pubdate><fpage>412</fpage></bibl><bibl id="B36"><title><p>Functional linear discriminant analysis: a new longitudinal approach to the assessment of embryonic growth</p></title><aug><au><snm>Bottomley</snm><fnm>C</fnm></au><au><snm>Daemen</snm><fnm>A</fnm></au><au><snm>Mukri</snm><fnm>F</fnm></au><au><snm>Papageorghiou</snm><fnm>AT</fnm></au><au><snm>Kirk</snm><fnm>E</fnm></au><au><snm>A</snm><fnm>P</fnm></au><au><snm>De Moor</snm><fnm>B</fnm></au><au><snm>Timmerman</snm><fnm>D</fnm></au><au><snm>Bourne</snm><fnm>T</fnm></au></aug><source>Human Reproduction</source><pubdate>2007</pubdate><volume>24</volume><issue>2</issue><fpage>278</fpage><lpage>283</lpage><xrefbib><pubid idtype="doi">10.1093/humrep/den382</pubid></xrefbib></bibl><bibl id="B37"><title><p>Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks</p></title><aug><au><snm>Gevaert</snm><fnm>O</fnm></au><au><snm>De Smet</snm><fnm>F</fnm></au><au><snm>Timmerman</snm><fnm>D</fnm></au><au><snm>Moreau</snm><fnm>Y</fnm></au><au><snm>De Moor</snm><fnm>B</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><issue>14</issue><fpage>e184</fpage><lpage>e190</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl230</pubid><pubid idtype="pmpid" link="fulltext">16873470</pubid></pubidlist></xrefbib></bibl><bibl 
id="B38"><title><p>The use of a new logistic regression model for predicting the outcome of pregnancies of unknown location</p></title><aug><au><snm>Condous</snm><fnm>G</fnm></au><au><snm>Okaro</snm><fnm>E</fnm></au><au><snm>Khalid</snm><fnm>A</fnm></au><au><snm>Timmerman</snm><fnm>D</fnm></au><au><snm>Lu</snm><fnm>C</fnm></au><au><snm>Zhou</snm><fnm>Y</fnm></au><au><snm>Van Huffel</snm><fnm>S</fnm></au><au><snm>Bourne</snm><fnm>T</fnm></au></aug><source>Human Reproduction</source><pubdate>2004</pubdate><volume>21</volume><fpage>278</fpage><lpage>283</lpage></bibl><bibl id="B39"><title><p>Critical assessment of methods of protein structure prediction - Round VIII</p></title><aug><au><snm>Moult</snm><fnm>J</fnm></au><au><snm>Fidelis</snm><fnm>K</fnm></au><au><snm>Kryshtafovych</snm><fnm>A</fnm></au><au><snm>Rost</snm><fnm>B</fnm></au><au><snm>Tramontano</snm><fnm>A</fnm></au></aug><source>Proteins: Structure, Function, and Bioinformatics</source><volume>77</volume><issue>S9</issue></bibl><bibl id="B40"><title><p>Non-sparse multiple kernel learning</p></title><aug><au><snm>Kloft</snm><fnm>M</fnm></au><au><snm>Brefeld</snm><fnm>U</fnm></au><au><snm>Laskov</snm><fnm>P</fnm></au><au><snm>Sonnenburg</snm><fnm>S</fnm></au></aug><source>NIPS 08 workshop: kernel learning automatic selection of optimal kernels</source><pubdate>2008</pubdate></bibl><bibl id="B41"><title><p>Multiple indefinite kernel learning with mixed norm regularization</p></title><aug><au><snm>Kowalski</snm><fnm>M</fnm></au><au><snm>Szafranski</snm><fnm>M</fnm></au><au><snm>Ralaivola</snm><fnm>L</fnm></au></aug><source>Proc of the 26th International Conference of Machine Learning</source><pubdate>2009</pubdate></bibl></refgrp>
</bm></art>