<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-7-S5-S14</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Proceedings</dochead>
      <bibl>
         <title>
            <p>SVM-based prediction of caspase substrate cleavage sites</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Wee</snm>
               <mi>JK</mi>
               <fnm>Lawrence</fnm>
               <insr iid="I1"/>
               <email>lawrence@bic.nus.edu.sg</email>
            </au>
            <au id="A2">
               <snm>Tan</snm>
               <mnm>Wee</mnm>
               <fnm>Tin</fnm>
               <insr iid="I1"/>
               <email>tinwee@bic.nus.edu.sg</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Ranganathan</snm>
               <fnm>Shoba</fnm>
               <insr iid="I2"/>
               <insr iid="I1"/>
               <email>shoba@els.mq.edu.au</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore</p>
            </ins>
            <ins id="I2">
               <p>Department of Chemistry and Biomolecular Sciences &amp; Biotechnology Research Institute, Macquarie University, Sydney, Australia</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <supplement>
            <title>
               <p>APBioNet &#8211; Fifth International Conference on Bioinformatics (InCoB2006)</p>
            </title>
            <editor>Shoba Ranganathan, Martti Tammi, Michael Gribskov, Tin Wee Tan</editor>
            <note>Proceedings</note>
         </supplement>
         <conference>
            <title>
               <p>International Conference in Bioinformatics &#8211; InCoB2006</p>
            </title>
            <location>New Delhi, India</location>
            <date-range>18&#8211;20 December 2006</date-range>
         </conference>
         <issn>1471-2105</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>Suppl 5</issue>
         <fpage>S14</fpage>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17254298</pubid>
               <pubid idtype="doi">10.1186/1471-2105-7-S5-S14</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>18</day>
               <month>12</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Wee et al; licensee BioMed Central Ltd</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Caspases belong to a class of cysteine proteases which function as critical effectors in apoptosis and inflammation by cleaving substrates immediately after unique sites. Prediction of such cleavage sites will complement structural and functional studies on substrate cleavage as well as the discovery of new substrates. Recently, different computational methods have been developed to predict the cleavage sites of caspase substrates with varying degrees of success. As the support vector machine (SVM) algorithm has been shown to be useful in several biological classification problems, we have implemented an SVM-based method to investigate its applicability to this domain.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>A set of unique caspase substrate cleavage sites was obtained from the literature and used for evaluating the SVM method. Datasets containing (i) the tetrapeptide cleavage sites, (ii) the tetrapeptide cleavage sites augmented by the two adjacent P<sub>1</sub>' and P<sub>2</sub>' residues and (iii) the tetrapeptide cleavage sites with ten additional upstream and downstream flanking residues (where available) were tested. The SVM method achieved an accuracy ranging from 81.25% to 97.92% on independent test sets. The SVM method successfully predicted the cleavage of a novel caspase substrate and its mutants.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>This study presents an SVM approach for predicting caspase substrate cleavage sites based on the cleavage sites and the downstream and upstream flanking sequences. The method shows an improvement over existing methods and may be useful for predicting hitherto undiscovered cleavage sites.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Caspases belong to a unique class of cysteine proteases which function as critical effectors of apoptosis, inflammation and other important cellular processes such as cell proliferation, cell differentiation, cell migration and receptor internalization <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. Caspases contain a cysteine residue at the active site and cleave substrates at specific tetrapeptide sites (denoted P<sub>4</sub>-P<sub>3</sub>-P<sub>2</sub>-P<sub>1</sub>) with a highly conserved aspartate (D) at the P<sub>1 </sub>position <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. To date at least 14 mammalian caspases have been discovered and they can be grouped into three classes based on their preferential tetrapeptide specificities <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Group I caspases (-1, -4 and -5) recognize the sequence (W/L)EHD; Group II caspases (-2, -3 and -7) prefer the sequence DEXD; while Group III caspases (-6, -8, -9 and -10) cleave proteins with the sequence (L/V)E(T/H)D.</p>
         <p>As reviewed in Earnshaw <it>et al</it>. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> and Fischer <it>et al</it>. <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, substrates of caspases belong to a myriad of protein classes such as structural elements of the cytoplasm and nucleus, components of the DNA repair machinery, protein kinases, GTPases and viral structural proteins. Although more than 280 caspase substrates have been discovered to date, it is possible that several more remain undetected <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. The identification and characterization of caspase substrates are critical for deepening our understanding of the role of these enzymes in the various cellular pathways. However, the accurate detection of caspase cleavage sites in target proteins requires complex and time-consuming <it>in vivo </it>and <it>in vitro </it>experiments. Given the readily available sequence data in public databases, a useful alternative is to conduct <it>in silico </it>screening for potential cleavage sites among proteins. While the preferential cleavage specificities may be useful here, recently identified substrates have shown significant variation in their cleavage sites <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Therefore, the development of computational tools to accurately capture complex sequence patterns and to automate the identification of new cleavage sites would be valuable.</p>
         <p>A number of caspase substrate cleavage prediction methods currently exist. The pioneering work began with PeptideCutter, a protease substrate cleavage prediction server for various families of proteases. Due to the scarcity of experimental data, PeptideCutter was based only on the preferential cleavage specificities of certain caspases <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Lohmuller <it>et al</it>. <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> developed the peptidase substrate prediction tool (PEPS) based on position specific scoring matrices (PSSM) for cathepsin B, cathepsin L and caspase-3 substrates. While useful, the utility of these tools is limited as they were built on a small dataset of cleavage sites and the cleavage specificities are confined to certain caspases alone, rather than the entire family. In recent years, the rapid discovery and characterization of new substrates and their cleavage sites <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> enabled the development of more effective algorithmic tools. Garay-Malpartida <it>et al</it>. <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> developed the CasPredictor software, which exhibited an improvement over previous methods with an accuracy of 81% on a dataset of 137 experimentally verified cleavage sites. The CasPredictor software uses an algorithm which analyzes the cleavage sites for amino acid substitution, amino acid frequency and the presence of 'PEST' sequences <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp> in the vicinity of the cleavage site (flanking 10&#8211;15 residues). The GraBCas software by Backes <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> advanced the previous PSSM-based methods by including an updated set of caspase cleavage specificities based on the work by Thornberry <it>et al</it>. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, and observing conservation at P<sub>1</sub>' and even P<sub>2</sub>' positions. 
Yang <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> experimented with different neural networks, such as single-layer perceptrons, multi-layer perceptrons and Bayesian bio-basis function neural networks, for predicting cleavage sites. That study achieved an accuracy of 97% using the Bayesian bio-basis function neural network with two Gaussian distributions. In the same study, the SVM method was tested and was found to give excellent results. However, Yang used a small dataset of 13 sequences and the method is not available for testing.</p>
         <p>In this study, we have developed a support vector machine (SVM) system to predict caspase substrate cleavage sites. First introduced by Cortes and Vapnik <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, the SVM method is a relatively new class of machine learning algorithms. SVM has been shown to perform well in diverse computational biology applications such as the prediction of protein secondary structure <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>, protein fold <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, protein quaternary structure <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, protein homology <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, protein-protein interaction sites <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, protein domains <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, HIV protease cleavage sites <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> and T-cell epitopes <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. It is also used in the classification and validation of cancer tissue samples <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> and microarray expression data <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Other applications of SVMs in biology have been reviewed by Byvatov and Schneider <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, and Yang <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. We have compiled an extensive dataset of unique (non-redundant) cleavage sites to validate the SVM method and to further the development of other computational tools. Using various statistical metrics, we have shown that the SVM method is a rigorous and effective approach for predicting cleavage sites of caspase substrates.</p>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <p>The prediction of caspase substrate cleavage sites is important for our in-depth understanding of the protease-substrate interaction as well as for identifying new caspase substrates. Since the publication of the preferential tetrapeptide specificities by Thornberry <it>et al</it>. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, many more caspase substrates have been discovered and the reported cleavage sites have been shown to vary considerably from the preferred sequences <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Artificial intelligence-based techniques such as SVMs and neural networks are elegant approaches for the extraction of complex patterns from biological sequence data. As the SVM methodology has been successfully applied to several biological problems, we investigated the utility of the SVM approach in predicting the cleavage sites of caspase substrates.</p>
         <p>Based on the work by Fischer <it>et al</it>. and through our own data mining efforts, we have compiled a database of experimentally determined caspase substrates annotated with their cleavage sites. We obtained a set of 195 unique cleavage sites from Fischer <it>et al</it>. and 24 unique cleavage sites from recently discovered caspase substrates that were reported in the literature but not detailed in Fischer <it>et al</it>. The 195 sequences were used for training the SVM classifier while the 24 sequences were used for testing the effectiveness of the SVM method. As there were no experimentally reported non-cleavage sites for caspases, we extracted tetrapeptide sequences at random positions (not including the cleavage sites) on experimentally determined caspase substrates. One non-cleavage site was extracted for every cleavage site on the same substrate, so that the number of non-cleavage sites matched the number of cleavage sites. The use of these sequences as non-cleavage sites rests on the assumption that the large majority of tetrapeptide sequences other than the cleavage site(s) on a substrate are not recognized and cleaved by caspases. Together, a primary dataset consisting of the tetrapeptide cleavage sites (positive examples) and non-cleavage sites (negative examples) was constructed and designated as the P<sub>4</sub>P<sub>1 </sub>dataset (Figure <figr fid="F1">1</figr>).</p>
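         <p>The negative-example sampling described above can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function name, the 0-based indexing and the choice to exclude only windows that coincide exactly with a known cleavage site are assumptions.</p>

```python
import random

def sample_non_cleavage_sites(sequence, p1_positions, rng, width=4):
    """Draw one random tetrapeptide per known cleavage site, avoiding the true sites.

    p1_positions holds the 0-based index of the P1 residue of each known site.
    Assumption: only windows identical to a true P4..P1 window are excluded.
    """
    # Start indices of the true cleavage-site windows (P4..P1).
    forbidden = {p1 - (width - 1) for p1 in p1_positions}
    candidates = [s for s in range(len(sequence) - width + 1) if s not in forbidden]
    # Sample without replacement so that the negatives come from distinct positions.
    starts = rng.sample(candidates, k=len(p1_positions))
    return [sequence[s:s + width] for s in starts]

# Toy input: the 24-residue Mcl-1 window, with P1 (D of TSTD) at index 13.
negatives = sample_non_cleavage_sites("LELVGEGSNNTSTDGSLPSTPPPA", [13], random.Random(0))
```

         <p>Passing an explicit random generator keeps the sampling reproducible, which helps when the same negatives must be reused across dataset variants.</p>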
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Different subsequence segments for SVM training and testing</p>
            </caption>
            <text>
               <p><b>Different subsequence segments for SVM training and testing</b>. For human Mcl-1 [Swiss-Prot:Q07820], a sequence window of 24 amino acids in length centred on the tetrapeptide cleavage site, TSTD (underlined) is shown. Amino acids to the left of the scissile bond (shown by the inverted triangle) are labelled from P<sub>1 </sub>(D) to P<sub>14 </sub>(L). Amino acids to the right of the scissile bond are labelled from P<sub>1</sub>' (G) to P<sub>10</sub>' (A). Curly brackets indicate the subsequence segments extracted for SVM implementation. The sequences spanning P<sub>4 </sub>to P<sub>1 </sub>(TSTD), P<sub>4 </sub>to P<sub>2</sub>' (TSTDGS) and P<sub>14 </sub>to P<sub>10</sub>' (LELVGEGSNNTSTDGSLPSTPPPA) are labelled as P<sub>4</sub>P<sub>1</sub>, P<sub>4</sub>P<sub>2</sub>' and P<sub>14</sub>P<sub>10</sub>' respectively.</p>
            </text>
            <graphic file="1471-2105-7-S5-S14-1"/>
         </fig>
         <p>Previously, Backes <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and Garay-Malpartida <it>et al</it>. <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> suggested that residues adjacent to the cleavage site may influence substrate cleavage. Backes <it>et al</it>. reported the high occurrence of specific amino acids at P<sub>1</sub>' for caspase-3 and at P<sub>1</sub>' and P<sub>2</sub>' for granzyme B, a serine protease involved in apoptosis and in the immune response <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Garay-Malpartida <it>et al</it>. reported that a sizeable proportion of cleavage sites are localized within 'PEST' regions, which have been suggested to label proteins for protease degradation. PEST regions are defined as sequence segments enriched with proline (P), glutamate (E), aspartate (D), serine (S) and threonine (T) residues <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. Therefore, to investigate the influence of the adjacent sequences on substrate cleavage, we further constructed a dataset containing tetrapeptide sequences with the P<sub>1</sub>' and P<sub>2</sub>' residues and a dataset containing tetrapeptide sequences flanked by ten residues on either side of the cleavage site. These datasets were designated as P<sub>4</sub>P<sub>2</sub>' and P<sub>14</sub>P<sub>10</sub>' respectively (Figure <figr fid="F1">1</figr>). The longer sequence segments would encapsulate the information contained in the critical tetrapeptide sequences as well as the P<sub>1</sub>' and P<sub>2</sub>' amino acids and other residues adjacent to the cleavage sites.</p>
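         <p>As an illustration of how the three subsequence segments relate to one another, the following Python sketch extracts them from the human Mcl-1 window shown in Figure 1. The function name and the 0-based P1 index are assumptions made for the example. Clipping at the sequence ends is one possible reading of flanking residues being used only where available.</p>

```python
def extract_window(sequence, p1_index, upstream, downstream):
    """Return residues P{upstream}..P1 followed by P1'..P{downstream}'.

    p1_index is the 0-based index of the P1 residue; the scissile bond
    lies immediately after it.
    """
    start = max(0, p1_index - upstream + 1)               # clip near the N-terminus
    end = min(len(sequence), p1_index + 1 + downstream)   # clip near the C-terminus
    return sequence[start:end]

MCL1_WINDOW = "LELVGEGSNNTSTDGSLPSTPPPA"  # 24-residue window from Figure 1
P1 = 13  # index of the aspartate (D) of TSTD within that window

p4p1 = extract_window(MCL1_WINDOW, P1, 4, 0)      # "TSTD"
p4p2 = extract_window(MCL1_WINDOW, P1, 4, 2)      # "TSTDGS"
p14p10 = extract_window(MCL1_WINDOW, P1, 14, 10)  # the full 24-residue window
```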
         <p>Next, we divided the P<sub>4</sub>P<sub>1</sub>, P<sub>4</sub>P<sub>2</sub>' and P<sub>14</sub>P<sub>10</sub>' datasets into training and test datasets (Figure <figr fid="F2">2</figr>). The training datasets were used for optimizing the SVM parameters and for training the SVM classifier, while the test datasets were used for evaluating the SVM method. We chose the radial basis function (RBF) kernel, which requires the parameters <it>&#947; </it>and <it>C </it>to be optimized. Using 10-fold cross-validation, the optimal values of <it>&#947; </it>and <it>C </it>were found to be 0.01 and 100 (for the P<sub>4</sub>P<sub>1 </sub>training dataset) and 0.1 and 100 (for both the P<sub>4</sub>P<sub>2</sub>' and P<sub>14</sub>P<sub>10</sub>' training datasets). For each of the P<sub>4</sub>P<sub>1</sub>, P<sub>4</sub>P<sub>2</sub>' and P<sub>14</sub>P<sub>10</sub>' training datasets, an overall accuracy of 98.97% was obtained during the cross-validation.</p>
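         <p>To make the role of the kernel width concrete, the sketch below evaluates the standard RBF kernel, K(x, y) = exp(-gamma * ||x - y||^2), on two tetrapeptides. The one-hot (orthogonal) residue encoding is an assumption made only for this illustration, and the function names are hypothetical. The soft-margin parameter C does not appear in the kernel itself, since it only weights the penalty on training errors.</p>

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(peptide):
    """Assumed encoding: 20 binary indicators per residue, concatenated."""
    vector = []
    for residue in peptide:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector

def rbf_kernel(x, y, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# TSTD and DEVD differ at three of four positions.
similarity = rbf_kernel(one_hot("TSTD"), one_hot("DEVD"), gamma=0.01)
```

         <p>Under this encoding each mismatched position adds 2 to the squared distance, so a smaller kernel width parameter makes peptides with several mismatches still appear similar to the classifier.</p>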
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Schematic layout of the datasets used for SVM training and testing</p>
            </caption>
            <text>
               <p><b>Schematic layout of the datasets used for SVM training and testing</b>. The primary dataset consists of non-redundant tetrapeptide caspase substrate cleavage sites obtained from the literature (see <supplr sid="S1">Additional File 1</supplr>) and an equal number of non-cleavage sites. <sup>1</sup>The P<sub>4</sub>P<sub>1 </sub>sequences consist of all the sequences in the primary tetrapeptide cleavage site dataset. The P<sub>4</sub>P<sub>2</sub>' and P<sub>14</sub>P<sub>10</sub>' datasets were derived by extracting subsequence segments from the parent protein chains in the vicinity of the tetrapeptide cleavage sites, as shown in Figure 1. All datasets contain equal numbers of positive and negative examples.</p>
            </text>
            <graphic file="1471-2105-7-S5-S14-2"/>
         </fig>
         <p>While the reported accuracy on the training datasets may indicate the effectiveness of a prediction method, it may not accurately portray how the method will perform on novel, hitherto undiscovered cleavage sites. Therefore, it is critical to test the SVM methodology on independent out-of-sample datasets that were not used in the cross-validation. Here, we applied the SVM classifiers, trained separately using the entire training datasets from the P<sub>4</sub>P<sub>1</sub>, P<sub>4</sub>P<sub>2</sub>' and P<sub>14</sub>P<sub>10</sub>' datasets with the optimized <it>&#947; </it>and <it>C </it>parameters, to the respective test datasets and evaluated the results. As shown in Table <tblr tid="T1">1</tblr>, for the P<sub>4</sub>P<sub>1 </sub>test dataset, the SVM method obtained an accuracy of 95.83% using the RBF kernel with <it>&#947; </it>= 0.01 and <it>C </it>= 100. For both the P<sub>4</sub>P<sub>2</sub>' and P<sub>14</sub>P<sub>10</sub>' test datasets, the SVM method obtained an accuracy of 97.92% using the RBF kernel with <it>&#947; </it>= 0.1 and <it>C </it>= 100.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Results of SVM prediction for various test datasets.</p>
            </caption>
            <tblbdy cols="7">
               <r>
                  <c ca="center">
                     <p>
                        <b>Test datasets</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>
                           <it>&#947;</it>
                        </b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>
                           <it>C</it>
                        </b>
                     </p>
                  </c>
                  <c cspan="4" ca="center">
                     <p>
                        <b>Performance Evaluation</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>
                        <b>AC (%)</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>SE (%)</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>SP (%)</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>MCC</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="7">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>
                        <b>P</b>
                        <sub>
                           <b>4</b>
                        </sub>
                        <b>P</b>
                        <sub>
                           <b>1</b>
                        </sub>
                     </p>
                  </c>
                  <c ca="center">
                     <p>0.01</p>
                  </c>
                  <c ca="center">
                     <p>100</p>
                  </c>
                  <c ca="center">
                     <p>95.83</p>
                  </c>
                  <c ca="center">
                     <p>95.83</p>
                  </c>
                  <c ca="center">
                     <p>95.83</p>
                  </c>
                  <c ca="center">
                     <p>0.92</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p><b>P</b><sub><b>4</b></sub><b>P</b><sub><b>2</b></sub>'</p>
                  </c>
                  <c ca="center">
                     <p>0.1</p>
                  </c>
                  <c ca="center">
                     <p>100</p>
                  </c>
                  <c ca="center">
                     <p>97.92</p>
                  </c>
                  <c ca="center">
                     <p>95.83</p>
                  </c>
                  <c ca="center">
                     <p>100.00</p>
                  </c>
                  <c ca="center">
                     <p>0.96</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p><b>P</b><sub><b>14</b></sub><b>P</b><sub><b>10</b></sub>'</p>
                  </c>
                  <c ca="center">
                     <p>0.1</p>
                  </c>
                  <c ca="center">
                     <p>100</p>
                  </c>
                  <c ca="center">
                     <p>97.92</p>
                  </c>
                  <c ca="center">
                     <p>95.83</p>
                  </c>
                  <c ca="center">
                     <p>100.00</p>
                  </c>
                  <c ca="center">
                     <p>0.96</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>
                        <b>P</b>
                        <sub>
                           <b>4</b>
                        </sub>
                        <b>P</b>
                        <sub>
                           <b>1</b>
                        </sub>
                        <b>(-D)</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>0.01</p>
                  </c>
                  <c ca="center">
                     <p>1</p>
                  </c>
                  <c ca="center">
                     <p>81.25</p>
                  </c>
                  <c ca="center">
                     <p>62.50</p>
                  </c>
                  <c ca="center">
                     <p>100.00</p>
                  </c>
                  <c ca="center">
                     <p>0.67</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p><b>P</b><sub><b>4</b></sub><b>P</b><sub><b>2</b></sub>'<b>(-D)</b></p>
                  </c>
                  <c ca="center">
                     <p>1</p>
                  </c>
                  <c ca="center">
                     <p>100</p>
                  </c>
                  <c ca="center">
                     <p>89.58</p>
                  </c>
                  <c ca="center">
                     <p>79.17</p>
                  </c>
                  <c ca="center">
                     <p>100.00</p>
                  </c>
                  <c ca="center">
                     <p>0.81</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p><b>P</b><sub><b>14</b></sub><b>P</b><sub><b>10</b></sub>'<b>(-D)</b></p>
                  </c>
                  <c ca="center">
                     <p>0.1</p>
                  </c>
                  <c ca="center">
                     <p>1</p>
                  </c>
                  <c ca="center">
                     <p>93.75</p>
                  </c>
                  <c ca="center">
                     <p>87.50</p>
                  </c>
                  <c ca="center">
                     <p>100.00</p>
                  </c>
                  <c ca="center">
                     <p>0.88</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>The SVM parameters (<it>&#947; </it>and <it>C</it>) were obtained from the cross-validation conducted on the training datasets.</p>
            </tblfn>
         </tbl>
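         <p>The performance measures in Table 1 can be reproduced from confusion-matrix counts. The sketch below assumes that AC, SE, SP and MCC denote accuracy, sensitivity, specificity and the Matthews correlation coefficient under their standard definitions. The counts are not reported in the table. They are inferred here from the percentages for the P<sub>4</sub>P<sub>2</sub>' test dataset, assuming 24 positive and 24 negative test sequences.</p>

```python
import math

def evaluate(tp, fn, tn, fp):
    """Return accuracy (%), sensitivity (%), specificity (%) and MCC."""
    total = tp + fn + tn + fp
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal count is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, sensitivity, specificity, mcc

# Inferred counts: 23 of 24 cleavage sites and all 24 non-cleavage sites correct.
ac, se, sp, mcc = evaluate(tp=23, fn=1, tn=24, fp=0)
print(round(ac, 2), round(se, 2), round(sp, 2), round(mcc, 2))
```

         <p>With these inferred counts the rounded values agree with the P<sub>4</sub>P<sub>2</sub>' row of Table 1.</p>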
         <p>Our analysis of the training and test datasets indicated a large percentage of cleavage sites with the XXXD motif (~98%) and a very small percentage of cleavage sites with a non-canonical XXXE motif (~2%). While the experimental cleavage site specificities reported in Thornberry <it>et al</it>. suggest that most, if not all, sequences conform to the XXXD motif <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, the inclusion of a large proportion of these sequences in the development of the SVM system could lead to overtraining of the classifier and confound the results obtained with different sequence representations. To mitigate this possibility, we further constructed datasets identical to the P<sub>4</sub>P<sub>1</sub>, P<sub>4</sub>P<sub>2</sub>' and P<sub>14</sub>P<sub>10</sub>' datasets, but with the P<sub>1 </sub>residue removed from all the sequences (labelled as the P<sub>4</sub>P<sub>1</sub>(-D), P<sub>4</sub>P<sub>2</sub>' (-D) and P<sub>14</sub>P<sub>10</sub>' (-D) datasets respectively). These datasets were further divided into training and test sets and the SVM parameters were optimized in the same manner as reported for the original P<sub>4</sub>P<sub>1</sub>, P<sub>4</sub>P<sub>2</sub>' and P<sub>14</sub>P<sub>10</sub>' datasets. The trained SVM classifiers were tested on the respective test datasets. As shown in Table <tblr tid="T1">1</tblr>, the SVM method obtained an accuracy of 81.25% for the P<sub>4</sub>P<sub>1 </sub>(-D) test dataset. The performance of the SVM improved significantly when tested on the P<sub>4</sub>P<sub>2</sub>' (-D) and P<sub>14</sub>P<sub>10</sub>' (-D) datasets, as accuracy readings of 89.58% and 93.75% were obtained respectively. 
While the accuracy on all (-D) test datasets was lower than on the corresponding original datasets, a larger degree of improvement was observed when the longer sequence representations were used, as evidenced by the greater spread in both the accuracy and sensitivity readings for the P<sub>4</sub>P<sub>1</sub>(-D), P<sub>4</sub>P<sub>2</sub>' (-D) and P<sub>14</sub>P<sub>10</sub>' (-D) datasets. An analysis of the misclassified sequences showed that cleavage sites such as CLLD<sup>2193 </sup>from Notch1 [Swiss-Prot:P46531] and PEVD<sup>142 </sup>from p23 co-chaperone [Swiss-Prot:Q15185], which differ markedly from reported tetrapeptide specificities <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, were misclassified by the P<sub>4</sub>P<sub>1 </sub>(-D)-trained SVM, but were correctly predicted when the P<sub>4</sub>P<sub>2</sub>' (-D) and P<sub>14</sub>P<sub>10</sub>' (-D) datasets were used. Also, the SVM trained with the P<sub>4</sub>P<sub>1 </sub>(-D) and P<sub>4</sub>P<sub>2</sub>' (-D) datasets failed to correctly classify the non-canonical cleavage site VQPE<sup>205 </sup>from DIAP1 [Swiss-Prot:Q24306], but correctly predicted the cleavage site when trained with the P<sub>14</sub>P<sub>10</sub>' (-D) dataset. These results suggest that the SVM trained with the (-D) datasets may be useful for identifying hitherto undiscovered cleavage sites while circumventing the problem of overtraining due to the high percentage of XXXD cleavage sites in the training datasets. The results also provide further evidence for the suggestion that the P<sub>1</sub>' and P<sub>2</sub>' residues and residues further upstream and downstream of the cleavage site may influence substrate cleavage, and that SVM performance can be improved by accounting for these flanking sequences. The results also show that the SVM method can be extended to predict cleavage sites with residues other than the canonical aspartate (D) at P<sub>1</sub>. 
While the occurrence of non-canonical cleavage sites remains proportionately small, it does imply that the sampling space for cleavage sites is not limited to the XXXD motif. Consequently, the ability to predict these non-canonical cleavage sites will be a useful complement to existing computational methods, which assume the consensus XXXD motif as the basis for their algorithms.</p>
         <p>As other methods were not readily accessible, we were only able to evaluate the GraBCas method on our datasets. Since the GraBCas method primarily focuses on the tetrapeptide motif, we applied it to the P<sub>4</sub>P<sub>1</sub> datasets only. As the GraBCas method can only be applied to potential cleavage sites with aspartate (D) at the P<sub>1</sub> position, we scored the positive sequences in the P<sub>4</sub>P<sub>1</sub> training dataset with the GraBCas matrix values for the different caspases, selected the highest score and determined the percentage of correctly predicted cleavage sites (<it>Sensitivity, SE</it>) against a series of cut-off scores. As shown in Table <tblr tid="T2">2</tblr>, the sensitivity values declined steadily from 87.43% to 19.76% as the cut-off values were progressively increased (0.1, 1, 5, 10, 20). We also tested the GraBCas method on the positive sequences in the P<sub>4</sub>P<sub>1</sub> test dataset. As there were no recommended cut-off scores for predicting the cleavage sites, we chose the cut-off score of 0.1, which was used for the granzyme B cleavage site prediction as reported in Backes <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. At the cut-off score of 0.1, GraBCas predicted only 16 out of 24 cleavage sites correctly (<it>SE = 66.67%</it>).</p>
         <tbl id="T2">
            <title>
               <p>Table 2</p>
            </title>
            <caption>
               <p>GraBCas prediction on the P<sub>4</sub>P<sub>1 </sub>training dataset (positive sequences only)</p>
            </caption>
            <tblbdy cols="2">
               <r>
                  <c ca="center">
                     <p>
                        <b>GraBCas Cutoff</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>SE (%)</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>
                        <b>0.1</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>87.43</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>
                        <b>1.0</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>69.46</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>
                        <b>5.0</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>40.72</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>
                        <b>10.0</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>28.14</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>
                        <b>20.0</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>19.76</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p/>
            </tblfn>
         </tbl>
         <p>Finally, to investigate how the SVM approach can complement experimental work on caspase substrate cleavage, we applied it to predict the caspase-mediated cleavage of an anti-apoptotic protein, Livin [Swiss-Prot:Q96CA5], and its mutant sequences as reported in Yan <it>et al</it>. <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, based on the prediction of the caspase cleavage sites. As shown in Table <tblr tid="T3">3</tblr>, the experimental cleavage of wild type human Livin and its deletion mutants was compared to the results predicted by the SVM trained with the P<sub>14</sub>P<sub>10</sub>' (-D) dataset. With the exception of the LE &#916;52&#8211;61, &#916;51&#8211;53 and &#916;53&#8211;61 mutants, all other sequences were correctly predicted to be cleaved or not cleaved by caspases as indicated. For the LE &#916;52&#8211;61 and &#916;51&#8211;53 mutants, the flanking sequences upstream and downstream of the cleavage site were likely to have influenced cleavage of the substrates, as predicted by the SVM. However, cleavage of these substrates was prevented by the absence of the Asp at P<sub>1</sub> (DHVD<sup>52</sup>). While the SVM predicted the cleavage of the &#916;53&#8211;61 mutant, it was proposed by Yan <it>et al</it>. that the deleted residues might have led to the distortion of the structure of a neighboring domain or affected its signaling function, which subsequently inhibited the substrate cleavage through downstream signaling. These findings suggest that the SVM-based prediction of caspase substrate cleavage sites might be helpful in identifying potential caspase substrates.</p>
         <tbl id="T3">
            <title>
               <p>Table 3</p>
            </title>
            <caption>
               <p>SVM prediction of caspase substrate cleavage sites in Livin and mutants.</p>
            </caption>
            <tblbdy cols="3">
               <r>
                  <c ca="left">
                     <p>
                        <b>Substrate</b>
                        <sup>
                           <it>a</it>
                        </sup>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>Experimental Results</b>
                        <sup>
                           <it>b</it>
                        </sup>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>SVM Prediction</b>
                        <sup>
                           <it>c</it>
                        </sup>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Wild type Livin</p>
                  </c>
                  <c ca="center">
                     <p>Cleaved</p>
                  </c>
                  <c ca="center">
                     <p>Cleaved</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>LE &#916;52&#8211;61</p>
                  </c>
                  <c ca="center">
                     <p>Not cleaved</p>
                  </c>
                  <c ca="center">
                     <p>Cleaved</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>&#916;53&#8211;55</p>
                  </c>
                  <c ca="center">
                     <p>Cleaved</p>
                  </c>
                  <c ca="center">
                     <p>Cleaved</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>&#916;55&#8211;57</p>
                  </c>
                  <c ca="center">
                     <p>Cleaved</p>
                  </c>
                  <c ca="center">
                     <p>Cleaved</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>&#916;57&#8211;59</p>
                  </c>
                  <c ca="center">
                     <p>Cleaved</p>
                  </c>
                  <c ca="center">
                     <p>Cleaved</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>&#916;60&#8211;62</p>
                  </c>
                  <c ca="center">
                     <p>Cleaved</p>
                  </c>
                  <c ca="center">
                     <p>Cleaved</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>&#916;52&#8211;61</p>
                  </c>
                  <c ca="center">
                     <p>Not cleaved</p>
                  </c>
                  <c ca="center">
                     <p>Not cleaved</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>&#916;53&#8211;61</p>
                  </c>
                  <c ca="center">
                     <p>Not cleaved</p>
                  </c>
                  <c ca="center">
                     <p>Cleaved</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>&#916;52</p>
                  </c>
                  <c ca="center">
                     <p>Not cleaved</p>
                  </c>
                  <c ca="center">
                     <p>Not cleaved</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>&#916;51&#8211;53</p>
                  </c>
                  <c ca="center">
                     <p>Not cleaved</p>
                  </c>
                  <c ca="center">
                     <p>Cleaved</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p><it>a</it>. Wild type Livin and various deletion mutants as reported in Yan <it>et al</it>. <it>b</it>. Experimentally verified cleavage (<it>Cleaved</it>) or non-cleavage (<it>Not cleaved</it>) of Livin and deletion mutants. <it>c</it>. SVM prediction of caspase cleavage sites on Livin and deletion mutants (<it>Cleaved</it> &#8211; presence of cleavage site; <it>Not cleaved</it> &#8211; absence of cleavage site).</p>
            </tblfn>
         </tbl>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In conclusion, we have compiled an extensive dataset of caspase substrate cleavage sites as reported in the literature for the development and validation of other computational tools. We have rigorously tested the SVM approach for recognizing the cleavage sites of these substrates. Our results show that the SVM method is complementary to existing methods, if not more effective. The prediction accuracy can also be improved by accounting for sequences at the P<sub>1</sub>' and P<sub>2</sub>' positions and further upstream and downstream of the cleavage site. In addition, the SVM method may be useful for predicting the non-canonical cleavage sites lacking aspartate (D) at the P<sub>1</sub> position, such as those found in DIAP1 and other proteins as reported in the literature <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. As the substrate proteins used in the present method are derived from a variety of organisms (human, mouse, rat, fruit fly, cow, chicken, frog, worm and viruses) and are cleaved by various caspases (caspase-1, -3, -6, -7, -8, -9, -12, -13 and -14), our methodology is applicable to the detection of cleavage sites in substrates from various organisms and is not caspase-specific.</p>
         <p>Together with existing computational tools, our method will complement on-going experimental efforts in identifying new caspase substrates and further our understanding of the biochemistry of caspase substrate cleavage. This knowledge will be helpful for resolving the larger role of these proteases and their targets in critical processes like apoptosis and inflammation. As more information about caspases and their substrates becomes available, we will update and improve the performance of our methodology.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Datasets</p>
            </st>
            <p>Our primary dataset contains 438 unique sequences (219 cleavage sites and 219 non-cleavage sites). Of the 219 cleavage sites, 195 were obtained from Fischer <it>et al</it>. and 24 from a literature search. Besides the tetrapeptide cleavage site sequences, subsequence segments of varying lengths centered on the tetrapeptide cleavage sites were extracted as shown in Figure <figr fid="F1">1</figr>. In total, three groups of sequences were obtained: tetrapeptide cleavage sequences (henceforth termed the P<sub>4</sub>P<sub>1</sub> sequences), tetrapeptide cleavage sequences with the next two residues, P<sub>1</sub>' and P<sub>2</sub>' (P<sub>4</sub>P<sub>2</sub>' sequences), and tetrapeptide sequences with upstream residues up to P<sub>14</sub> and downstream residues up to P<sub>10</sub>' (P<sub>14</sub>P<sub>10</sub>' sequences). The cleavage sites and the corresponding subsequences were designated as positive examples for the SVM training and testing.</p>
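            <p>The window extraction described above can be sketched as follows. This is only an illustration under stated assumptions: the toy sequence, the function name <it>extract_window</it> and the choice to reject sites lying too close to a terminus are hypothetical and are not taken from the original implementation.</p>

```python
# Hypothetical toy protein; the DEVD motif occupies residues 23-26 (1-indexed),
# so the P1 aspartate sits at position 26.
TOY_SEQUENCE = "MKTAYIAKQRQISFVKSHFSRQDEVDGSPLTTEELKKA"

# (upstream, downstream) residue counts for the three sequence representations.
WINDOWS = {"P4P1": (4, 0), "P4P2'": (4, 2), "P14P10'": (14, 10)}


def extract_window(sequence: str, p1_pos: int, upstream: int, downstream: int) -> str:
    """Return the residues from P(upstream) to P(downstream)' around a site.

    p1_pos is the 1-indexed position of the P1 residue in the protein.
    """
    start = p1_pos - upstream   # 0-indexed start of the slice
    end = p1_pos + downstream   # 0-indexed exclusive end of the slice
    # Assumption: sites too close to a terminus are rejected rather than padded.
    if not (start >= 0 and len(sequence) >= end):
        raise ValueError("window extends past the end of the sequence")
    return sequence[start:end]


if __name__ == "__main__":
    for name, (up, down) in WINDOWS.items():
        print(name, extract_window(TOY_SEQUENCE, 26, up, down))
```

            <p>With these settings the three representations contain 4, 6 and 24 residues respectively.</p>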
            <p>The 219 non-cleavage sites were obtained by extracting tetrapeptide sequences at random positions (not including the cleavage sites) on caspase substrates. One non-cleavage site was extracted for every cleavage site on the same substrate. Subsequence segments centered on these non-cleavage sites were also extracted in the manner described above. The non-cleavage sites and the corresponding subsequences were designated as negative examples for SVM training and testing. Together, the positive and negative examples in the three groups of sequences were designated as the P<sub>4</sub>P<sub>1</sub>, P<sub>4</sub>P<sub>2</sub>' and P<sub>14</sub>P<sub>10</sub>' datasets respectively. Each of these datasets was further divided in the following manner (Figure <figr fid="F2">2</figr>):</p>
            <sec>
               <st>
                  <p>1. Training datasets</p>
               </st>
               <p>Training datasets were used for optimizing the SVM parameters and for training the SVM classifier to predict unseen test examples. Each training dataset contains 390 sequences (195 positive and 195 negative examples). The sequences were obtained from Fischer <it>et al</it>. and are available in <supplr sid="S1">Additional File 1</supplr>.</p>
               <suppl id="S1">
                  <title>
                     <p>Additional File 1</p>
                  </title>
                  <text>
                     <p>Dataset of caspase substrate cleavage sites (for cross-validation and SVM training). List of caspase substrate cleavage sites used for cross-validation and training of the SVM.</p>
                  </text>
                  <file name="1471-2105-7-S5-S14-S1.doc">
                     <p>Click here for file</p>
                  </file>
               </suppl>
            </sec>
            <sec>
               <st>
                  <p>2. Test datasets</p>
               </st>
               <p>Test datasets were used for evaluating the performance of the SVM method. Each test dataset contains 48 sequences (24 positive and 24 negative examples). The sequences were obtained from recently discovered substrates that were identified by literature search and were not reported in Fischer <it>et al</it>. Sequences are available in <supplr sid="S2">Additional File 2</supplr>.</p>
               <suppl id="S2">
                  <title>
                     <p>Additional File 2</p>
                  </title>
                  <text>
                     <p>Dataset of caspase substrate cleavage sites (for independent out-of-sample testing). List of caspase substrate cleavage sites used for independent out-of-sample testing of the SVM method.</p>
                  </text>
                  <file name="1471-2105-7-S5-S14-S2.doc">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <p>Datasets containing sequences identical to the P<sub>4</sub>P<sub>1</sub>, P<sub>4</sub>P<sub>2</sub>' and P<sub>14</sub>P<sub>10</sub>' datasets but without the P<sub>1 </sub>residue were also constructed (designated as P<sub>4</sub>P<sub>1</sub>(-D), P<sub>4</sub>P<sub>2</sub>'(-D) and P<sub>14</sub>P<sub>10</sub>'(-D) respectively). These datasets were divided into training and test datasets as mentioned earlier.</p>
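               <p>A minimal sketch of how a (-D) sequence could be derived from an extracted window is given below. It assumes that "without the P<sub>1</sub> residue" means the residue is removed from the string, so that each window becomes one residue shorter; the function name <it>drop_p1</it> is hypothetical.</p>

```python
def drop_p1(window: str, upstream: int = 4) -> str:
    """Remove the P1 residue, which is the last of the upstream residues.

    upstream is the number of residues from the window start up to and
    including P1 (4 for P4P1 and P4P2', 14 for P14P10').
    """
    p1_index = upstream - 1  # 0-indexed position of P1 inside the window
    return window[:p1_index] + window[p1_index + 1:]


if __name__ == "__main__":
    print(drop_p1("DEVD"))           # P4P1(-D)
    print(drop_p1("DEVDGS"))         # P4P2'(-D)
```

               <p>Under this assumption the (-D) windows hold 3, 5 and 23 residues, so a classifier trained on them cannot rely on the identity of the P<sub>1</sub> residue.</p>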
            </sec>
         </sec>
         <sec>
            <st>
               <p>Vector encoding schemes</p>
            </st>
            <p>To encapsulate the sequence information into a format suitable for SVM training and testing, the sequences were transformed into <it>n</it>-dimensional vectors using an orthonormal encoding scheme. Each amino acid is represented by a 20-dimensional vector in which a single element is one and all other elements are zero. For example, alanine was represented as [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1] and cysteine as [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0]. Therefore, for the P<sub>4</sub>P<sub>1</sub> dataset, each sequence of four residues was represented by an 80-dimensional vector. Sequences in the P<sub>4</sub>P<sub>2</sub>' and P<sub>14</sub>P<sub>10</sub>' datasets (6 and 24 residues) were represented by 120- and 480-dimensional vectors respectively.</p>
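            <p>The encoding can be illustrated with a short sketch. The residue ordering used here (alphabetical by one-letter code, with the set bit counted from the right) is an assumption chosen only because it reproduces the alanine and cysteine examples; the ordering of the remaining residues in the original work is not stated.</p>

```python
# Assumption: residues are ordered alphabetically by one-letter code and the
# "1" is placed counting from the right, which reproduces the two examples in
# the text (alanine = last position, cysteine = second-to-last position).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_residue(residue: str) -> list[int]:
    """Return the 20-dimensional indicator vector of one amino acid."""
    vector = [0] * len(AMINO_ACIDS)
    # str.index raises ValueError for anything outside the 20 standard residues.
    vector[len(AMINO_ACIDS) - 1 - AMINO_ACIDS.index(residue)] = 1
    return vector


def encode_sequence(sequence: str) -> list[int]:
    """Concatenate the per-residue vectors into one feature vector."""
    encoded = []
    for residue in sequence:
        encoded.extend(encode_residue(residue))
    return encoded


if __name__ == "__main__":
    print(encode_residue("A"))
    print(len(encode_sequence("DEVD")))
```

            <p>Each encoded sequence contains exactly one non-zero element per residue, so the vector length is always 20 times the window length.</p>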
         </sec>
         <sec>
            <st>
               <p>SVM implementation</p>
            </st>
            <p>For SVM implementation, we used the freely downloadable LIBSVM package by Chang and Lin <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Details of the SVM methodology can be obtained from the article by Burges <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Briefly, SVM is based on the structural risk minimization principle from statistical learning theory. A set of positive and negative examples can be represented by the feature vectors <it>x</it><sub><it>i</it></sub> (<it>i</it> = 1, 2,..., <it>N</it>) with corresponding labels <it>y</it><sub><it>i</it></sub> &#8712; {+1,-1}. To classify the data, the SVM trains a classifier by mapping the input samples, using a kernel function in most cases, onto a high-dimensional space, and then seeking a separating hyperplane that differentiates the two classes with maximal margin and minimal error. The decision function for new predictions on unseen examples is given as:</p>
            <p>
               <graphic file="1471-2105-7-S5-S14-i1.gif"/>
            </p>
            <p>where <it>K </it>(<it>x</it><sub><it>i</it></sub>&#183;<it>x</it><sub><it>j </it></sub>) is the kernel function, and the parameters are determined by maximizing the following:</p>
            <p>
               <graphic file="1471-2105-7-S5-S14-i2.gif"/>
            </p>
            <p>under the conditions,</p>
            <p>
               <graphic file="1471-2105-7-S5-S14-i3.gif"/>
            </p>
            <p>The variable <it>C </it>serves as the regularization parameter that controls the trade-off between margin and classification error. As the efficacy of the SVM prediction system is dependent on the type of kernel used, we explored various kernels (linear, sigmoid, polynomial and the radial basis function) commonly implemented in biological problems on our datasets. We have chosen the widely used radial basis function (RBF) kernel as it was found to be most effective (data not shown):</p>
            <p>
               <graphic file="1471-2105-7-S5-S14-i4.gif"/>
            </p>
            <p>Two parameters must be optimized for the SVM classifier: <it>&#947;</it>, which determines the capacity of the RBF kernel, and the regularization parameter <it>C</it>.</p>
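            <p>As an illustration of how the RBF kernel enters the decision function, the sketch below evaluates the standard kernel form implemented in LIBSVM, exp(-gamma * |x - y|^2), and a kernel-weighted sum over support vectors. The function names, the toy support vectors and the multiplier values are hypothetical; in practice these quantities are produced by LIBSVM during training.</p>

```python
import math


def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2), the RBF form used by LIBSVM."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, labels, alphas, bias, gamma):
    """Kernel-weighted sum over the support vectors plus the bias term."""
    total = sum(
        alpha * label * rbf_kernel(sv, x, gamma)
        for sv, label, alpha in zip(support_vectors, labels, alphas)
    )
    return total + bias


def predict(x, support_vectors, labels, alphas, bias, gamma):
    """Return +1 (cleavage site) or -1 (non-cleavage site).

    Assumption: a decision value of exactly zero is assigned to +1.
    """
    value = decision_value(x, support_vectors, labels, alphas, bias, gamma)
    return 1 if value >= 0 else -1


if __name__ == "__main__":
    # Hypothetical two-dimensional toy model with one support vector per class.
    svs, ys, alphas = [[1, 0], [0, 1]], [1, -1], [1.0, 1.0]
    print(predict([1, 0], svs, ys, alphas, bias=0.0, gamma=0.1))
```

            <p>For indicator-encoded sequences, two windows that differ at <it>m</it> positions have a squared distance of 2<it>m</it>, so the kernel value depends only on the number of mismatched residues and on <it>&#947;</it>.</p>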
         </sec>
         <sec>
            <st>
               <p>SVM optimization</p>
            </st>
            <p>To optimize the SVM parameters <it>&#947;</it> and <it>C</it>, we applied 10-fold cross-validation on each of the training datasets using various combinations of <it>&#947;</it> and <it>C</it>. In 10-fold cross-validation, the training dataset was split into 10 subsets, where one of the subsets was used as the test set while the other subsets were used for training the classifier. The trained classifier was tested using the test set. The process was repeated 10 times using a different subset for testing each time, hence ensuring that all subsets were used for both training and testing. SVM parameters <it>&#947;</it> and <it>C</it> were stepped through combinations of 0.01, 0.1, 1, 10, 100 for <it>&#947;</it>, and 1, 10, 100 and 1000 for <it>C</it> in a grid-based manner.</p>
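            <p>The grid search with 10-fold cross-validation can be sketched as below. This is not the original script: the shuffled, unstratified fold assignment, the use of mean cross-validation accuracy as the selection criterion, and the <it>evaluate</it> callback (a placeholder for training and scoring an SVM on the given indices) are all assumptions made for illustration.</p>

```python
import itertools
import random

GAMMAS = [0.01, 0.1, 1, 10, 100]   # candidate values for gamma
COSTS = [1, 10, 100, 1000]         # candidate values for C


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def grid_search(n_samples, evaluate, k=10):
    """Return (mean_accuracy, gamma, cost) of the best parameter pair.

    evaluate(train_idx, test_idx, gamma, cost) is a user-supplied callback
    that trains a classifier on train_idx and returns its accuracy on test_idx.
    """
    folds = k_fold_indices(n_samples, k)
    best = None
    for gamma, cost in itertools.product(GAMMAS, COSTS):
        scores = []
        for i, test_idx in enumerate(folds):
            # Every fold except the i-th one is used for training.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(evaluate(train_idx, test_idx, gamma, cost))
        mean_score = sum(scores) / len(scores)
        if best is None or mean_score > best[0]:
            best = (mean_score, gamma, cost)
    return best
```

            <p>The grid contains 5 x 4 = 20 parameter pairs, so one full search trains 200 classifiers per dataset. For a training dataset of 390 sequences, each fold holds 39 sequences for testing and leaves 351 for training.</p>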
         </sec>
         <sec>
            <st>
               <p>SVM training and testing</p>
            </st>
            <p>The best combinations of <it>&#947; </it>and <it>C </it>obtained from the optimization process were used for training the SVM classifier using the entire training dataset. The SVM classifier was subsequently used to predict the test datasets. Various quantitative variables were obtained to measure the effectiveness of the SVM method:</p>
            <p>(i) <it>TP</it>, true positives &#8211; the number of correctly classified cleavage sites.</p>
            <p>(ii) <it>FP</it>, false positives &#8211; the number of incorrectly classified non-cleavage sites.</p>
            <p>(iii) <it>TN</it>, true negatives &#8211; the number of correctly classified non-cleavage sites.</p>
            <p>(iv) <it>FN</it>, false negatives &#8211; the number of incorrectly classified cleavage sites.</p>
            <p>Using the variables above, a series of statistical metrics was computed to measure the effectiveness of the SVM method. <it>Sensitivity (SE)</it> and <it>Specificity (SP)</it>, which indicate the ability of the prediction system to correctly classify the cleavage and non-cleavage sites respectively, were calculated:</p>
            <p>
               <graphic file="1471-2105-7-S5-S14-i5.gif"/>
            </p>
            <p>To provide an indication of the overall performance of the system, we computed <it>Accuracy (AC)</it>, for the percentage of correctly classified sites, and the <it>Matthews Correlation Coefficient (MCC)</it>.</p>
            <p>
               <graphic file="1471-2105-7-S5-S14-i6.gif"/>
            </p>
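            <p>For reference, the four metrics can be computed from the confusion counts as in the sketch below, which uses the standard textbook definitions. Reporting <it>SE</it>, <it>SP</it> and <it>AC</it> as percentages and returning an <it>MCC</it> of zero when its denominator vanishes are conventions assumed here, and the function name is hypothetical.</p>

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute SE, SP and AC (in percent) and MCC from confusion counts.

    Assumes at least one positive and one negative example are present.
    """
    se = 100.0 * tp / (tp + fn)                   # sensitivity
    sp = 100.0 * tn / (tn + fp)                   # specificity
    ac = 100.0 * (tp + tn) / (tp + fp + tn + fn)  # accuracy
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is reported as 0 when the denominator is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"SE": se, "SP": sp, "AC": ac, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts for a balanced test set of 48 sequences.
    print(classification_metrics(tp=20, fp=4, tn=20, fn=4))
```

            <p>Unlike accuracy, the <it>MCC</it> ranges from -1 to +1 and remains informative when one class is predicted far more often than the other.</p>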
         </sec>
         <sec>
            <st>
               <p>Prediction of caspase-mediated cleavage of Livin and mutants</p>
            </st>
            <p>The SVM trained using the P<sub>14</sub>P<sub>10</sub>' (-D) dataset (RBF kernel, <it>&#947;</it> = 0.1, <it>C</it> = 100) was used to predict the cleavage of Livin [Swiss-Prot:Q96CA5] and the various deletion mutants, based on the prediction of the caspase cleavage sites, as reported in Yan <it>et al</it>. <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Subsequence segments of 24 amino acids centred on the P<sub>1</sub> residue of the reported Livin cleavage site (DHVD<sup>52</sup>) were extracted from both wild type and mutant Livin sequences. The mutants used in this study were: LE &#916;52&#8211;61, &#916;53&#8211;55, &#916;55&#8211;57, &#916;57&#8211;59, &#916;60&#8211;62, &#916;52&#8211;61, &#916;53&#8211;61, &#916;52 and &#916;51&#8211;53. In mutants with Asp-52 deleted, the peptide windows were centred on the subsequent residue occupying position 52.</p>
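            <p>The handling of deletion mutants can be sketched as follows. The sequence used here is a hypothetical placeholder that only reproduces the DHVD motif at residues 49 to 52; it is not the Livin sequence, and the function names are illustrative.</p>

```python
# Hypothetical placeholder sequence with DHVD at residues 49-52 (1-indexed).
TOY_PROTEIN = "A" * 48 + "DHVD" + "S" * 20


def apply_deletion(sequence: str, first: int, last: int) -> str:
    """Delete residues first..last (1-indexed, inclusive)."""
    return sequence[:first - 1] + sequence[last:]


def window_at(sequence: str, position: int, upstream: int = 14, downstream: int = 10) -> str:
    """Return the 24-residue window whose P1 residue is at the given position."""
    return sequence[position - upstream:position + downstream]


if __name__ == "__main__":
    wild_type = window_at(TOY_PROTEIN, 52)
    # After deleting residue 52, the next residue shifts into position 52
    # and becomes the centre of the window.
    delta_52 = window_at(apply_deletion(TOY_PROTEIN, 52, 52), 52)
    print(wild_type)
    print(delta_52)
```

            <p>Keeping the window anchored at position 52 means that a deletion downstream of Asp-52 changes only the primed-side residues, whereas a deletion that removes Asp-52 itself replaces the P<sub>1</sub> residue with whatever follows the deleted segment.</p>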
         </sec>
         <sec>
            <st>
               <p>Comparison with other available methods</p>
            </st>
            <p>As the CasPredictor method is unavailable from the published website, it was not tested. The performance of GraBCas was compared with the SVM method using the current datasets. As the GraBCas scoring matrices are specific for the tripeptide, P<sub>4</sub>-P<sub>3</sub>-P<sub>2</sub>, and assume that P<sub>1</sub> is an Asp (D) residue, the GraBCas matrices were used to score only the positive sequences (cleavage sites) from the P<sub>4</sub>P<sub>1</sub> training dataset. As GraBCas scores for different caspases were available, only the highest score for each sequence was recorded. The percentage of correctly predicted cleavage sites (<it>Sensitivity, SE</it>) was calculated as mentioned earlier. The P<sub>4</sub>P<sub>1</sub> test dataset was tested in a similar manner and the <it>SE</it> score was obtained at a GraBCas cut-off of 0.1.</p>
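            <p>The cut-off analysis can be sketched as below. The per-caspase scores are hypothetical placeholders rather than GraBCas output, and treating a score equal to the cut-off as a positive prediction is an assumption.</p>

```python
CUTOFFS = [0.1, 1, 5, 10, 20]


def best_score(scores_by_caspase: dict[str, float]) -> float:
    """Keep only the highest score obtained across the caspase matrices."""
    return max(scores_by_caspase.values())


def sensitivity_by_cutoff(best_scores: list[float], cutoffs=CUTOFFS) -> dict[float, float]:
    """Percentage of known cleavage sites scoring at or above each cut-off."""
    total = len(best_scores)
    return {
        cutoff: 100.0 * sum(1 for score in best_scores if score >= cutoff) / total
        for cutoff in cutoffs
    }


if __name__ == "__main__":
    # Hypothetical per-caspase scores for three cleavage sites.
    sites = [
        {"caspase-3": 32.0, "caspase-7": 12.5},
        {"caspase-3": 0.4, "caspase-8": 2.1},
        {"caspase-1": 0.02, "caspase-6": 0.05},
    ]
    print(sensitivity_by_cutoff([best_score(site) for site in sites]))
```

            <p>Because only positive sequences are scored, this procedure yields sensitivity alone; specificity cannot be estimated without also scoring non-cleavage sites.</p>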
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>LJKW conceived the application of SVM for prediction of caspase substrate cleavage sites. TWT contributed with ideas on the experimentation and SR finalized the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>LJKW gratefully acknowledges the award of a research scholarship from the National University of Singapore.</p>
            <p>This article has been published as part of <it>BMC Bioinformatics </it>Volume 7, Supplement 5, 2006: APBioNet &#8211; Fifth International Conference on Bioinformatics (InCoB2006). The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1471-2105/7?issue=S5</url>.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Caspases: more than just killers?</p>
            </title>
            <aug>
               <au>
                  <snm>Los</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stroh</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Janicke</snm>
                  <fnm>RU</fnm>
               </au>
               <au>
                  <snm>Engels</snm>
                  <fnm>IH</fnm>
               </au>
               <au>
                  <snm>Schulze-Osthoff</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Trends Immunol</source>
            <pubdate>2001</pubdate>
            <volume>22</volume>
            <fpage>31</fpage>
            <lpage>34</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1471-4906(00)01814-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">11286689</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Apoptosis-independent functions of killer caspases</p>
            </title>
            <aug>
               <au>
                  <snm>Algeciras-Schimnich</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Barnhart</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Peter</snm>
                  <fnm>ME</fnm>
               </au>
            </aug>
            <source>Curr Opin Cell Biol</source>
            <pubdate>2002</pubdate>
            <volume>14</volume>
            <fpage>721</fpage>
            <lpage>726</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0955-0674(02)00384-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">12473345</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Vital functions for lethal caspases</p>
            </title>
            <aug>
               <au>
                  <snm>Launay</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hermine</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Fontenay</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kroemer</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Solary</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Garrido</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Oncogene</source>
            <pubdate>2005</pubdate>
            <volume>24</volume>
            <fpage>5137</fpage>
            <lpage>5148</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.onc.1208524</pubid>
                  <pubid idtype="pmpid" link="fulltext">16079910</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Substrate specificities of caspase family proteases</p>
            </title>
            <aug>
               <au>
                  <snm>Talanian</snm>
                  <fnm>RV</fnm>
               </au>
               <au>
                  <snm>Quinlan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Trautz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hackett</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Mankovich</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Banach</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ghayur</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Brady</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>WW</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1997</pubdate>
            <volume>272</volume>
            <fpage>9677</fpage>
            <lpage>9682</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.272.15.9677</pubid>
                  <pubid idtype="pmpid" link="fulltext">9092497</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>A combinatorial approach defines specificities of members of the caspase family and granzyme B. Functional relationships established for key mediators of apoptosis</p>
            </title>
            <aug>
               <au>
                  <snm>Thornberry</snm>
                  <fnm>NA</fnm>
               </au>
               <au>
                  <snm>Rano</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>EP</fnm>
               </au>
               <au>
                  <snm>Rasper</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Timkey</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Garcia-Calvo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Houtzager</snm>
                  <fnm>VM</fnm>
               </au>
               <au>
                  <snm>Nordstrom</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Roy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vaillancourt</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Chapman</snm>
                  <fnm>KT</fnm>
               </au>
               <au>
                  <snm>Nicholson</snm>
                  <fnm>DW</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1997</pubdate>
            <volume>272</volume>
            <fpage>17907</fpage>
            <lpage>17911</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.272.29.17907</pubid>
                  <pubid idtype="pmpid" link="fulltext">9218414</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Mammalian caspases: structure, activation, substrates, and functions during apoptosis</p>
            </title>
            <aug>
               <au>
                  <snm>Earnshaw</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Martins</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Kaufmann</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>Annu Rev Biochem</source>
            <pubdate>1999</pubdate>
            <volume>68</volume>
            <fpage>383</fpage>
            <lpage>424</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.biochem.68.1.383</pubid>
                  <pubid idtype="pmpid" link="fulltext">10872455</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Many cuts to ruin: a comprehensive update of caspase substrates</p>
            </title>
            <aug>
               <au>
                  <snm>Fischer</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Janicke</snm>
                  <fnm>RU</fnm>
               </au>
               <au>
                  <snm>Schulze-Osthoff</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Cell Death Differ</source>
            <pubdate>2003</pubdate>
            <volume>10</volume>
            <fpage>76</fpage>
            <lpage>100</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.cdd.4401160</pubid>
                  <pubid idtype="pmpid" link="fulltext">12655297</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Protein Identification and Analysis Tools on the ExPASy Server</p>
            </title>
            <aug>
               <au>
                  <snm>Gasteiger</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hoogland</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gattiker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Duvaud</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wilkins</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Appel</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>The Proteomics Protocols Handbook</source>
            <publisher>Humana Press</publisher>
            <editor>Walker JM</editor>
            <pubdate>2005</pubdate>
            <fpage>571</fpage>
            <lpage>607</lpage>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Toward computer-based cleavage site prediction of cysteine endopeptidases</p>
            </title>
            <aug>
               <au>
                  <snm>Lohmuller</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wenzler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hagemann</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kiess</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Peters</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Reinheckel</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Biol Chem</source>
            <pubdate>2003</pubdate>
            <volume>384</volume>
            <fpage>899</fpage>
            <lpage>909</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1515/BC.2003.101</pubid>
                  <pubid idtype="pmpid">12887057</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>CaSPredictor: a new computer-based tool for caspase substrate prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Garay-Malpartida</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Occhiucci</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Alves</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Belizario</snm>
                  <fnm>JE</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>Suppl 1</issue>
            <fpage>i169</fpage>
            <lpage>i176</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti1034</pubid>
                  <pubid idtype="pmpid" link="fulltext">15961454</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Amino acid sequences common to rapidly degraded proteins: the PEST hypothesis</p>
            </title>
            <aug>
               <au>
                  <snm>Rogers</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wells</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rechsteiner</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1986</pubdate>
            <volume>234</volume>
            <fpage>364</fpage>
            <lpage>368</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.2876518</pubid>
                  <pubid idtype="pmpid" link="fulltext">2876518</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>PEST sequences and regulation by proteolysis</p>
            </title>
            <aug>
               <au>
                  <snm>Rechsteiner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>1996</pubdate>
            <volume>21</volume>
            <fpage>267</fpage>
            <lpage>271</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0968-0004(96)10031-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">8755249</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>GraBCas: a bioinformatics tool for score-based prediction of Caspase- and Granzyme B-cleavage sites in protein sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Backes</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kuentzer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lenhof</snm>
                  <fnm>HP</fnm>
               </au>
               <au>
                  <snm>Comtesse</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Meese</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>208</fpage>
            <lpage>213</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gki433</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Prediction of caspase cleavage sites using Bayesian bio-basis function neural networks</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>ZR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1831</fpage>
            <lpage>1837</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti281</pubid>
                  <pubid idtype="pmpid" link="fulltext">15671118</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Support vector networks</p>
            </title>
            <aug>
               <au>
                  <snm>Cortes</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Vapnik</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Machine Learning</source>
            <pubdate>1995</pubdate>
            <volume>20</volume>
            <fpage>273</fpage>
            <lpage>297</lpage>
         </bibl>
         <bibl id="B16">
            <title>
               <p>A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach</p>
            </title>
            <aug>
               <au>
                  <snm>Hua</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>308</volume>
            <fpage>397</fpage>
            <lpage>407</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.4580</pubid>
                  <pubid idtype="pmpid" link="fulltext">11327775</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Secondary structure prediction with support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Ward</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>McGuffin</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Buxton</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>DT</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>1650</fpage>
            <lpage>1655</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg223</pubid>
                  <pubid idtype="pmpid" link="fulltext">12967961</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Two-stage multi-class support vector machines to protein secondary structure prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>MN</fnm>
               </au>
               <au>
                  <snm>Rajapakse</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>2005</pubdate>
            <fpage>346</fpage>
            <lpage>357</lpage>
            <xrefbib>
               <pubid idtype="pmpid">15759640</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Multi-class protein fold recognition using support vector machines and neural networks</p>
            </title>
            <aug>
               <au>
                  <snm>Ding</snm>
                  <fnm>CHQ</fnm>
               </au>
               <au>
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>349</fpage>
            <lpage>358</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.4.349</pubid>
                  <pubid idtype="pmpid" link="fulltext">11301304</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Classification of protein quaternary structure with support vector machine</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>HC</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>YL</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>HY</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>2390</fpage>
            <lpage>2396</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg331</pubid>
                  <pubid idtype="pmpid" link="fulltext">14668222</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Support vector machines with profile-based kernels for remote protein homology detection</p>
            </title>
            <aug>
               <au>
                  <snm>Busuttil</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Abela</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pace</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>Genome Inform</source>
            <pubdate>2004</pubdate>
            <volume>15</volume>
            <fpage>191</fpage>
            <lpage>200</lpage>
            <xrefbib>
               <pubid idtype="pmpid">15706505</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Improved prediction of protein-protein binding sites using a support vector machines approach</p>
            </title>
            <aug>
               <au>
                  <snm>Bradford</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Westhead</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1487</fpage>
            <lpage>1494</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti242</pubid>
                  <pubid idtype="pmpid" link="fulltext">15613384</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The SBASE domain sequence resource, release 12: prediction of protein domain-architecture using support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Vlahovicek</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kajan</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Agoston</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Pongor</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <issue>33 Database</issue>
            <fpage>D223</fpage>
            <lpage>D225</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540066</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608182</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Support vector machines for predicting HIV protease cleavage sites in protein</p>
            </title>
            <aug>
               <au>
                  <snm>Cai</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>XJ</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>XB</fnm>
               </au>
               <au>
                  <snm>Chou</snm>
                  <fnm>KC</fnm>
               </au>
            </aug>
            <source>J Comput Chem</source>
            <pubdate>2002</pubdate>
            <volume>23</volume>
            <fpage>267</fpage>
            <lpage>274</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/jcc.10017</pubid>
                  <pubid idtype="pmpid" link="fulltext">11924738</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Application of support vector machines for T-cell epitopes prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Pinilla</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Valmori</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>1978</fpage>
            <lpage>1984</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg255</pubid>
                  <pubid idtype="pmpid" link="fulltext">14555632</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Support vector machine classification and validation of cancer tissue samples using microarray expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Cristianini</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Duffy</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bednarski</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Schummer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>906</fpage>
            <lpage>914</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.10.906</pubid>
                  <pubid idtype="pmpid" link="fulltext">11120680</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Knowledge-based analysis of microarray gene expression data by using support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>MPS</fnm>
               </au>
               <au>
                  <snm>Grundy</snm>
                  <fnm>WN</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Cristianini</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sugnet</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Ares</snm>
                  <fnm>M</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>262</fpage>
            <lpage>267</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">26651</pubid>
                  <pubid idtype="pmpid" link="fulltext">10618406</pubid>
                  <pubid idtype="doi">10.1073/pnas.97.1.262</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Support vector machine applications in bioinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Byvatov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Appl Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>2</volume>
            <fpage>67</fpage>
            <lpage>77</lpage>
            <xrefbib>
               <pubid idtype="pmpid">15130823</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Biological applications of support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>ZR</fnm>
               </au>
            </aug>
            <source>Brief Bioinform</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>328</fpage>
            <lpage>338</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bib/5.4.328</pubid>
                  <pubid idtype="pmpid" link="fulltext">15606969</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Granzyme B: A natural born killer</p>
            </title>
            <aug>
               <au>
                  <snm>Lord</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Rajotte</snm>
                  <fnm>RV</fnm>
               </au>
               <au>
                  <snm>Korbutt</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Bleackley</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Immunol Rev</source>
            <pubdate>2003</pubdate>
            <volume>193</volume>
            <fpage>31</fpage>
            <lpage>38</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1034/j.1600-065X.2003.00044.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">12752668</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Proteolytic cleavage of Livin (ML-IAP) in apoptotic melanoma cells potentially mediated by a non-canonical caspase</p>
            </title>
            <aug>
               <au>
                  <snm>Yan</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Brouha</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Raj</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Biddle</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Grossman</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Dermatol Sci</source>
            <pubdate>2006</pubdate>
            <volume>43</volume>
            <fpage>189</fpage>
            <lpage>200</lpage>
            <note>2006 Jun 27</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jdermsci.2006.05.007</pubid>
                  <pubid idtype="pmpid" link="fulltext">16806840</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>LIBSVM: a library for support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Chang</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>CJ</fnm>
               </au>
            </aug>
            <url>http://www.csie.ntu.edu.tw/~cjlin/libsvm</url>
         </bibl>
         <bibl id="B33">
            <title>
               <p>A tutorial on support vector machines for pattern recognition</p>
            </title>
            <aug>
               <au>
                  <snm>Burges</snm>
                  <fnm>CJC</fnm>
               </au>
            </aug>
            <source>Data Mining and Knowledge Discovery</source>
            <pubdate>1998</pubdate>
            <volume>2</volume>
            <fpage>121</fpage>
            <lpage>167</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1023/A:1009715923555</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
