<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-11-37</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Predicting Bevirimat resistance of HIV-1 from genotype</p>
         </title>
         <aug>
            <au ca="yes" id="A1"><snm>Heider</snm><fnm>Dominik</fnm><insr iid="I1"/><email>dominik.heider@uni-due.de</email></au>
            <au id="A2"><snm>Verheyen</snm><fnm>Jens</fnm><insr iid="I2"/><email>jens.verheyen@uk-koeln.de</email></au>
            <au id="A3"><snm>Hoffmann</snm><fnm>Daniel</fnm><insr iid="I1"/><email>daniel.hoffmann@uni-due.de</email></au>
         </aug>
         <insg>
            <ins id="I1"><p>Department of Bioinformatics, Center of Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45117 Essen, Germany</p></ins>
            <ins id="I2"><p>Institute of Virology, University of Cologne, Fuerst-Pueckler-Str. 56, 50935 Cologne, Germany</p></ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2010</pubdate>
         <volume>11</volume>
         <issue>1</issue>
         <fpage>37</fpage>
         <url>http://www.biomedcentral.com/1471-2105/11/37</url>
         <xrefbib><pubidlist><pubid idtype="pmpid">20089140</pubid><pubid idtype="doi">10.1186/1471-2105-11-37</pubid></pubidlist></xrefbib>
      </bibl>
      <history><rec><date><day>24</day><month>8</month><year>2009</year></date></rec><acc><date><day>20</day><month>1</month><year>2010</year></date></acc><pub><date><day>20</day><month>1</month><year>2010</year></date></pub></history>
      <cpyrt><year>2010</year><collab>Heider et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (&lt;url&gt;http://creativecommons.org/licenses/by/2.0&lt;/url&gt;), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Maturation inhibitors are a new class of antiretroviral drugs. Bevirimat (BVM) was the first substance in this class to enter clinical trials. While the inhibitory function of BVM is well established, the molecular mechanisms of action and resistance are not well understood. It is known that mutations in the regions CS p24/p2 (the p24/p2 cleavage site) and p2 can cause phenotypic resistance to BVM. We have investigated a set of p24/p2 sequences of HIV-1 of known phenotypic resistance to BVM to test whether BVM resistance can be predicted from sequence, and to identify possible molecular mechanisms of BVM resistance in HIV-1.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We used artificial neural networks and random forests with different descriptors for the prediction of BVM resistance. Random forests with hydrophobicity as descriptor performed best and classified the sequences with an area under the Receiver Operating Characteristics (ROC) curve of 0.93 &#177; 0.001. For the collected data we find that p2 sequence positions 369 to 376 have the highest impact on resistance, with positions 370 and 372 being particularly important. These findings are in partial agreement with other recent studies. Apart from the complex machine learning models we derived a number of simple rules that predict BVM resistance from sequence with surprising accuracy. According to computational predictions based on the data set used, cleavage sites are usually not shifted by resistance mutations. However, we found that resistance mutations could shorten and weaken the <it>&#945;</it>-helix in p2, which hints at a possible resistance mechanism.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>We found that BVM resistance of HIV-1 can be predicted well from the sequence of the p2 peptide, which may prove useful for personalized therapy if maturation inhibitors reach clinical practice. Results of secondary structure analysis are compatible with a possible route to BVM resistance in which mutations weaken a six-helix bundle discovered in recent experiments, and thus ease Gag cleavage by the retroviral protease.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <sec>
            <st>
               <p>HIV and Bevirimat</p>
            </st>
            <p>Bevirimat (BVM) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> belongs to a new class of anti-HIV substances that inhibit maturation of virus particles by preventing cleavage of the precursor polyprotein by the retroviral protease (PR). BVM prevents the final cleavage of the precursor protein p25 into p24 and p2, so that p25 accumulates in the immature virions. These immature viral particles cannot develop into an infectious stage, and the viral replication cycle is interrupted. A first set of mutations conferring resistance to BVM was found in selection experiments with BVM and was located at CS p24/p2 <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. In phase II clinical trials, polymorphisms in the QVT motif of p2 were found to prevent antiretroviral activity of BVM and were extensively studied in phenotypic resistance assays <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Machine learning</p>
            </st>
            <p>The notion of a <it>resistance mutation </it>is often useful as a first, simple approximation to describe relations between point mutations and resistance phenotypes. However, as more data become available, the relations between genotype and phenotype often turn out to be more complex. For instance, it has been observed that mutations in the QVT motif (wild type sequence positions 369-371) are preferentially associated with resistance to BVM <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. However, as the data analyzed in the current study show, the same set of mutations of QVT to QAS can be associated with BVM resistance <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> or susceptibility <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, depending on the mutational background. Machine learning methods are built to cope with such complex associations.</p>
            <p>There are several machine learning methods that have been successfully employed to this end, e.g. rule-based methods <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, decision trees <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>, support vector machines <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, random forests (RFs) <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, or artificial neural networks (ANNs) <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>.</p>
            <p>ANNs are universal approximators that can be used to solve non-linear classification problems, but they are prone to overtraining if not properly set up <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. RFs are also excellent non-linear models, and in general perform better than single decision trees (DTs) <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. They are less easily interpretable than DTs, although they provide variable importance measures <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. In contrast, rule-based systems yield rules that are easily intelligible, but their classification performance is often suboptimal <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Data</p>
            </st>
            <p>We used sequences of the p24/p2 region of 45 HIV-1 strains that were susceptible or intermediately resistant to BVM (here defined as <it>IC</it><sub>50 </sub>&#8804; 10) and of 110 resistant strains (<it>IC</it><sub>50 </sub><it>&gt; </it>10). The phenotype was determined in experiments in which HIV-1 was cultured in the presence of increasing concentrations of BVM. The concentration of BVM that inhibits 50% of viral replication, compared to cell culture experiments without BVM, is defined as <it>IC</it><sub>50 </sub>(50% inhibitory concentration). In general, drug resistance means reduced inhibition of viral replication by antiretroviral drugs, resulting in increased <it>IC</it><sub>50 </sub>values. The <it>IC</it><sub>50 </sub>values of the drug-resistant isolates and of HIV wild type are used to calculate resistance factors</p>
            <p>
               <display-formula>
                  <graphic file="1471-2105-11-37-i1.gif"/>
               </display-formula>
            </p>
            <p>a standardized measure of HIV drug resistance. The cut-off value of the resistance factor used to define the classes "resistant to BVM" and "susceptible to BVM" was previously derived from data obtained in phase II clinical trials with BVM correlating phenotypic resistance and clinical response <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>.</p>
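            <p>The calculation can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that the resistance factor is the ratio of the isolate <it>IC</it><sub>50 </sub>to the wild type <it>IC</it><sub>50 </sub>(the formula itself is only available as an image), and it takes the cut-off of 10 from the class definitions given above. The function names are hypothetical.</p>

```python
def resistance_factor(ic50_isolate, ic50_wild_type):
    """Fold change of the isolate IC50 relative to wild type.

    Assumption: the resistance factor is the plain ratio of the two values.
    """
    if not ic50_wild_type > 0:
        raise ValueError("wild type IC50 must be positive")
    return ic50_isolate / ic50_wild_type


def resistance_class(factor, cutoff=10.0):
    """Label a strain by comparing its resistance factor with the cut-off.

    Values above the cut-off are labeled resistant; all others fall into
    the combined susceptible / intermediately resistant class.
    """
    return "resistant" if factor > cutoff else "susceptible"


# Hypothetical IC50 values, for illustration only.
factor = resistance_factor(50.0, 2.0)
print(factor, resistance_class(factor))
```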
            <p>All data were collected from several studies that have investigated polymorphisms in p2, especially in its C-terminal half <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp> (see additional file <supplr sid="S1">1</supplr> for complete set). Duplicated sequences in each class were removed prior to analysis.</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>Data set</b>. The sequences used in this study.</p>
               </text>
               <file name="1471-2105-11-37-S1.XLS">
   <p>Click here for file</p>
</file>
            </suppl>
            <sec>
               <st>
                  <p>Multiple Sequence Alignment</p>
               </st>
               <p>Multiple sequence alignments of the sequences were produced with clustalw <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, t-coffee <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, muscle <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, and prank <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Clustalw and muscle gave very compact alignments with a width of 21 columns and most rows free of gaps. The alignment from t-coffee was wider by one column, and the prank alignment much wider with 36 columns. Since clustalw and muscle gave similar alignments, and the prank alignment led to a relatively poor predictive performance, we restrict ourselves in the following to reporting results based on the output of clustalw and t-coffee (see additional files <supplr sid="S2">2</supplr> and <supplr sid="S3">3</supplr>).</p>
               <suppl id="S2">
                  <title>
                     <p>Additional file 2</p>
                  </title>
                  <text>
                     <p><b>MSA of the sequences with clustalw</b>. Multiple sequence alignment of the sequences with clustalw <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
                  </text>
                  <file name="1471-2105-11-37-S2.TXT">
   <p>Click here for file</p>
</file>
               </suppl>
               <suppl id="S3">
                  <title>
                     <p>Additional file 3</p>
                  </title>
                  <text>
                     <p><b>MSA of the sequences with t-coffee</b>. Multiple sequence alignment of the sequences with t-coffee <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>.</p>
                  </text>
                  <file name="1471-2105-11-37-S3.TXT">
   <p>Click here for file</p>
</file>
               </suppl>
            </sec>
            <sec>
               <st>
                  <p>Descriptor set</p>
               </st>
               <p>It is often helpful not to analyze amino acid sequences as strings of characters, but to associate with each amino acid a numerical "descriptor" value, for instance a value that captures a physico-chemical property of this amino acid. Recently, it has been shown that the descriptor set is the most critical element in classification <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>, and that physico-chemical descriptors outperform simpler descriptors <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. In our search for a method with maximum predictive power we tested several numerical descriptors, including hydrophobicity values of Kyte and Doolittle <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, molecular weight, isoelectric point (IEP) and pKa values for each amino acid. Moreover, we used the predicted probability for cleavage by HIV protease as a descriptor <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. The numerical descriptor values for gaps from the multiple sequence alignment are undefined <it>a priori</it>. We therefore tested three values for gaps, namely 0, -1 and an interpolated value (mean of the two amino acid descriptor values on either side of the gap). In the case of 0 and interpolated values for gaps, the descriptor values of the amino acids were normalized to the interval [-1,1], and in the case of -1 for a gap they were normalized to [0,1]. Apart from using numerical descriptors, we also trained an RF with the multiply aligned p2 sequences, using as factors the single letter codes of the amino acids and "-" for gaps.</p>
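               <p>The encoding described above can be sketched as follows for the hydrophobicity descriptor. This is an illustrative sketch, not the code used in the study. The table holds the published Kyte-Doolittle hydropathy values; the function names and the <it>gap_mode </it>labels are hypothetical. Two details are assumptions: a run of several gaps is interpolated from the nearest residues on either side, and a gap at the edge of the alignment takes the value of its only neighbouring residue.</p>

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def scale(value, lo, hi, low, high):
    """Map value linearly from the range [low, high] to [lo, hi]."""
    return lo + (hi - lo) * (value - low) / (high - low)


def nearest(values, start, step):
    """Return the nearest non-gap value from start in direction step."""
    i = start
    while i >= 0 and len(values) > i:
        if values[i] is not None:
            return values[i]
        i += step
    return None


def encode(aligned, gap_mode="zero"):
    """Encode one aligned sequence as a list of normalized descriptors.

    gap_mode is 'zero', 'minus_one' or 'interpolate' (hypothetical labels).
    """
    if gap_mode not in ("zero", "minus_one", "interpolate"):
        raise ValueError("unknown gap_mode: " + gap_mode)
    low, high = min(KD.values()), max(KD.values())
    # Gap value -1 goes with residues in [0, 1]; otherwise use [-1, 1].
    lo, hi = (0.0, 1.0) if gap_mode == "minus_one" else (-1.0, 1.0)
    values = [None if aa == "-" else scale(KD[aa], lo, hi, low, high)
              for aa in aligned]
    encoded = []
    for i, value in enumerate(values):
        if value is not None:
            encoded.append(value)
        elif gap_mode == "zero":
            encoded.append(0.0)
        elif gap_mode == "minus_one":
            encoded.append(-1.0)
        else:
            sides = [s for s in (nearest(values, i - 1, -1),
                                 nearest(values, i + 1, 1)) if s is not None]
            # Assumption: fall back to 0 if the row contains only gaps.
            encoded.append(sum(sides) / len(sides) if sides else 0.0)
    return encoded


print(encode("I-R", gap_mode="interpolate"))  # [1.0, 0.0, -1.0]
```

               <p>Each aligned sequence then becomes one fixed-length numeric vector with one entry per alignment column, which is the input format expected by the classifiers described below.</p>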
            </sec>
            <sec>
               <st>
                  <p>Neural Networks</p>
               </st>
               <p>We used a Java implementation <url>http://www.heatonresearch.com/encog</url> of neural networks with one hidden layer and three to seven hidden neurons. Resilient propagation (Rprop) was applied as the learning rule <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. We used the identity function as activation function for the input layer and the logistic function for the hidden and output layers. We chose the logistic function because recent studies have shown that it leads to better generalization <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. The weights of the ANNs were initialized with the Nguyen-Widrow method <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Early stopping (stop-training) was used to avoid overfitting of the neural networks <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>.</p>
            </sec>
            <sec>
               <st>
                  <p>Random Forests</p>
               </st>
               <p>As an alternative to ANNs we trained Random Forests (RFs) <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> for the prediction of BVM resistance, using the implementation in the randomForest package of R <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. In our application each RF consisted of 2000 randomly and independently grown decision trees. When using the trained RF for prediction, an unseen sequence was assigned to the resistance class voted for by at least 50% of the trees. The importance of each variable, i.e. sequence position, for the correct classification can be assessed by determining the increase in misclassification rate due to leaving out this variable <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Furthermore, we used the rpart package of R <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> to create single decision trees.</p>
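               <p>The voting rule and the idea behind the importance measure can be sketched as follows. This is a simplified illustration and not the randomForest implementation: all function names are hypothetical, the classifier is passed in as a plain function, and the importance is estimated by permuting one variable on a single data set, whereas RF implementations typically do this on out-of-bag samples for each tree.</p>

```python
import random


def forest_vote(tree_votes, positive="resistant", negative="susceptible"):
    """Return the positive class if at least 50% of the trees vote for it."""
    share = sum(1 for vote in tree_votes if vote == positive) / len(tree_votes)
    return positive if share >= 0.5 else negative


def error_rate(predict, rows, labels):
    """Fraction of rows for which predict(row) differs from the label."""
    wrong = sum(1 for row, label in zip(rows, labels) if predict(row) != label)
    return wrong / len(rows)


def permutation_importance(predict, rows, labels, column, rng):
    """Increase in misclassification rate after shuffling one column."""
    baseline = error_rate(predict, rows, labels)
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted = [row[:column] + [value] + row[column + 1:]
                for row, value in zip(rows, shuffled)]
    return error_rate(predict, permuted, labels) - baseline


# Toy example: a classifier that looks only at the first position.
def toy_predict(row):
    return "resistant" if row[0] > 0 else "susceptible"


rows = [[1.0, 0.2], [-1.0, 0.3], [0.5, -0.4], [-0.5, 0.9]]
labels = ["resistant", "susceptible", "resistant", "susceptible"]
rng = random.Random(0)
for column in range(2):
    print(column, permutation_importance(toy_predict, rows, labels, column, rng))
```

               <p>In the toy example the second position is never used by the classifier, so shuffling it cannot change the error and its importance is exactly zero.</p>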
            </sec>
            <sec>
               <st>
                  <p>Rule-based systems</p>
               </st>
               <p>We used the rule-based algorithms JRip <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> and PART <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> as implemented in R <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>.</p>
            </sec>
            <sec>
               <st>
                  <p>Cross-validation</p>
               </st>
               <p>All machine learning methods were validated using 100-fold leave-one-out validation <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> to assess, for each method, the mean prediction sensitivity, specificity, and accuracy (see formulas below) as well as the ability to generalize to unseen sequences. In addition to this cross-validation, we report for RFs the out-of-bag (OOB) error, an upper limit of the classification error <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
               <p>For each test in the cross-validation, the sensitivity, specificity, and accuracy were calculated according to:</p>
               <p>
                  <display-formula>
                     <graphic file="1471-2105-11-37-i2.gif"/>
                  </display-formula>
               </p>
               <p>with true positives <it>TP</it>, false positives <it>FP</it>, false negatives <it>FN </it>and true negatives <it>TN</it>. Figure <figr fid="F1">1</figr> shows sensitivities and specificities as ROC curves (Receiver Operating Characteristics) <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> for the non-discrete methods in our study. Table <tblr tid="T1">1</tblr> gives the corresponding areas under the curve (AUC). ROC curves were drawn with R-package ROCR <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>.</p>
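               <p>For reference, the three measures can be computed as in the sketch below. It uses the standard definitions of sensitivity, specificity, and accuracy, which we assume correspond to the formulas shown above as an image; the function name and the example counts are hypothetical.</p>

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)   # fraction of positives that are found
    specificity = tn / (tn + fp)   # fraction of negatives that are found
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy


# Hypothetical counts for one test set.
print(confusion_metrics(tp=8, fp=1, fn=2, tn=9))  # (0.8, 0.9, 0.85)
```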
               <tbl id="T1"><title><p>Table 1</p></title><caption><p>Area under the curve. </p></caption><tblbdy cols="5">
      <r>
         <c ca="left">
            <p>
               <b>method</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>descriptor</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>mean AUC</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>sd</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>cv</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>RF</p>
         </c>
         <c ca="left">
            <p>hydrophobicity</p>
         </c>
         <c ca="center">
            <p>0.927</p>
         </c>
         <c ca="center">
            <p>0.001</p>
         </c>
         <c ca="center">
            <p>0.001</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>molecular weight</p>
         </c>
         <c ca="center">
            <p>0.923</p>
         </c>
         <c ca="center">
            <p>0.001</p>
         </c>
         <c ca="center">
            <p>0.001</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>IEP</p>
         </c>
         <c ca="center">
            <p>0.909</p>
         </c>
         <c ca="center">
            <p>0.001</p>
         </c>
         <c ca="center">
            <p>0.001</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>pKa</p>
         </c>
         <c ca="center">
            <p>0.914</p>
         </c>
         <c ca="center">
            <p>0.001</p>
         </c>
         <c ca="center">
            <p>0.001</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>cleavage site prediction</p>
         </c>
         <c ca="center">
            <p>0.851</p>
         </c>
         <c ca="center">
            <p>0.003</p>
         </c>
         <c ca="center">
            <p>0.003</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ANN</p>
         </c>
         <c ca="left">
            <p>hydrophobicity</p>
         </c>
         <c ca="center">
            <p>0.841</p>
         </c>
         <c ca="center">
            <p>0.028</p>
         </c>
         <c ca="center">
            <p>0.034</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>molecular weight</p>
         </c>
         <c ca="center">
            <p>0.839</p>
         </c>
         <c ca="center">
            <p>0.022</p>
         </c>
         <c ca="center">
            <p>0.026</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>IEP</p>
         </c>
         <c ca="center">
            <p>0.721</p>
         </c>
         <c ca="center">
            <p>0.036</p>
         </c>
         <c ca="center">
            <p>0.050</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>pKa</p>
         </c>
         <c ca="center">
            <p>0.733</p>
         </c>
         <c ca="center">
            <p>0.028</p>
         </c>
         <c ca="center">
            <p>0.038</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>cleavage site prediction</p>
         </c>
         <c ca="center">
            <p>0.762</p>
         </c>
         <c ca="center">
            <p>0.036</p>
         </c>
         <c ca="center">
            <p>0.047</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>linear model</p>
         </c>
         <c ca="left">
            <p>hydrophobicity</p>
         </c>
         <c ca="center">
            <p>0.826</p>
         </c>
         <c ca="center">
            <p>0.008</p>
         </c>
         <c ca="center">
            <p>0.009</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>molecular weight</p>
         </c>
         <c ca="center">
            <p>0.811</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>IEP</p>
         </c>
         <c ca="center">
            <p>0.784</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>pKa</p>
         </c>
         <c ca="center">
            <p>0.777</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>cleavage site prediction</p>
         </c>
         <c ca="center">
            <p>0.803</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>decision tree</p>
         </c>
         <c ca="left">
            <p>hydrophobicity</p>
         </c>
         <c ca="center">
            <p>0.815</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>molecular weight</p>
         </c>
         <c ca="center">
            <p>0.841</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>IEP</p>
         </c>
         <c ca="center">
            <p>0.771</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>pKa</p>
         </c>
         <c ca="center">
            <p>0.764</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>cleavage site prediction</p>
         </c>
         <c ca="center">
            <p>0.803</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>JRip</p>
         </c>
         <c ca="left">
            <p>hydrophobicity</p>
         </c>
         <c ca="center">
            <p>0.825</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>PART</p>
         </c>
         <c ca="left">
            <p>hydrophobicity</p>
         </c>
         <c ca="center">
            <p>0.890</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Rule372</p>
         </c>
         <c ca="left">
            <p>hydrophobicity</p>
         </c>
         <c ca="center">
            <p>0.710</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Results of the 100-fold leave-one-out validation. The <it>pro forma </it>AUC values for the discrete methods (decision trees and rule-based models) are given for comparison purposes only. sd: standard deviation; cv: coefficient of variation.</p>
   </tblfn></tbl>
               <fig id="F1"><title><p>Figure 1</p></title><caption><p>ROC curve</p></caption><text>
   <p><b>ROC curve</b>. Averaged ROC curves of the best performing descriptor for each (non-discrete) machine learning approach. The standard deviation bars mark the 95% confidence intervals. Blue: SVM; green: ANN; red: RF.</p>
</text><graphic file="1471-2105-11-37-1" hint_layout="single"/></fig>
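               <p>The area under a ROC curve can also be computed directly from classifier scores, as in the sketch below. It relies on the known equivalence between the AUC and the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. This is an illustration with a hypothetical function name and hypothetical scores; the curves in this study were drawn with ROCR.</p>

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve from the scores of the two classes.

    Counts, over all positive/negative pairs, how often the positive
    example scores higher; ties contribute one half.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical scores: five of the six pairs are ranked correctly.
print(auc([0.9, 0.8, 0.4], [0.5, 0.3]))
```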
            </sec>
         </sec>
         <sec>
            <st>
               <p>Structure and cleavage-site prediction</p>
            </st>
            <p>Secondary structures of all p2 sequences of 20 or more residues were predicted using JPred <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Based on statistical evidence, the secondary structure predictions also yielded a reliability index from 0 (unreliable) to 9 (highly reliable) for each residue being in a predicted secondary structure state.</p>
            <p>HIV protease cleavage sites for all p2 sequences were predicted with HIVcleave <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> based on earlier work by Chou <it>et al</it>. <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Statistical comparison</p>
            </st>
            <p>All models were compared by applying the Wilcoxon signed-rank test <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> to the AUC distributions from the 100-fold leave-one-out cross-validation runs <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. The null hypothesis was that there is no difference between the compared classifiers.</p>
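            <p>A minimal sketch of such a paired comparison is given below. It is not the code used in the study and makes several simplifying assumptions: zero differences are dropped, tied absolute differences receive average ranks, and the two-sided p-value comes from the normal approximation without tie or continuity correction. The function name and the AUC values are hypothetical.</p>

```python
import math


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (normal approximation).

    Returns the rank sums of positive and negative differences and an
    approximate two-sided p-value.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while n > i:
        # Find the run of tied absolute differences starting at i.
        j = i
        while n > j + 1 and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if 0 > d)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_value


# Hypothetical AUC values of two classifiers on the same four runs.
print(wilcoxon_signed_rank([0.93, 0.92, 0.94, 0.95], [0.84, 0.85, 0.83, 0.80]))
```

            <p>For small numbers of runs an exact test is preferable to the normal approximation used in this sketch.</p>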
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <sec>
            <st>
               <p>Prediction performance of machine learning methods</p>
            </st>
            <p>All machine learning methods were trained in various configurations and with several descriptors as described in Methods. The measures of prediction quality, namely the mean AUC (<inline-formula><graphic file="1471-2105-11-37-i3.gif"/></inline-formula>), the standard deviation (sd) and the coefficient of variation (<it>cv </it>= <inline-formula><graphic file="1471-2105-11-37-i4.gif"/></inline-formula>), are shown in Table <tblr tid="T1">1</tblr>.</p>
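            <p>These summary statistics can be computed as in the following sketch. It assumes that the coefficient of variation is the standard deviation divided by the mean, which is the usual definition, and that the sample standard deviation is used; the article does not state the latter. The function name and the AUC values are hypothetical.</p>

```python
import statistics


def summarize(aucs):
    """Return (mean, sd, cv) for the AUC values of repeated runs."""
    mean = statistics.mean(aucs)
    sd = statistics.stdev(aucs)   # sample standard deviation (assumption)
    return mean, sd, sd / mean


# Hypothetical AUC values from four validation runs.
print(summarize([0.84, 0.86, 0.82, 0.85]))
```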
            <p>The ANNs yielded AUCs between 0.72 &#177; 0.036 (descriptor IEP) and 0.84 &#177; 0.028 (descriptor hydrophobicity). According to the Wilcoxon Signed-Rank test <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> with significance level <it>&#945; </it>= 0.001 the mean AUC for descriptor molecular weight was not significantly different from that obtained with descriptor hydrophobicity, while all other descriptors gave significantly lower values of mean AUC. There were no significant differences (<it>&#945; </it>= 0.001) between the mean AUCs of each descriptor with regard to the number of hidden neurons.</p>
            <p>RFs performed consistently better than ANNs for all descriptors, reaching AUC values between 0.85 &#177; 0.003 (cleavage site prediction) and 0.93 &#177; 0.001 (hydrophobicity). Again, the best results, with only small differences, were obtained with hydrophobicity and molecular weight as descriptors. The OOB error with hydrophobicity as descriptor was 7.59%. For comparison, the best single decision tree, which was created with rpart in R <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, reached a <it>pro forma </it>AUC of 0.841 (see Table <tblr tid="T1">1</tblr>).</p>
            <p>The RFs find the most important sequence positions for resistance prediction in the second half of the p2 sequence, especially at sequence positions 369-376 (Figure <figr fid="F2">2</figr>) in the clustalw alignment; in the wild type sequence this region corresponds to the motif QVTNSATI. The two positions 370 (V in wild type) and 372 (N in wild type) have by far the highest importance in the investigated data set. This finding is in partial agreement with the findings of other workers who identified the QVT motif at positions 369-371 as important <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Positions 363 and 364 are not as prominent in terms of importance, although they were previously identified as crucial <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> for resistance to BVM. The apparently lower importance of these positions in the current study can be explained by the nature of our data set, which focuses on resistance mediated by baseline mutations within the p2 region in clinical HIV isolates.</p>
            <fig id="F2"><title><p>Figure 2</p></title><caption><p>Importance of sequence positions in RF predictions</p></caption><text>
   <p><b>Importance of sequence positions in RF predictions</b>. Importance of sequence positions in p2 for prediction of BVM resistance by RFs. The y-axis denotes the "percental increase in misclassification rate" <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The upper horizontal axis indicates wild type sequence.</p>
</text><graphic file="1471-2105-11-37-2" hint_layout="single"/></fig>
            <p>We also trained RFs on the actual sequences, i.e. without numerical descriptors. These RFs gave higher OOB errors than those trained with hydrophobicities, namely 13.55% for the t-coffee alignment and 14.84% for the clustalw alignment. For comparison, other machine learning methods were tested as well, including Hidden Markov Models (HMMs) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> and linear models. We tested linear support vector machines (SVMs) and logistic regression as implemented in R <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, and furthermore, simple perceptrons implemented in Java <url>http://www.heatonresearch.com/encog</url>. All of these models performed worse than the RFs. The best linear model (AUC 0.826 &#177; 0.008) was a linear SVM using hydrophobicity as a descriptor. In Table <tblr tid="T1">1</tblr> we report the results of the best linear model for each descriptor. The HMMs were not able to classify the sequences. Figure <figr fid="F1">1</figr> shows the ROC curves for the best-performing descriptor of each (non-discrete) machine learning method.</p>
         </sec>
         <sec>
            <st>
               <p>Genotype-phenotype rules</p>
            </st>
            <p>The two algorithms JRip and PART for rule extraction provided rule sets that performed well in the cross-validation with accuracies reaching almost that of RFs. Since the rules derived from the t-coffee alignment had lower errors than those based on the clustalw alignment, we here report only the former rules.</p>
            <p>During cross-validation, JRip most frequently generated a set of three rules relating alignment positions, hydrophobicities, and BVM resistance class. Translated to amino acid residues, the rules are:</p>
            <p indent="1">1. IF position 370 &#8712; {<it>I</it>, <it>V</it>} AND NOT position 373 &#8712; {<it>K</it>, <it>R</it>} AND position 374 &#8712; {<it>I</it>, <it>V</it>, <it>L</it>, <it>F</it>, <it>C</it>, <it>M</it>, <it>A</it>} &#8658; susceptible</p>
            <p indent="1">2. IF position 372 &#8712; {<it>K</it>, <it>R</it>} AND position 373 &#8712; {<it>P</it>, <it>H</it>, <it>E</it>, <it>N</it>, <it>Q</it>, <it>D</it>} &#8658; susceptible</p>
            <p indent="1">3. ELSE resistant</p>
            <p>In the cross-validation, JRip reaches a mean sensitivity of 77.01% at a specificity of 88.14%. Dropping the first rule leads to a sensitivity of 11.76% and a specificity of 99.21%. Dropping the second rule leads to a sensitivity of 72.54% with a corresponding specificity of 88.1%.</p>
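            <p>The three JRip rules listed above can be written as a short decision function. The following Python code is only a minimal sketch: the function name, the dictionary input format and the two switches for dropping a rule are illustrative assumptions and are not part of the original analysis, which was carried out with JRip itself.</p>

```python
# Residue sets copied from the three JRip rules stated in the text.
RULE1_POS370 = frozenset("IV")
RULE1_POS373_EXCLUDED = frozenset("KR")
RULE1_POS374 = frozenset("IVLFCMA")
RULE2_POS372 = frozenset("KR")
RULE2_POS373 = frozenset("PHENQD")


def classify_jrip(residues, use_rule1=True, use_rule2=True):
    """Apply the JRip rule list to one aligned p2 sequence.

    residues: dict mapping alignment position (int) to a one-letter
    amino acid code. This input format is a hypothetical choice made
    for illustration. A missing position matches none of the sets.
    """
    r370 = residues.get(370)
    r372 = residues.get(372)
    r373 = residues.get(373)
    r374 = residues.get(374)

    # Rule 1: 370 in {I, V} AND NOT 373 in {K, R} AND 374 in {I, V, L, F, C, M, A}
    if (use_rule1
            and r370 in RULE1_POS370
            and r373 not in RULE1_POS373_EXCLUDED
            and r374 in RULE1_POS374):
        return "susceptible"

    # Rule 2: 372 in {K, R} AND 373 in {P, H, E, N, Q, D}
    if use_rule2 and r372 in RULE2_POS372 and r373 in RULE2_POS373:
        return "susceptible"

    # Rule 3: default class
    return "resistant"
```

            <p>With this sketch, an input carrying V at 370, S at 373 and A at 374 is classified as susceptible by the first rule, whereas an input that satisfies neither rule falls through to the default class "resistant". Setting use_rule1 or use_rule2 to False mimics dropping the respective rule.</p>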
            <p>In the cross-validation PART most frequently extracted fifteen rules (see additional file <supplr sid="S4">4</supplr>) with a sensitivity of 85.5% and a specificity of 93.27%. Remarkably, the PART rules took into account exactly those sequence positions that had non-zero importance in the RF analysis (see Figure <figr fid="F2">2</figr>). As suggested by the JRip and PART rules, resistance is generally caused by patterns of two or more residues. However, the importance plot (Figure <figr fid="F2">2</figr>) shows that single positions may be useful indicators as well. For example, we found that at sequence position 372 the hydrophobicity values of the resistant and susceptible groups clustered around two different values, 0.39 for the resistant and 0.26 for the susceptible group. From this we could derive the rule (Rule372): a sequence is classified as resistant if the hydrophobicity at 372 is closer to the mean hydrophobicity of the resistant cluster than to that of the susceptible cluster, and as susceptible in the opposite case. The rule is predictive with 52% sensitivity and 90% specificity.</p>
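            <p>Rule372 is a nearest-mean classifier on a single feature. The Python sketch below illustrates it with the two cluster means quoted above (0.39 and 0.26). The function name and the handling of an exact tie (assigned to "susceptible" here) are assumptions made for illustration, and the input is assumed to be the hydrophobicity value at position 372 on the same scale as the quoted means.</p>

```python
# Cluster means of the hydrophobicity at position 372, as quoted in the text.
MEAN_RESISTANT = 0.39
MEAN_SUSCEPTIBLE = 0.26


def rule372(hydrophobicity_372):
    """Nearest-mean classification on the hydrophobicity at position 372.

    The value must be on the same scale as the two cluster means.
    An exact tie is assigned to "susceptible" (an assumption; the
    text does not define this case).
    """
    dist_resistant = abs(hydrophobicity_372 - MEAN_RESISTANT)
    dist_susceptible = abs(hydrophobicity_372 - MEAN_SUSCEPTIBLE)
    if dist_susceptible > dist_resistant:
        return "resistant"
    return "susceptible"
```

            <p>For instance, a value of 0.35 lies closer to 0.39 than to 0.26 and is therefore labelled resistant, while a value of 0.30 is labelled susceptible.</p>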
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p><b>Plots and rules</b>. Variance plots and prediction rules.</p>
               </text>
               <file name="1471-2105-11-37-S4.DOC">
   <p>Click here for file</p>
</file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Structural and functional implications of resistance mutations</p>
            </st>
            <p>Although experiments <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> have excluded the classical molecular mechanism of protease inhibition, i.e. blocking of the catalytic site, several molecular mechanisms for BVM action are still considered in the literature (for a review see <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>): BVM could directly occlude the protease cleavage site ("direct" mechanisms, possibly with contact of BVM and protease), or it could stabilize a Gag structure that has to be weakened or dissolved to make the cleavage sites accessible to the protease ("indirect" mechanisms, possibly <it>without </it>contact of BVM and protease). Accordingly, several possible resistance mechanisms are discussed in the literature, such as mutations that perturb the BVM binding site, that weaken the mentioned Gag structure, or that make the affected cleavage site more easily digestible by the protease. A hypothetical resistance mechanism that, to our knowledge, has not been addressed so far is a shift of the cleavage site. We have therefore investigated associations of resistance mutations with the locations of cleavage sites, as predicted computationally. In all susceptible and most resistant sequences the predicted PR cleavage sites with maximum probability were unchanged with respect to the wild type (see additional file <supplr sid="S5">5</supplr>): cleavage was predicted to be most probable at P<sub>1</sub>-sites 363 and 367 in agreement with experimental findings <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>, and cleavage probabilities at P<sub>1</sub> 363 were rather invariable across the data set. In a few resistant sequences cleavage site probabilities were indeed predicted to shift (see additional file <supplr sid="S6">6</supplr>). Amongst these sequences we observed a tendency for the second cleavage site at P<sub>1</sub> 367 to have lower probabilities, whereas position 365 emerged as a new possible P<sub>1</sub> site.
However, since this occurs rather rarely, the data do not support a shift of the cleavage site as a major resistance mechanism. It is notable that in the studied data the positions 372-376 most relevant for resistance (Figure <figr fid="F2">2</figr>) lie outside the protease binding region P<sub>4</sub>-P<sub>4</sub>' for P<sub>1</sub> at 363 (P<sub>4</sub>' 367). Even for the internal cleavage site at P<sub>1</sub> 367 (P<sub>4</sub>' 371), more than half of these important positions are outside the protease binding site. This finding is consistent with a model that allows for an "indirect" mechanism of BVM, though it cannot exclude "direct" mechanisms. In fact, mutations found in other studies closer to the cleavage sites <abbrgrp><abbr bid="B47">47</abbr><abbr bid="B49">49</abbr></abbrgrp> also allow for a direct model.</p>
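            <p>The position arithmetic behind the preceding argument can be checked with a few lines of Python. This is a sketch under the convention implied by the text, namely that the binding region runs from P4 (three positions before P1) to P4' (four positions after P1). The function names are illustrative assumptions, and the set of important positions is taken to be the region 369-376 highlighted in the RF importance analysis.</p>

```python
def binding_window(p1):
    """Sequence positions P4..P4' for a cleavage site with P1 at p1."""
    # P4 = p1 - 3 and P4' = p1 + 4, so the window holds eight positions.
    return range(p1 - 3, p1 + 5)


def outside_window(positions, p1):
    """Return those positions that are not covered by the P4..P4' window."""
    window = binding_window(p1)
    return [p for p in positions if p not in window]


# Region 369-376 from the RF importance analysis.
important = range(369, 377)

for p1 in (363, 367):
    window = binding_window(p1)
    outside = outside_window(important, p1)
    print("P1", p1, "window", window[0], "to", window[-1], "outside:", outside)
```

            <p>For P1 at 363 the window spans 360 to 367, so all eight positions 369-376 lie outside it. For P1 at 367 the window spans 364 to 371, which leaves the five positions 372-376 outside, i.e. more than half of the region.</p>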
            <suppl id="S5">
               <title>
                  <p>Additional file 5</p>
               </title>
               <text>
                  <p><b>Cleavage site predictions</b>. Predictions are made with HIVcleave <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
               </text>
               <file name="1471-2105-11-37-S5.XLS">
   <p>Click here for file</p>
</file>
            </suppl>
            <suppl id="S6">
               <title>
                  <p>Additional file 6</p>
               </title>
               <text>
                  <p><b>Shifted cleavage site probabilities</b>. Probable HIV-protease cleavage sites are shown in bold <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. The value represents the probability of protease cleavage.</p>
               </text>
               <file name="1471-2105-11-37-S6.XLS">
   <p>Click here for file</p>
</file>
            </suppl>
            <p>A key component of an indirect mechanism is a structure within Gag that has to be weakened prior to cleavage of p24/p2. A candidate structure is the <it>&#945;</it>-helix first predicted by Accola <it>et al</it>. <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. We have extended secondary structure predictions to all sequences of the data set, including the wild type. All these structures were predicted as mainly <it>&#945;</it>-helical in the central part (additional file <supplr sid="S7">7</supplr>). This gross feature is consistent with the experimental structure by Morellet <it>et al</it>. <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, though the predicted helices are shorter. While in the Morellet structure the helix comprises all of the residues starting at position 358, the predicted helices comprise between seven and twelve of the 21 sequence positions and typically start at position 361 (Figure <figr fid="F3">3A</figr>). Apart from the deficiencies of the prediction method, the difference between experiment and prediction may be due in part to the experimental conditions <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, where a substantial amount of trifluoroethanol in the solution could have led to a helix content exceeding that in the native state. The earlier work by Worthylake <it>et al</it>. <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> supports the view that the helix formed by p2 as such is not very stable. A very stable helix at the cleavage site could possibly prevent PR from cleaving, because the protease requires its substrate in an extended conformation <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. On the other hand, recent data from electron microscopy <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> are compatible with bundles of six p2 helices stabilizing the immature matrix of the virus. In summary, predictions and experiments point to a weak p2 helix that is stabilized by its environment.</p>
            <suppl id="S7">
               <title>
                  <p>Additional file 7</p>
               </title>
               <text>
                  <p><b>Secondary structure predictions</b>. Predictions are made with JPred <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>.</p>
               </text>
               <file name="1471-2105-11-37-S7.XLS">
   <p>Click here for file</p>
</file>
            </suppl>
            <fig id="F3"><title><p>Figure 3</p></title><caption><p>Helix length and confidence</p></caption><text>
   <p><b>Helix length and confidence</b>. Secondary structure predictions for p2 with susceptibility and resistance to BVM. A: For all p2 sequences the secondary structure was predicted by JPred <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> and then for each sequence position the helix probability (fraction of helix at this position) was computed separately for the susceptible and resistant sequences. B: Histograms of the confidence with which JPred predicts a helix (0 lowest, 9 highest). The confidence values were averaged for each sequence over all positions predicted to be helical.</p>
</text><graphic file="1471-2105-11-37-3" hint_layout="single"/></fig>
            <p>It is remarkable that the end of the predicted helices around position 369 coincides with the start of the sequence region most important for resistance (Figure <figr fid="F2">2</figr>) in our data set; in other words, the sequence positions most important for resistance in our data lie outside the predicted <it>&#945;</it>-helix in a region of unspecified secondary structure. Moreover, the resistant sequences tend to have shorter helices than the susceptible sequences, as can be seen in the earlier drop of helix probability at around position 368 in Figure <figr fid="F3">3A</figr>.</p>
            <p>We have also analyzed the confidence with which the secondary structure prediction algorithm assigns residues to a helical state. If we assume that the prediction is based on a representative sample of sequences observed as helices and non-helices, respectively, then this confidence could have a positive correlation with helix stability. A comparison of resistant and susceptible sequences with respect to mean confidence along the helix shows that resistant sequences have a tendency to lower helix confidence, and, if the assumption holds, lower helix stability (Figure <figr fid="F3">3B</figr>).</p>
            <p>The above tendencies of resistance class, predicted helix length and confidence may reflect possible "indirect" resistance mechanisms: shorter and weaker helices could limit the effect of BVM in several ways, e.g. by destabilizing the binding site of BVM that may lie on the six-helix bundle mentioned above, or by easing the unraveling of the remaining helix, and thus cleavage by PR in the presence of BVM. This argument suggests testing whether helix length and helix confidence are predictive of resistance. We have therefore trained another random forest solely with the predicted helix lengths and confidences, and without reference to the detailed sequences. This random forest had an OOB error of 23%, which is not as good as the errors of 8-15% reported above for random forests or rule-based methods trained on sequence information, but still much better than random guessing. This indicates that tuning of the helicality of p2 could indeed be a BVM resistance mechanism.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>BVM was the first drug of the new class of maturation-inhibitors of HIV-1 that has reached phase II clinical trials. Several polymorphisms in p2 of HIV-1 hampered the sustained suppression of viral replication in these patients and conferred phenotypic resistance <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Since these polymorphisms were found in about 30% of treatment na&#239;ve HIV isolates and were significantly accumulated in PI resistant HIV isolates <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>, genotypic resistance testing seems to be mandatory before administration of BVM.</p>
         <p>Our analysis has shown that with the available sequences and corresponding phenotypic data it is possible to train machine learning methods that predict phenotypic resistance to BVM, mediated by baseline mutations of the p2 region, for unseen sequences with an error of less than 10%. This result is compatible with the view that mutations in p2 are the main reason for BVM resistance observed in clinical isolates not responding to BVM in clinical phase I and II studies. The high classification accuracy is encouraging for personalized therapy based on genotypic testing, should BVM-like drugs become part of the antiretroviral repertoire. With a larger, representative data set of genotype-phenotype pairs, it could become feasible to use machine learning methods not only for classification but also for regression, i.e. prediction of quantitative resistance factors.</p>
         <p>Random forests, the method with the best classification results amongst those tested, also allowed for the identification of the sequence positions most relevant for resistance. In our data set, these sequence positions cluster in the C-terminal half of p2, mostly outside the P<sub>4</sub>-P<sub>4</sub>' protease binding region. This is in agreement with the outcome of rule-based methods.</p>
         <p>As judged from predicted cleavage positions, resistance mutations usually do not shift the cleavage site. Secondary structure prediction shows that resistance mutations may affect the length and strength of the <it>&#945;</it>-helix formed by at least sequence positions 371-377 and covering also the cleavage site. This hypothesis is in agreement with propositions by other workers <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and suggests possible resistance mechanisms that may also occur in combination, e.g. (a) resistance mutations could destroy the BVM binding site that may lie in the C-terminal half of p2, formed by several p2 peptides in the six-helix bundle suggested by Wright <it>et al</it>. <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>; (b) resistance mutations could weaken the <it>&#945;</it>-helix in p2, and thus, the six-helix bundle in the immature virus. This could ease unraveling of the helix prior to cleavage by PR, and hence, may functionally outweigh a stabilizing effect of BVM on the helix bundle.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>All authors have jointly developed the research concept and collaborated on the writing of the manuscript. DH* has carried out computational analyses and drafted the manuscript. JV has initiated the study and revised the manuscript. DH has interpreted results and revised the manuscript.</p>
         <p>All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors thank J. Nikolaj Dybowski for assistance and helpful discussions. We also thank Dr. Nelly Morellet (Universit&#233; Paris-Descartes, Paris, France) for the p24/p2 structural model. This work was supported by the Deutsche Forschungsgemeinschaft (SFB/Transregio 60).</p>
         </sec>
      </ack>
      <refgrp><bibl id="B1"><title><p>Maturation inhibitors: a new therapeutic class targets the virus structure</p></title><aug><au><snm>Salzwedel</snm><fnm>K</fnm></au><au><snm>Martin</snm><fnm>D</fnm></au><au><snm>Sakalian</snm><fnm>M</fnm></au></aug><source>AIDS Rev</source><pubdate>2007</pubdate><volume>9</volume><fpage>162</fpage><lpage>172</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">17982941</pubid></xrefbib></bibl><bibl id="B2"><title><p>In vitro resistance to the human immunodeficiency virus type 1 maturation inhibitor PA-457 (Bevirimat)</p></title><aug><au><snm>Adamson</snm><fnm>CS</fnm></au><au><snm>Ablan</snm><fnm>SD</fnm></au><au><snm>Boeras</snm><fnm>I</fnm></au><au><snm>Goila-Gaur</snm><fnm>R</fnm></au><au><snm>Soheilian</snm><fnm>F</fnm></au><au><snm>Nagashima</snm><fnm>K</fnm></au><au><snm>Li</snm><fnm>F</fnm></au><au><snm>Salzwedel</snm><fnm>K</fnm></au><au><snm>Sakalian</snm><fnm>M</fnm></au><au><snm>Wild</snm><fnm>CT</fnm></au><au><snm>Freed</snm><fnm>EO</fnm></au></aug><source>J Virol</source><pubdate>2006</pubdate><volume>80</volume><issue>22</issue><fpage>10957</fpage><lpage>10971</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/JVI.01369-06</pubid><pubid idtype="pmcid">1642185</pubid><pubid idtype="pmpid" link="fulltext">16956950</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Determinants of activity of the HIV-1 maturation inhibitor 
PA-457</p></title><aug><au><snm>Li</snm><fnm>F</fnm></au><au><snm>Zoumplis</snm><fnm>D</fnm></au><au><snm>Matallana</snm><fnm>C</fnm></au><au><snm>Kilgore</snm><fnm>N</fnm></au><au><snm>Reddick</snm><fnm>M</fnm></au><au><snm>Yunus</snm><fnm>A</fnm></au><au><snm>Adamson</snm><fnm>C</fnm></au><au><snm>Salzwedel</snm><fnm>K</fnm></au><au><snm>Martin</snm><fnm>D</fnm></au><au><snm>Allaway</snm><fnm>G</fnm></au><au><snm>Freed</snm><fnm>E</fnm></au><au><snm>Wild</snm><fnm>C</fnm></au></aug><source>Virology</source><pubdate>2006</pubdate><volume>356</volume><fpage>217</fpage><lpage>24</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.virol.2006.07.023</pubid><pubid idtype="pmpid" link="fulltext">16930665</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Impact of human immunodeficiency virus type 1 resistance to protease inhibitors on evolution of resistance to the maturation inhibitor bevirimat (PA-457)</p></title><aug><au><snm>Adamson</snm><fnm>CS</fnm></au><au><snm>Waki</snm><fnm>K</fnm></au><au><snm>Ablan</snm><fnm>SD</fnm></au><au><snm>Salzwedel</snm><fnm>K</fnm></au><au><snm>Freed</snm><fnm>EO</fnm></au></aug><source>J Virol</source><pubdate>2009</pubdate><volume>83</volume><issue>10</issue><fpage>4884</fpage><lpage>4894</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/JVI.02659-08</pubid><pubid idtype="pmcid">2682084</pubid><pubid idtype="pmpid" link="fulltext">19279107</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Phenotypic susceptibility to Bevirimat among HIV-infected patient isolates without prior exposure to Bevirimat</p></title><aug><au><snm>Margot</snm><fnm>N</fnm></au><au><snm>Gibbs</snm><fnm>C</fnm></au><au><snm>Miller</snm><fnm>M</fnm></au></aug><source>Proceedings of the 16th Conference on Retroviruses and Opportunistic Infections, Montreal, Canada</source><pubdate>2009</pubdate></bibl><bibl id="B6"><title><p>Susceptibility of diverse HIV-1 patient isolates to the maturation inhibitor, Bevirimat (MPC-4326), is 
determined by clade-specific polymorphisms in Gag CA-SP1</p></title><aug><au><snm>Salzwedel</snm><fnm>K</fnm></au><au><snm>Harmy</snm><fnm>F</fnm></au><au><snm>Louvel</snm><fnm>S</fnm></au><au><snm>Sakalian</snm><fnm>M</fnm></au><au><snm>Reddick</snm><fnm>M</fnm></au><au><snm>Finnegan</snm><fnm>C</fnm></au><au><snm>Martin</snm><fnm>D</fnm></au><au><snm>McCallister</snm><fnm>S</fnm></au><au><snm>Klimkait</snm><fnm>T</fnm></au><au><snm>Allaway</snm><fnm>G</fnm></au></aug><source>Proceedings of the 16th Conference on Retroviruses and Opportunistic Infections, Montreal, Canada</source><pubdate>2009</pubdate></bibl><bibl id="B7"><title><p>Susceptibility of human immunodeficiency virus type 1 to the maturation inhibitor bevirimat is modulated by baseline polymorphisms in Gag spacer peptide 1</p></title><aug><au><snm>Baelen</snm><fnm>KV</fnm></au><au><snm>Salzwedel</snm><fnm>K</fnm></au><au><snm>Rondelez</snm><fnm>E</fnm></au><au><snm>Eygen</snm><fnm>VV</fnm></au><au><snm>Vos</snm><fnm>SD</fnm></au><au><snm>Verheyen</snm><fnm>A</fnm></au><au><snm>Steegen</snm><fnm>K</fnm></au><au><snm>Verlinden</snm><fnm>Y</fnm></au><au><snm>Allaway</snm><fnm>GP</fnm></au><au><snm>Stuyver</snm><fnm>LJ</fnm></au></aug><source>Antimicrob Agents Chemother</source><pubdate>2009</pubdate><volume>53</volume><fpage>2185</fpage><lpage>2188</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/AAC.01650-08</pubid><pubid idtype="pmcid">2681549</pubid><pubid idtype="pmpid" link="fulltext">19223634</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>HIV-1 Gag polymorphisms determine treatment response to bevirimat (PA-457)</p></title><aug><au><snm>McCallister</snm><fnm>S</fnm></au><au><snm>Lalezari</snm><fnm>J</fnm></au><au><snm>Richmond</snm><fnm>G</fnm></au><au><snm>Thompson</snm><fnm>M</fnm></au><au><snm>Harrigan</snm><fnm>R</fnm></au><au><snm>Martin</snm><fnm>D</fnm></au><au><snm>Salzwedel</snm><fnm>K</fnm></au><au><snm>Allaway</snm><fnm>G</fnm></au></aug><source>Antivir 
Ther</source><pubdate>2008</pubdate><volume>13</volume><issue>Suppl 3</issue><fpage>A10</fpage></bibl><bibl id="B9"><title><p>Knowledge-based avoidance of drug-resistant HIV mutants</p></title><aug><au><snm>Lathrop</snm><fnm>R</fnm></au><au><snm>Steffen</snm><fnm>N</fnm></au><au><snm>Raphael</snm><fnm>M</fnm></au><au><snm>Deeds-Rubin</snm><fnm>S</fnm></au><au><snm>Pazzani</snm><fnm>M</fnm></au><au><snm>Cimoch</snm><fnm>P</fnm></au><au><snm>See</snm><fnm>D</fnm></au><au><snm>Tilles</snm><fnm>J</fnm></au></aug><source>AI MAGAZINE</source><pubdate>1999</pubdate><volume>20</volume><issue>1</issue><fpage>13</fpage><lpage>25</lpage></bibl><bibl id="B10"><title><p>Methods for Investigation of the Relationship between Drug-Susceptibility Phenotype and Human Immunodeficiency Virus Type 1 Genotype with Applications to AIDS Clinical Trials Group 333</p></title><aug><au><snm>Sevin</snm><fnm>AD</fnm></au><au><snm>DeGruttola</snm><fnm>V</fnm></au><au><snm>Nijhuis</snm><fnm>M</fnm></au><au><snm>Schapiro</snm><fnm>JM</fnm></au><au><snm>Foulkes</snm><fnm>AS</fnm></au><au><snm>Para</snm><fnm>MF</fnm></au><au><snm>Boucher</snm><fnm>CAB</fnm></au></aug><source>J Infect Dis</source><pubdate>2000</pubdate><volume>182</volume><fpage>59</fpage><lpage>67</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1086/315673</pubid><pubid idtype="pmpid" link="fulltext">10882582</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Diversity and complexity of HIV-1 drug resistance: a bioinformatics approach to predicting phenotype from genotype</p></title><aug><au><snm>Beerenwinkel</snm><fnm>N</fnm></au><au><snm>Schmidt</snm><fnm>B</fnm></au><au><snm>Walter</snm><fnm>H</fnm></au><au><snm>Kaiser</snm><fnm>R</fnm></au><au><snm>Lengauer</snm><fnm>T</fnm></au><au><snm>Hoffmann</snm><fnm>D</fnm></au><au><snm>Korn</snm><fnm>K</fnm></au><au><snm>Selbig</snm><fnm>J</fnm></au></aug><source>Proc Natl Acad Sci 
USA</source><pubdate>2002</pubdate><volume>99</volume><issue>12</issue><fpage>8271</fpage><lpage>8276</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.112177799</pubid><pubid idtype="pmcid">123057</pubid><pubid idtype="pmpid" link="fulltext">12060770</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Geno2pheno: Interpreting Genotypic HIV Drug Resistance Tests</p></title><aug><au><snm>Beerenwinkel</snm><fnm>N</fnm></au><au><snm>Schmidt</snm><fnm>B</fnm></au><au><snm>Walter</snm><fnm>H</fnm></au><au><snm>Kaiser</snm><fnm>R</fnm></au><au><snm>Lengauer</snm><fnm>T</fnm></au><au><snm>Hoffmann</snm><fnm>D</fnm></au><au><snm>Korn</snm><fnm>K</fnm></au><au><snm>Selbig</snm><fnm>J</fnm></au></aug><source>IEEE Intelligent Systems</source><pubdate>2001</pubdate><volume>16</volume><fpage>35</fpage><lpage>41</lpage><xrefbib><pubid idtype="doi">10.1109/5254.972080</pubid></xrefbib></bibl><bibl id="B13"><title><p>Genetic basis of variation in tenofovir drug susceptibility in HIV-1</p></title><aug><au><snm>Murray</snm><fnm>RJ</fnm></au><au><snm>Lewis</snm><fnm>FI</fnm></au><au><snm>Miller</snm><fnm>MD</fnm></au><au><snm>Brown</snm><fnm>AJ</fnm></au></aug><source>AIDS</source><pubdate>2008</pubdate><volume>22</volume><issue>10</issue><fpage>1113</fpage><lpage>23</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1097/QAD.0b013e32830184a1</pubid><pubid idtype="pmpid" link="fulltext">18525256</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Improved success of phenotype prediction of the human immunodeficiency virus type 1 from envelope variable loop 3 sequence using neural networks</p></title><aug><au><snm>Resch</snm><fnm>W</fnm></au><au><snm>Hoffman</snm><fnm>N</fnm></au><au><snm>Swanstrom</snm><fnm>R</fnm></au></aug><source>Virology</source><pubdate>2001</pubdate><volume>288</volume><fpage>51</fpage><lpage>62</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1006/viro.2001.1087</pubid><pubid idtype="pmpid" 
link="fulltext">11543657</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Predicting HIV drug resistance with neural networks</p></title><aug><au><snm>Draghici</snm><fnm>S</fnm></au><au><snm>Potter</snm><fnm>RB</fnm></au></aug><source>Bioinformatics</source><pubdate>2003</pubdate><volume>19</volume><fpage>98</fpage><lpage>107</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/19.1.98</pubid><pubid idtype="pmpid" link="fulltext">12499299</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Enhanced prediction of lopinavir resistance from genotype by use of artificial neural networks</p></title><aug><au><snm>Wang</snm><fnm>D</fnm></au><au><snm>Larder</snm><fnm>B</fnm></au></aug><source>J Infect Dis</source><pubdate>2003</pubdate><volume>188</volume><issue>5</issue><fpage>653</fpage><lpage>660</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1086/377453</pubid><pubid idtype="pmpid" link="fulltext">12934180</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Comparison of classification algorithms on large real-world problems</p></title><aug><au><snm>King</snm><fnm>R</fnm></au><au><snm>Feng</snm><fnm>C</fnm></au><au><snm>Sutherland</snm><fnm>A</fnm></au></aug><source>Applied Artificial Intelligence</source><pubdate>1995</pubdate><volume>9</volume><issue>3</issue><fpage>259</fpage><lpage>287</lpage><xrefbib><pubid idtype="doi">10.1080/08839519508945477</pubid></xrefbib></bibl><bibl id="B18"><title><p>On the overtraining phenomenon of backpropagation neural networks</p></title><aug><au><snm>Tzafestas</snm><fnm>S</fnm></au><au><snm>Dalianis</snm><fnm>PJ</fnm></au><au><snm>Anthopoulos</snm><fnm>G</fnm></au></aug><source>Mathematics and computers in simulation</source><pubdate>1996</pubdate><volume>40</volume><fpage>505</fpage><lpage>663</lpage><xrefbib><pubid idtype="doi">10.1016/0378-4754(96)90015-4</pubid></xrefbib></bibl><bibl id="B19"><title><p>A comparison of decision tree ensemble creation 
techniques</p></title><aug><au><snm>Banfield</snm><fnm>RE</fnm></au><au><snm>Hall</snm><fnm>LO</fnm></au><au><snm>Bowyer</snm><fnm>KW</fnm></au><au><snm>Kegelmeyer</snm><fnm>WP</fnm></au></aug><source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source><pubdate>2007</pubdate><volume>29</volume><issue>1</issue><fpage>173</fpage><lpage>180</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1109/TPAMI.2007.250609</pubid><pubid idtype="pmpid" link="fulltext">17108393</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>Random Forests</p></title><aug><au><snm>Breiman</snm><fnm>L</fnm></au></aug><source>Machine Learning</source><pubdate>2001</pubdate><volume>45</volume><fpage>5</fpage><lpage>32</lpage><xrefbib><pubid idtype="doi">10.1023/A:1010933404324</pubid></xrefbib></bibl><bibl id="B21"><title><p>Rule-based expert systems and beyond: an overview</p></title><aug><au><snm>Kingston</snm><fnm>J</fnm></au></aug><source>British Association of Accountants&apos; Conference</source><pubdate>1987</pubdate></bibl><bibl id="B22"><aug><au><snm>Witten</snm><fnm>IH</fnm></au><au><snm>Frank</snm><fnm>E</fnm></au></aug><source>Data Mining. 
Morgan Kauffmann</source><pubdate>2000</pubdate></bibl><bibl id="B23"><title><p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</p></title><aug><au><snm>Thompson</snm><fnm>J</fnm></au><au><snm>Higgins</snm><fnm>D</fnm></au><au><snm>Gibson</snm><fnm>T</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1994</pubdate><volume>22</volume><fpage>4673</fpage><lpage>4680</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/22.22.4673</pubid><pubid idtype="pmcid">308517</pubid><pubid idtype="pmpid" link="fulltext">7984417</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>T-Coffee: A novel method for fast and accurate multiple sequence alignment</p></title><aug><au><snm>Notredame</snm><fnm>C</fnm></au><au><snm>Higgins</snm><fnm>DG</fnm></au><au><snm>Heringa</snm><fnm>J</fnm></au></aug><source>J Mol Biol</source><pubdate>2000</pubdate><volume>302</volume><fpage>205</fpage><lpage>217</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1006/jmbi.2000.4042</pubid><pubid idtype="pmpid" link="fulltext">10964570</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>MUSCLE: a multiple sequence alignment method with reduced time and space complexity</p></title><aug><au><snm>Edgar</snm><fnm>RC</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2004</pubdate><volume>5</volume><fpage>113</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-5-113</pubid><pubid idtype="pmcid">517706</pubid><pubid idtype="pmpid" link="fulltext">15318951</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>An algorithm for progressive multiple alignment of sequences with insertions</p></title><aug><au><snm>L&#246;ytynoja</snm><fnm>A</fnm></au><au><snm>Goldman</snm><fnm>N</fnm></au></aug><source>Proc Natl Acad Sci 
USA</source><pubdate>2005</pubdate><volume>102</volume><issue>30</issue><fpage>10557</fpage><lpage>10562</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0409137102</pubid><pubid idtype="pmcid">1180752</pubid><pubid idtype="pmpid" link="fulltext">16000407</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Efficacy of different protein descriptors in predicting protein functional families</p></title><aug><au><snm>Ong</snm><fnm>S</fnm></au><au><snm>Lin</snm><fnm>H</fnm></au><au><snm>Chen</snm><fnm>Y</fnm></au><au><snm>Li</snm><fnm>Z</fnm></au><au><snm>Cao</snm><fnm>Z</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2007</pubdate><volume>8</volume><fpage>300</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-8-300</pubid><pubid idtype="pmcid">1997217</pubid><pubid idtype="pmpid" link="fulltext">17705863</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Using genetic algorithms to select most predictive protein features</p></title><aug><au><snm>Kernytsky</snm><fnm>A</fnm></au><au><snm>Rost</snm><fnm>B</fnm></au></aug><source>Proteins</source><pubdate>2009</pubdate><volume>75</volume><fpage>75</fpage><lpage>88</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/prot.22211</pubid><pubid idtype="pmpid" link="fulltext">18798568</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Using ensembles of classifiers for predicting HIV protease cleavage sites in proteins</p></title><aug><au><snm>Nanni</snm><fnm>L</fnm></au><au><snm>Lumini</snm><fnm>A</fnm></au></aug><source>Amino Acids</source><pubdate>2009</pubdate><volume>36</volume><fpage>409</fpage><lpage>416</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s00726-008-0076-z</pubid><pubid idtype="pmpid" link="fulltext">18401541</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>A simple method for displaying the hydropathic character of a 
protein</p></title><aug><au><snm>Kyte</snm><fnm>J</fnm></au><au><snm>Doolittle</snm><fnm>R</fnm></au></aug><source>J Mol Biol</source><pubdate>1982</pubdate><volume>157</volume><fpage>105</fpage><lpage>132</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/0022-2836(82)90515-0</pubid><pubid idtype="pmpid" link="fulltext">7108955</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>HIVcleave: a web-server for predicting human immunodeficiency virus protease cleavage sites in proteins</p></title><aug><au><snm>Shen</snm><fnm>HB</fnm></au><au><snm>Chou</snm><fnm>KC</fnm></au></aug><source>Analytical Biochemistry</source><pubdate>2008</pubdate><volume>375</volume><fpage>388</fpage><lpage>390</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ab.2008.01.012</pubid><pubid idtype="pmpid" link="fulltext">18249180</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>A direct adaptive method for faster backpropagation learning: The Rprop algorithm</p></title><aug><au><snm>Riedmiller</snm><fnm>M</fnm></au><au><snm>Braun</snm><fnm>H</fnm></au></aug><source>Proceedings of the IEEE International Conference on Neural Networks</source><pubdate>1993</pubdate><fpage>586</fpage><lpage>591</lpage></bibl><bibl id="B33"><title><p>Opportunities and limitations of a principal component analysis optimized machine learning approach for the identification and classification of cancer involved proteins</p></title><aug><au><snm>Borschbach</snm><fnm>M</fnm></au><au><snm>Hauke</snm><fnm>S</fnm></au><au><snm>Pyka</snm><fnm>M</fnm></au><au><snm>Heider</snm><fnm>D</fnm></au></aug><source>Communications of the SIWN</source><pubdate>2009</pubdate><volume>6</volume><fpage>85</fpage><lpage>89</lpage></bibl><bibl id="B34"><title><p>A computational approach for the identification of small GTPases based on preprocessed amino acid 
sequences</p></title><aug><au><snm>Heider</snm><fnm>D</fnm></au><au><snm>Appelmann</snm><fnm>J</fnm></au><au><snm>Bayro</snm><fnm>T</fnm></au><au><snm>Dreckmann</snm><fnm>W</fnm></au><au><snm>Held</snm><fnm>A</fnm></au><au><snm>Winkler</snm><fnm>J</fnm></au><au><snm>Barnekow</snm><fnm>A</fnm></au><au><snm>Borschbach</snm><fnm>M</fnm></au></aug><source>Technology in Cancer Research and Treatment</source><pubdate>2009</pubdate><volume>8</volume><issue>5</issue><fpage>333</fpage><lpage>342</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">19754209</pubid></xrefbib></bibl><bibl id="B35"><title><p>Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights</p></title><aug><au><snm>Nguyen</snm><fnm>D</fnm></au><au><snm>Widrow</snm><fnm>B</fnm></au></aug><source>Proceedings of Intl Joint Conf on Neural Networks</source><pubdate>1990</pubdate><fpage>21</fpage><lpage>26</lpage></bibl><bibl id="B36"><aug><au><snm>Punta</snm><fnm>M</fnm></au><au><snm>Rost</snm><fnm>B</fnm></au></aug><source>Neural networks predict protein structure and function</source><publisher>Humana Press, Berlin, Germany 2008 chap. 
Artificial Neural Networks: Methods and Protocols</publisher></bibl><bibl id="B37"><aug><au><cnm>R Development Core Team</cnm></au></aug><source>R: A Language and Environment for Statistical Computing</source><publisher>R Foundation for Statistical Computing, Vienna, Austria</publisher><pubdate>2006</pubdate><url>http://www.R-project.org</url><note>ISBN 3-900051-07-0</note></bibl><bibl id="B38"><title><p>Fast effective rule induction</p></title><aug><au><snm>Cohen</snm><fnm>WW</fnm></au></aug><source>Proceedings of the 12th International Conference on Machine Learning</source><editor>Prieditis A, Russell S</editor><pubdate>1995</pubdate><fpage>115</fpage><lpage>123</lpage></bibl><bibl id="B39"><title><p>Generating accurate rule sets without global optimization</p></title><aug><au><snm>Frank</snm><fnm>E</fnm></au><au><snm>Witten</snm><fnm>IH</fnm></au></aug><source>Machine Learning: Proceedings of the Fifteenth International Conference</source><editor>Shavlik J</editor><pubdate>1998</pubdate></bibl><bibl id="B40"><title><p>Leave-One-Out Cross-Validation Based Model Selection Criteria for Weighted LS-SVMs</p></title><aug><au><snm>Cawley</snm><fnm>GC</fnm></au></aug><source>Proceedings of the IEEE World Congress on Computational Intelligence</source><pubdate>2006</pubdate></bibl><bibl id="B41"><title><p>An introduction to ROC analysis</p></title><aug><au><snm>Fawcett</snm><fnm>T</fnm></au></aug><source>Pattern Recognition Letters</source><pubdate>2006</pubdate><volume>27</volume><fpage>861</fpage><lpage>874</lpage><xrefbib><pubid idtype="doi">10.1016/j.patrec.2005.10.010</pubid></xrefbib></bibl><bibl id="B42"><title><p>ROCR: visualizing classifier performance in 
R</p></title><aug><au><snm>Sing</snm><fnm>T</fnm></au><au><snm>Sander</snm><fnm>O</fnm></au><au><snm>Beerenwinkel</snm><fnm>N</fnm></au><au><snm>Lengauer</snm><fnm>T</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><issue>20</issue><fpage>3940</fpage><lpage>3941</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti623</pubid><pubid idtype="pmpid" link="fulltext">16096348</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>The Jpred 3 secondary structure prediction server</p></title><aug><au><snm>Cole</snm><fnm>C</fnm></au><au><snm>Barber</snm><fnm>JD</fnm></au><au><snm>Barton</snm><fnm>GJ</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2008</pubdate><volume>36</volume><fpage>W197</fpage><lpage>201</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn238</pubid><pubid idtype="pmcid">2447793</pubid><pubid idtype="pmpid" link="fulltext">18463136</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>Predicting human immunodeficiency virus protease cleavage sites in proteins by a discriminant function method</p></title><aug><au><snm>Chou</snm><fnm>KC</fnm></au><au><snm>Tomasselli</snm><fnm>AG</fnm></au><au><snm>Reardon</snm><fnm>IM</fnm></au><au><snm>Heinrikson</snm><fnm>RL</fnm></au></aug><source>Proteins</source><pubdate>1996</pubdate><volume>24</volume><fpage>51</fpage><lpage>72</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/(SICI)1097-0134(199601)24:1&lt;51::AID-PROT4&gt;3.0.CO;2-R</pubid><pubid idtype="pmpid">8628733</pubid></pubidlist></xrefbib></bibl><bibl id="B45"><title><p>Individual comparisons by ranking methods</p></title><aug><au><snm>Wilcoxon</snm><fnm>F</fnm></au></aug><source>Biometrics</source><pubdate>1945</pubdate><volume>1</volume><fpage>80</fpage><lpage>83</lpage><xrefbib><pubid idtype="doi">10.2307/3001968</pubid></xrefbib></bibl><bibl id="B46"><title><p>Statistical comparisons of classifiers over multiple data 
sets</p></title><aug><au><snm>Demsar</snm><fnm>J</fnm></au></aug><source>Journal of Machine Learning Research</source><pubdate>2006</pubdate><volume>7</volume><fpage>1</fpage><lpage>30</lpage></bibl><bibl id="B47"><title><p>Human immunodeficiency virus type 1 resistance to the small molecule maturation inhibitor 3-O-(3',3'-dimethylsuccinyl)-betulinic acid is conferred by a variety of single amino acid substitutions at the CA-SP1 cleavage site in Gag</p></title><aug><au><snm>Zhou</snm><fnm>J</fnm></au><au><snm>Chen</snm><fnm>CH</fnm></au><au><snm>Aiken</snm><fnm>C</fnm></au></aug><source>J Virol</source><pubdate>2006</pubdate><volume>80</volume><issue>24</issue><fpage>12095</fpage><lpage>101</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/JVI.01626-06</pubid><pubid idtype="pmcid">1676313</pubid><pubid idtype="pmpid" link="fulltext">17035324</pubid></pubidlist></xrefbib></bibl><bibl id="B48"><title><p>Profile hidden Markov models</p></title><aug><au><snm>Eddy</snm><fnm>SR</fnm></au></aug><source>Bioinformatics</source><pubdate>1998</pubdate><volume>14</volume><issue>9</issue><fpage>755</fpage><lpage>63</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/14.9.755</pubid><pubid idtype="pmpid" link="fulltext">9918945</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>PA-457: a potent HIV inhibitor that disrupts core condensation by targeting a late step in Gag processing</p></title><aug><au><snm>Li</snm><fnm>F</fnm></au><au><snm>Goila-Gaur</snm><fnm>R</fnm></au><au><snm>Salzwedel</snm><fnm>K</fnm></au><au><snm>Kilgore</snm><fnm>NR</fnm></au><au><snm>Reddick</snm><fnm>M</fnm></au><au><snm>Matallana</snm><fnm>C</fnm></au><au><snm>Castillo</snm><fnm>A</fnm></au><au><snm>Zoumplis</snm><fnm>D</fnm></au><au><snm>Martin</snm><fnm>DE</fnm></au><au><snm>Orenstein</snm><fnm>JM</fnm></au><au><snm>Allaway</snm><fnm>GP</fnm></au><au><snm>Freed</snm><fnm>EO</fnm></au><au><snm>Wild</snm><fnm>CT</fnm></au></aug><source>Proc Natl Acad Sci 
USA</source><pubdate>2003</pubdate><volume>100</volume><issue>23</issue><fpage>13555</fpage><lpage>60</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.2234683100</pubid><pubid idtype="pmcid">263852</pubid><pubid idtype="pmpid" link="fulltext">14573704</pubid></pubidlist></xrefbib></bibl><bibl id="B50"><title><p>A putative alpha-helical structure which overlaps the capsid-p2 boundary in the human immunodeficiency virus type 1 Gag precursor is crucial for viral particle assembly</p></title><aug><au><snm>Accola</snm><fnm>MA</fnm></au><au><snm>H&#246;glund</snm><fnm>S</fnm></au><au><snm>G&#246;ttlinger</snm><fnm>HG</fnm></au></aug><source>J Virol</source><pubdate>1998</pubdate><volume>72</volume><fpage>2072</fpage><lpage>2078</lpage><xrefbib><pubidlist><pubid idtype="pmcid">109501</pubid><pubid idtype="pmpid" link="fulltext">9499062</pubid></pubidlist></xrefbib></bibl><bibl id="B51"><title><p>Helical structure determined by NMR of the HIV-1 (345-392)Gag sequence, surrounding p2: Implications for particle assembly and RNA packaging</p></title><aug><au><snm>Morellet</snm><fnm>N</fnm></au><au><snm>Druillennec</snm><fnm>S</fnm></au><au><snm>Lenoir</snm><fnm>C</fnm></au><au><snm>Bouaziz</snm><fnm>S</fnm></au><au><snm>Roques</snm><fnm>B</fnm></au></aug><source>Protein Science</source><pubdate>2005</pubdate><volume>14</volume><fpage>375</fpage><lpage>386</lpage><xrefbib><pubid idtype="doi">10.1110/ps.041087605</pubid></xrefbib></bibl><bibl id="B52"><title><p>Structures of the HIV-1 capsid protein dimerization domain at 2.6 &#197; resolution</p></title><aug><au><snm>Worthylake</snm><fnm>DK</fnm></au><au><snm>Wang</snm><fnm>H</fnm></au><au><snm>Yoo</snm><fnm>S</fnm></au><au><snm>Sundquist</snm><fnm>WI</fnm></au><au><snm>Hill</snm><fnm>CP</fnm></au></aug><source>Acta Crystallogr D Biol Crystallogr</source><pubdate>1999</pubdate><volume>55</volume><fpage>85</fpage><lpage>92</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1107/S0907444998007689</pubid><pubid idtype="pmpid" 
link="fulltext">10089398</pubid></pubidlist></xrefbib></bibl><bibl id="B53"><title><p>Structure of complex of synthetic HIV-1 protease with a substrate-based inhibitor at 2.3 &#197; resolution</p></title><aug><au><snm>Miller</snm><fnm>M</fnm></au><au><snm>Schneider</snm><fnm>J</fnm></au><au><snm>Sathyanarayana</snm><fnm>BK</fnm></au><au><snm>Toth</snm><fnm>MV</fnm></au><au><snm>Marshall</snm><fnm>GR</fnm></au><au><snm>Clawson</snm><fnm>L</fnm></au><au><snm>Selk</snm><fnm>L</fnm></au><au><snm>Kent</snm><fnm>SB</fnm></au><au><snm>Wlodawer</snm><fnm>A</fnm></au></aug><source>Science</source><pubdate>1989</pubdate><volume>246</volume><issue>4934</issue><fpage>1149</fpage><lpage>52</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.2686029</pubid><pubid idtype="pmpid" link="fulltext">2686029</pubid></pubidlist></xrefbib></bibl><bibl id="B54"><title><p>Electron cryotomography of immature HIV-1 virions reveals the structure of the CA and SP1 Gag shells</p></title><aug><au><snm>Wright</snm><fnm>ER</fnm></au><au><snm>Schooler</snm><fnm>JB</fnm></au><au><snm>Ding</snm><fnm>HJ</fnm></au><au><snm>Kieffer</snm><fnm>C</fnm></au><au><snm>Fillmore</snm><fnm>C</fnm></au><au><snm>Sundquist</snm><fnm>WI</fnm></au><au><snm>Jensen</snm><fnm>GJ</fnm></au></aug><source>EMBO J</source><pubdate>2007</pubdate><volume>26</volume><issue>8</issue><fpage>2218</fpage><lpage>26</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/sj.emboj.7601664</pubid><pubid idtype="pmcid">1852790</pubid><pubid idtype="pmpid" link="fulltext">17396149</pubid></pubidlist></xrefbib></bibl><bibl id="B55"><title><p>High prevalence of bevirimat resistance mutations in protease inhibitor-resistant HIV 
isolates</p></title><aug><au><snm>Verheyen</snm><fnm>J</fnm></au><au><snm>Verhofstede</snm><fnm>C</fnm></au><au><snm>Knops</snm><fnm>E</fnm></au><au><snm>Vandekerckhove</snm><fnm>L</fnm></au><au><snm>Fun</snm><fnm>A</fnm></au><au><snm>Brunen</snm><fnm>D</fnm></au><au><snm>Dauwe</snm><fnm>K</fnm></au><au><snm>Wensing</snm><fnm>A</fnm></au><au><snm>Pfister</snm><fnm>H</fnm></au><au><snm>Kaiser</snm><fnm>R</fnm></au><au><snm>Nijhuis</snm><fnm>M</fnm></au></aug><source>AIDS</source><pubdate>2009</pubdate><inpress/><xrefbib><pubid idtype="pmpid" link="fulltext">19926962</pubid></xrefbib></bibl></refgrp>
   </bm>
</art>
