<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1752-153X-2-S1-P9</ui>
   <ji>1752-153X</ji>
   <fm>
      <dochead>Poster presentation</dochead>
      <bibl>
         <title>
            <p>On some aspects of validation of predictive QSAR models</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Roy</snm>
               <fnm>K</fnm>
               <insr iid="I1"/>
               <email>kunalroy_in@yahoo.com</email>
            </au>
            <au id="A2">
               <snm>Roy</snm>
               <fnm>PP</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A3">
               <snm>Leonard</snm>
               <fnm>JT</fnm>
               <insr iid="I1"/>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Drug Theoretics and Cheminformatics Lab, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India</p>
            </ins>
         </insg>
         <source>Chemistry Central Journal</source>
         <supplement>
            <title>
               <p>3rd German Conference on Chemoinformatics: 21. CIC-Workshop</p>
            </title>
            <note>Meeting abstracts - A single PDF containing all abstracts in this Supplement is available <a href="http://www.biomedcentral.com/content/files/pdf/1752-153X-2-S1-full.pdf">here</a>.</note>
            <url>http://www.biomedcentral.com/content/pdf/1752-153X-2-S1-info.pdf</url>
         </supplement>
         <conference>
            <title>
               <p>3rd German Conference on Chemoinformatics</p>
            </title>
            <location>Goslar, Germany</location>
            <date-range>11-13 November 2007</date-range>
            <url>http://www.gdch.de/gcc2007</url>
         </conference>
         <issn>1752-153X</issn>
         <pubdate>2008</pubdate>
         <volume>2</volume>
         <issue>Suppl 1</issue>
         <fpage>P9</fpage>
         <url>http://www.journal.chemistrycentral.com/content/2/S1/P9</url>
         <xrefbib>
            <pubid idtype="doi">10.1186/1752-153X-2-S1-P9</pubid>
         </xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>26</day>
               <month>03</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Roy et al.</collab>
      </cpyrt>
   </fm>
   <bdy>
      <sec>
         <st>
            <p/>
         </st>
         <p>Quantitative structure-activity relationships (QSARs) are predictive models, derived with statistical tools, that correlate the biological activity (therapeutic or toxic) of chemicals (drugs, toxicants, environmental pollutants) with descriptors representing molecular structure and/or properties. The success of any QSAR model depends on the accuracy of the input data, the selection of appropriate descriptors and statistical tools, and, most importantly, validation of the developed model. Validation is the process by which the reliability and relevance of a procedure are established for a specific purpose. Leave-one-out cross-validation generally overestimates predictive capacity, and even with external validation, no one can be sure whether the selection of training and test sets was manipulated to maximize the predictive capacity of the model being published. In this paper, we present representative examples of QSAR model validation to explore how three factors affect the quality of prediction: the method used to select training set compounds, the training set size, and variable selection for training set models. The major conclusions from the study are: (1) <it>K</it>-means cluster-based division is a reliable method of dividing a data set into training and test sets for developing predictive QSAR models; (2) the training set size should be set at an optimal level, so that the model is properly trained and can satisfactorily predict the activity values of the test set compounds; (3) choosing variables for regression based only on the Q<sup>2</sup> value may not be optimal. Furthermore, the predictive R<sup>2</sup> value should not be treated as the only criterion of a model's external predictability.</p>
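         <p>For illustration only, the two statistics named in the abstract can be sketched as follows. The sketch assumes the commonly used definitions (leave-one-out Q2 as one minus PRESS over the sum of squares about the training mean, and predictive R2 as the same ratio evaluated on the test set against the training-set mean) and a single-descriptor least-squares model. The function names fit_line, q2_loo and r2_pred and the toy numbers are hypothetical and are not taken from the poster.</p>

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one descriptor: returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def q2_loo(x, y):
    """Leave-one-out Q2 = 1 - PRESS / SS, with SS taken about the training mean."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without compound i, then predict compound i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    y_bar = mean(y)
    return 1.0 - press / sum((yi - y_bar) ** 2 for yi in y)


def r2_pred(x_train, y_train, x_test, y_test):
    """Predictive R2 for an external test set, referenced to the training mean."""
    slope, intercept = fit_line(x_train, y_train)
    y_bar_train = mean(y_train)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2
                 for xi, yi in zip(x_test, y_test))
    ss_ref = sum((yi - y_bar_train) ** 2 for yi in y_test)
    return 1.0 - ss_res / ss_ref


if __name__ == "__main__":
    # Made-up descriptor and activity values, purely to exercise the functions.
    x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y_train = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
    x_test = [2.5, 4.5]
    y_test = [2.4, 4.6]
    print("Q2 (LOO):", q2_loo(x_train, y_train))
    print("R2 pred:", r2_pred(x_train, y_train, x_test, y_test))
```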
      </sec>
   </bdy>
</art>
