<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-S6-S3</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Evolutionary conservation of DNA-contact residues in DNA-binding domains</p>
         </title>
         <aug>
            <au id="A1" ce="yes">
               <snm>Chang</snm>
               <fnm>Yao-Lin</fnm>
               <insr iid="I1"/>
               <email>b4506046@csie.ntu.edu.tw</email>
            </au>
            <au id="A2" ce="yes">
               <snm>Tsai</snm>
               <fnm>Huai-Kuang</fnm>
               <insr iid="I2"/>
               <email>hktsai@iis.sinica.edu.tw</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Kao</snm>
               <fnm>Cheng-Yan</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>cykao@csie.ntu.edu.tw</email>
            </au>
            <au id="A4">
               <snm>Chen</snm>
               <fnm>Yung-Chian</fnm>
               <insr iid="I4"/>
               <email>smalljohn@hotmail.com</email>
            </au>
            <au id="A5">
               <snm>Hu</snm>
               <fnm>Yuh-Jyh</fnm>
               <insr iid="I5"/>
               <email>yhu@cis.nctu.edu.tw</email>
            </au>
            <au id="A6" ca="yes">
               <snm>Yang</snm>
               <fnm>Jinn-Moon</fnm>
               <insr iid="I4"/>
               <insr iid="I6"/>
               <insr iid="I7"/>
               <email>moon@faculty.nctu.edu.tw</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan</p>
            </ins>
            <ins id="I2">
               <p>Institute of Information Science, Academia Sinica, Taipei 115, Taiwan</p>
            </ins>
            <ins id="I3">
               <p>Institute for Information Industry, Taipei 106, Taiwan</p>
            </ins>
            <ins id="I4">
               <p>Institute of Bioinformatics, National Chiao Tung University, Hsinchu 30050, Taiwan</p>
            </ins>
            <ins id="I5">
               <p>Department of Computer Science, National Chiao Tung University, Hsinchu 30050, Taiwan</p>
            </ins>
            <ins id="I6">
               <p>Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 30050, Taiwan</p>
            </ins>
            <ins id="I7">
               <p>Core Facility for Structural Bioinformatics, National Chiao Tung University, Hsinchu, Taiwan</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <supplement>
            <title>
               <p>Symposium of Computations in Bioinformatics and Bioscience (SCBB07)</p>
            </title>
            <editor>Guoqing Lu, Jun Ni, Thomas L Casavant and Brian Athey</editor>
            <note>Research</note>
         </supplement>
         <conference>
            <title>
               <p>Symposium of Computations in Bioinformatics and Bioscience (SCBB07)</p>
            </title>
            <location>Iowa City, Iowa, USA</location>
            <date-range>13&#8211;15 August 2007</date-range>
            <url>http://www.imsccs-conference.org/imsccs07/SCBB07/index.html</url>
         </conference>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>Suppl 6</issue>
         <fpage>S3</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/S6/S3</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18541056</pubid>
               <pubid idtype="doi">10.1186/1471-2105-9-S6-S3</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>28</day>
               <month>5</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Chang et al; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>DNA-binding proteins are of utmost importance to gene regulation. The identification of DNA-binding domains is useful for understanding the regulation mechanisms of DNA-binding proteins. In this study, we propose a method to determine whether a domain or a protein has DNA-binding capability by considering the evolutionary conservation of DNA-binding residues.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Our method achieves high precision and recall for 66 families of DNA-binding domains, with a false positive rate less than 5% for 250 non-DNA-binding proteins. In addition, experimental results show that our method is able to identify the different DNA-binding behaviors of proteins in the same SCOP family based on the use of evolutionary conservation of DNA-contact residues.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>This study shows the conservation of DNA-contact residues in DNA-binding domains. We conclude that the members in the same subfamily bind DNA specifically and the members in different subfamilies often recognize different DNA targets. Additionally, we observe the co-evolution of DNA-contact residues and interacting DNA base-pairs.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>DNA-binding proteins play a key role in many genetic activities of living organisms, such as transcription, recombination, DNA replication and repair. One or more domains of these proteins interact with DNA and provide the specificity for direct and indirect readout of DNA <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Identifying DNA-binding domains is therefore important for understanding these regulation mechanisms.</p>
         <p>Recently, the rapidly increasing number of protein-DNA complex structures determined by X-ray crystallography and nuclear magnetic resonance (NMR) has enabled the use of structure-based approaches for identifying DNA-binding proteins. Most structural DNA-binding domains can be categorized into several classes according to their structures or binding types <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. However, some DNA-binding domains cannot be well categorized, and for some DNA-binding domains structural information is unavailable <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B5">5</abbr></abbrgrp>. Several studies have used computational approaches to predict potential DNA-binding proteins from structural features of protein-DNA complexes, such as the overall charge, electric moments, and shape of binding sites <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. Since the charge and conformational complementarity of binding sites is essential for protein-DNA binding, these features provide a reasonable basis for identifying DNA-binding proteins. Another trend is to consider the degree of conservation of residues <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. Luscombe and Thornton <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> studied 21 families of DNA-binding proteins and showed that amino acids interacting with the DNA are better conserved than those that do not. Stawiski et al. <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> found that electrostatic patches of DNA-binding proteins have a higher percentage of aromatic and positively charged residues. Based on the general properties of the 20 amino acids, they also showed that residues of the patch are conserved at the property level.</p>
         <p>In this paper, we propose a structure-based threading method that identifies DNA-binding domains by considering the evolutionary conservation of their DNA-contact residues. We use BLOSUM62 <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, an evolution-based scoring matrix for amino acid substitutions, to measure the degree of conservation of binding residues. Our method achieves high precision and recall for 66 families of DNA-binding domains, with a false positive rate of less than 5% for 250 non-DNA-binding proteins.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>Given a query domain, our method identifies similar DNA-binding structures or homologous protein sequences from the template library. To evaluate the performance of our method, for each DNA-contact domain (<it>D</it>) in the template library we generated corresponding positive and negative sets. The positive set contains the domains similar to domain <it>D </it>according to SCOP, while the domains in the negative set are not similar to <it>D</it>. By applying our method to these two sets, we found that the scores of the domains in the positive set are significantly higher than those of the domains in the negative set. We further determined a threshold that achieves high precision and recall. Using this threshold, we applied our method to 66 known SCOP families of DNA-binding domains and 250 non-DNA-binding proteins to examine its performance.</p>
         <sec>
            <st>
               <p>Positive and negative set for each contact domain</p>
            </st>
            <p>We collected DNA-binding contact domains from the SCOP database; the details are described in Methods. To remove redundant contact domains, domains with highly similar sequences (identity > 90%) were grouped using the NCBI software BLASTCLUST. In each group, the domain with the largest number of contact residues was chosen as the representative domain. For a representative domain <it>R</it>, the protein domains in the same SCOP family according to SCOP95 (members whose similarity is greater than 95% are excluded) were considered members of <it>R</it>. Each member of <it>R </it>was aligned to <it>R </it>using CE. We define a residue of <it>R </it>as misaligned if it is aligned to a gap. A family member was discarded if more than 20% of the contact residues of <it>R </it>were misaligned between <it>R </it>and this member. Family members that satisfy the above criteria constitute the positive set. If there were fewer than five members in the positive set of <it>R</it>, the entire family of <it>R </it>was discarded. This procedure yielded 66 representative domains with corresponding positive sets. For each <it>R</it>, we artificially generated 1000 domains as the negative set: each artificial domain copies the residues of <it>R</it>, and the residue type of each contact residue of <it>R </it>is then randomly mutated.</p>
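            <p>The generation of an artificial negative set can be illustrated with a minimal sketch. It is not the implementation used in this study: the function name, the toy sequence and the contact positions are hypothetical, and it assumes that "randomly mutated" means replacing each contact residue by a uniformly chosen different residue type, which the text does not specify.</p>

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types


def make_negative_set(sequence, contact_positions, n_domains=1000, seed=0):
    """Copy the representative domain and mutate every contact residue.

    Assumption (not stated in the text): each contact residue is replaced by
    a residue type drawn uniformly from the 19 other standard amino acids.
    """
    rng = random.Random(seed)
    negatives = []
    for _ in range(n_domains):
        residues = list(sequence)  # replicate the residues of R
        for pos in contact_positions:
            choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
            residues[pos] = rng.choice(choices)
        negatives.append("".join(residues))
    return negatives


# Hypothetical toy input: a short sequence with 0-based contact positions.
toy_negatives = make_negative_set("RRGRQTYTRYQTLELEKEF", [0, 2, 4, 8], n_domains=5)
```

            <p>Non-contact positions are left untouched in this sketch, so the artificial domains differ from the representative domain only at its contact residues.</p>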
         </sec>
         <sec>
            <st>
               <p>Determining the threshold of similar DNA-binding function of a contact domain</p>
            </st>
            <p>For each representative domain <it>R</it>, each member in the positive and negative sets was scored by the method we developed. Ideally, the scores of domains in the positive set should be on average significantly higher than those of the negative set. We used the Kolmogorov-Smirnov (KS) test to examine the above criterion. The KS test is a nonparametric test to determine if two distributions differ significantly. According to our results, the scores are significantly different for the positive set and the negative set in most domains (97% of 66 sets have a <it>p </it>value less than 0.05).</p>
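            <p>For readers unfamiliar with the KS test, the following minimal sketch computes the two-sample KS statistic, i.e. the largest distance between the empirical cumulative distributions of two score sets. It only illustrates the statistic and is not the code used in this study; the function name is hypothetical, and obtaining a <it>p </it>value additionally requires the distribution of the statistic, for which an existing routine such as <it>scipy.stats.ks_2samp </it>can be used.</p>

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

            <p>A value of D close to 1 means that the positive and negative score distributions barely overlap, whereas a value close to 0 means that they are nearly indistinguishable.</p>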
            <p>Further, given a contact domain, we need a threshold for deciding which domains have a similar DNA-binding function. For the two sets (positive and negative) of a representative domain, we separately transform all members' scores to z-scores by</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-S6-S3-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>z</m:mi>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mi>s</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>&#956;</m:mi>
                              </m:mrow>
                              <m:mi>&#948;</m:mi>
                           </m:mfrac>
                           <m:mo>,</m:mo>
                        </m:mrow>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>s </it>is the score of a member, <it>&#956; </it>is the mean score of these two sets, and <it>&#948; </it>is the standard deviation. Figures <figr fid="F1">1A</figr> and <figr fid="F1">1B</figr> show the precision (the ratio of the number of retrieved true positives to the number of all retrieved data) and the recall (the ratio of the number of retrieved true positives to the number of all true positives) at various z-score thresholds, respectively. As shown in Figure <figr fid="F1">1A</figr>, when the threshold is greater than two, the precision is very similar across thresholds (>90%). If we set the z-score threshold to one, only 60% of families achieve high precision. These results imply that larger thresholds yield higher precision, but the benefit is limited once the threshold exceeds two. Conversely, as shown in Figure <figr fid="F1">1B</figr>, larger thresholds reduce the recall. Based on these results, we set the z-score threshold to 2.0, and domains with a z-score higher than the threshold are considered putative DNA-binding domains.</p>
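            <p>The z-score transformation and the threshold-based precision and recall can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in this study: the function names are hypothetical, the mean and standard deviation are assumed to be computed over the pooled positive and negative scores, and the population standard deviation is assumed.</p>

```python
from statistics import mean, pstdev


def z_scores(positive_scores, negative_scores):
    """Transform raw scores to z = (s - mu) / delta.

    Assumption: mu and delta are taken over the pooled scores of both sets,
    and delta is the population standard deviation.
    """
    pooled = list(positive_scores) + list(negative_scores)
    mu = mean(pooled)
    delta = pstdev(pooled)
    pos_z = [(s - mu) / delta for s in positive_scores]
    neg_z = [(s - mu) / delta for s in negative_scores]
    return pos_z, neg_z


def precision_recall(pos_z, neg_z, threshold=2.0):
    """Precision and recall when domains with z above the threshold are retrieved."""
    true_pos = sum(1 for z in pos_z if z > threshold)
    false_pos = sum(1 for z in neg_z if z > threshold)
    retrieved = true_pos + false_pos
    precision = true_pos / retrieved if retrieved else 0.0
    recall = true_pos / len(pos_z) if pos_z else 0.0
    return precision, recall
```

            <p>Evaluating <it>precision_recall </it>over a range of thresholds for each representative domain would give the kind of per-family curves summarized in Figure <figr fid="F1">1</figr>.</p>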
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Precision and recall on different z-score thresholds</p>
               </caption>
               <text>
                  <p><b>Precision and recall on different z-score thresholds</b>. Results of our method at different z-score thresholds for the 66 representative domains. The distributions of the numbers of families are shown for (A) precision and (B) recall.</p>
               </text>
               <graphic file="1471-2105-9-S6-S3-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Non-DNA-binding proteins</p>
            </st>
            <p>We further applied our method to 250 non-nucleic-acid-binding (non-DNA-binding) proteins, which were initially studied by Hobohm and Sander <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> and further specified by Stawiski <it>et al. </it><abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. We aligned all non-redundant contact domains to these non-DNA-binding proteins using CE. Alignments whose z-scores (as defined by CE) are greater than 3.7 and whose misalignment rate of contact residues is less than 20% were chosen as non-DNA-binding domains. Among the 250 proteins, 177 non-DNA-binding domains passed these constraints. We applied our method to these non-DNA-binding domains and transformed their scores to <it>z</it>-scores. Figure <figr fid="F2">2</figr> shows the distribution of the z-scores of the non-DNA-binding domains. The scores approximately follow a normal distribution, and the peak of the density occurs between <it>Z </it>= -1 and 0. Given a z-score threshold, the false positive rate is the ratio of the number of domains whose z-scores exceed the threshold to the total number of non-DNA-binding domains. With the threshold of 2.0 determined in our previous analysis, the false positive rate is less than 0.05. This shows that our method can recognize non-DNA-binding domains as non-binding with high accuracy.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Distribution of z-score values of 177 non-DNA-binding domains</p>
               </caption>
               <text>
                  <p>Distribution of z-score values of 177 non-DNA-binding domains.</p>
               </text>
               <graphic file="1471-2105-9-S6-S3-2"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Figure <figr fid="F3">3A</figr> shows an example, the ultrabithorax homeodomain (Ubx) from <it>Drosophila melanogaster </it>(PDB entry <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link><abbrgrp><abbr bid="B20">20</abbr></abbrgrp>), selected from the 66 representative domains to illustrate the characteristics of our method. The DNA is represented in green. The 18 DNA-contact residues are presented as yellow sticks and the other residues are shown in blue. The protein sequence is also presented, with each contact residue marked with an asterisk. For the alignment of the representative domain (<ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link>) to the domains of its members, Figure <figr fid="F3">3B</figr> presents a favorable case (PDB entry <ext-link ext-link-type="pdb" ext-link-id="1PUF-A">1PUF-A</ext-link>), a homeobox protein hox-a9 from mouse <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. We found that the contact residues are highly conserved in the aligned amino acids of the two domains, and our scoring method reflects this with a high z-score (z-score = 11.92). On the other hand, when we align <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> to the 250 non-DNA-binding proteins, our method is able to discard structurally similar proteins whose contact residues are not conserved. Figure <figr fid="F3">3C</figr> shows an example, the alignment of <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> to <ext-link ext-link-type="pdb" ext-link-id="1BOB">1BOB</ext-link>, which is histone acetyltransferase hat1 from <it>S. cerevisiae </it>in complex with acetyl coenzyme A <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> (z-score = 0.58).</p>
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Searching results of the ultrabithorax homeodomain protein</p>
            </caption>
            <text>
               <p><b>Searching results of the ultrabithorax homeodomain protein</b>. Searching results using the homeotic Ubx/Exd/DNA ternary complex (PDB entry <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link>) from <it>Drosophila melanogaster </it>as the query. <b>(A) </b>The contact residues of the <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> complex are presented as sticks (yellow). The sequence of <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> is shown and contact residues are marked with asterisks. <b>(B) </b>Structure alignment of <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> (blue) and <ext-link ext-link-type="pdb" ext-link-id="1PUF-A">1PUF-A</ext-link> (green). Our scoring method gives a score of 4.78 and a <it>Z</it>-score of 11.92. <b>(C) </b>Structure alignment of <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> (blue) and the non-DNA-binding protein <ext-link ext-link-type="pdb" ext-link-id="1BOB">1BOB</ext-link> (green). Only the aligned structures/sequences of <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1BOB">1BOB</ext-link> are shown. We obtained score = -0.72 and <it>Z</it>-score = 0.58.</p>
            </text>
            <graphic file="1471-2105-9-S6-S3-3"/>
         </fig>
         <p>The z-scores of DNA-binding domains in the same SCOP family may vary considerably for several representative domains (Figure <figr fid="F4">4A</figr>). <ext-link ext-link-type="pdb" ext-link-id="1PUF-A">1PUF-A</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-A1">1O4X-A1</ext-link> (Oct-1 POU homeodomain from <it>Homo sapiens </it><abbrgrp><abbr bid="B23">23</abbr></abbrgrp>) are members of the <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> representative domain. Their <it>z</it>-scores are 11.92 (<ext-link ext-link-type="pdb" ext-link-id="1PUF-A">1PUF-A</ext-link>) and 4.4 (<ext-link ext-link-type="pdb" ext-link-id="1O4X-A1">1O4X-A1</ext-link>) when <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> is used as the query (Figure <figr fid="F4">4A</figr>). These z-scores indicate that the contact residues interacting with the bases of the core binding site in the DNA are more conserved between <ext-link ext-link-type="pdb" ext-link-id="1PUF-A">1PUF-A</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> than between <ext-link ext-link-type="pdb" ext-link-id="1O4X-A1">1O4X-A1</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link>.</p>
         <fig id="F4">
            <title>
               <p>Figure 4</p>
            </title>
            <caption>
               <p>Comparison of bound DNA sequences of homologous proteins</p>
            </caption>
            <text>
               <p><b>Comparison of bound DNA sequences of homologous proteins</b>. The alignments of the bound DNA sequences of homologous proteins using the homeotic ubx/exd/DNA ternary complex (PDB entry <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link>) as the query. <b>(A) </b>The z-score values and the bound DNA sequences of the complexes <ext-link ext-link-type="pdb" ext-link-id="1B8I">1B8I</ext-link> (PDB entries <ext-link ext-link-type="pdb" ext-link-id="1B8I-C">1B8I-C</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1B8I-D">1B8I-D</ext-link>), 1PUF (PDB entries <ext-link ext-link-type="pdb" ext-link-id="1PUF-D">1PUF-D</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1PUF-E">1PUF-E</ext-link>), and <ext-link ext-link-type="pdb" ext-link-id="1O4X">1O4X</ext-link> (PDB entries <ext-link ext-link-type="pdb" ext-link-id="1O4X-C">1O4X-C</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-D">1O4X-D</ext-link>). All sequences are written from 5' to 3'. <b>(B) </b>Alignments of the bound DNA sequences of the complexes <ext-link ext-link-type="pdb" ext-link-id="1B8I">1B8I</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1PUF">1PUF</ext-link>. A colon denotes an identical pair and an asterisk denotes a contact nucleotide (asterisks are placed above the letters of the upper sequence and below the letters of the lower sequence of the alignment). <b>(C) </b>Alignments of the bound DNA sequences of the complexes <ext-link ext-link-type="pdb" ext-link-id="1B8I">1B8I</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X">1O4X</ext-link>.</p>
            </text>
            <graphic file="1471-2105-9-S6-S3-4"/>
         </fig>
         <p>To investigate the variation of contact residues of DNA-binding domains in the same SCOP family, we compared the bound DNA sequences of two DNA-binding domains by aligning the double-strand sequences to each other. <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> binds two DNA sequences (i.e. PDB entries <ext-link ext-link-type="pdb" ext-link-id="1B8I-C">1B8I-C</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1B8I-D">1B8I-D</ext-link>) and <ext-link ext-link-type="pdb" ext-link-id="1O4X-A1">1O4X-A1</ext-link> binds another two DNA sequences (PDB entries <ext-link ext-link-type="pdb" ext-link-id="1O4X-C">1O4X-C</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-D">1O4X-D</ext-link>). First we generated four pairwise alignments: <ext-link ext-link-type="pdb" ext-link-id="1B8I-C">1B8I-C</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-C">1O4X-C</ext-link>; <ext-link ext-link-type="pdb" ext-link-id="1B8I-C">1B8I-C</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-D">1O4X-D</ext-link>; <ext-link ext-link-type="pdb" ext-link-id="1B8I-D">1B8I-D</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-C">1O4X-C</ext-link>; and <ext-link ext-link-type="pdb" ext-link-id="1B8I-D">1B8I-D</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-D">1O4X-D</ext-link>. We do not allow any gap insertion when aligning a pair of DNA sequences. The alignments are obtained by sliding the two sequences against each other until the best match is found. The alignment with the maximum number of identical aligned pairs is chosen; here this is the alignment between <ext-link ext-link-type="pdb" ext-link-id="1B8I-C">1B8I-C</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-C">1O4X-C</ext-link> (Figure <figr fid="F4">4C</figr>). Then we adjust the alignment of the other DNA strand pair (i.e. <ext-link ext-link-type="pdb" ext-link-id="1B8I-D">1B8I-D</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-D">1O4X-D</ext-link>) according to this best alignment (<ext-link ext-link-type="pdb" ext-link-id="1B8I-C">1B8I-C</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-C">1O4X-C</ext-link>).</p>
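         <p>The gap-free sliding alignment described above can be sketched as follows. This is a minimal illustration rather than the code used in this study: the function names are hypothetical, ties between equally good offsets are resolved by keeping the first one found (a choice the text does not specify), and the strand names and sequences passed in are supplied by the caller.</p>

```python
def best_ungapped_alignment(seq_a, seq_b):
    """Slide seq_b along seq_a without gaps and count identical aligned pairs.

    Returns (offset, matches), where offset is the index in seq_a that is
    aligned with the first base of seq_b (it may be negative).
    """
    best_offset, best_matches = 0, -1
    for offset in range(-(len(seq_b) - 1), len(seq_a)):
        matches = 0
        for j, base_b in enumerate(seq_b):
            i = j + offset
            if i >= 0 and len(seq_a) > i and seq_a[i] == base_b:
                matches += 1
        if matches > best_matches:  # strict comparison keeps the first best offset
            best_offset, best_matches = offset, matches
    return best_offset, best_matches


def choose_best_pairing(strands_x, strands_y):
    """Align every strand of one complex to every strand of the other.

    Both arguments map a strand name to its sequence. Returns
    (matches, name_x, name_y, offset) for the pairing with the most
    identical aligned pairs.
    """
    results = []
    for name_x, seq_x in strands_x.items():
        for name_y, seq_y in strands_y.items():
            offset, matches = best_ungapped_alignment(seq_x, seq_y)
            results.append((matches, name_x, name_y, offset))
    return max(results)
```

         <p>In this sketch, the offset returned for the best pairing would then be reused to place the complementary strand pair, mirroring the adjustment step described in the text.</p>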
         <p>Figures <figr fid="F4">4B</figr> and <figr fid="F4">4C</figr> show that, over the whole DNA sequences, the number of identical nucleotides between <ext-link ext-link-type="pdb" ext-link-id="1B8I-C">1B8I-C</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1PUF-E">1PUF-E</ext-link> (10) and between <ext-link ext-link-type="pdb" ext-link-id="1B8I-D">1B8I-D</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1PUF-D">1PUF-D</ext-link> (10) is much higher than that between <ext-link ext-link-type="pdb" ext-link-id="1B8I-C">1B8I-C</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-C">1O4X-C</ext-link> (6) and between <ext-link ext-link-type="pdb" ext-link-id="1B8I-D">1B8I-D</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-D">1O4X-D</ext-link> (5). Moreover, 11 identical contact nucleotides are obtained from the alignments of <ext-link ext-link-type="pdb" ext-link-id="1B8I-C">1B8I-C</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1PUF-E">1PUF-E</ext-link> and of <ext-link ext-link-type="pdb" ext-link-id="1B8I-D">1B8I-D</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1PUF-D">1PUF-D</ext-link>, whereas only two identical contact nucleotides are obtained from the alignments of <ext-link ext-link-type="pdb" ext-link-id="1B8I-C">1B8I-C</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-C">1O4X-C</ext-link> and of <ext-link ext-link-type="pdb" ext-link-id="1B8I-D">1B8I-D</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-D">1O4X-D</ext-link> (contact nucleotides are the nucleotides that interact with contact residues of the protein). With respect to <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link>, <ext-link ext-link-type="pdb" ext-link-id="1PUF-A">1PUF-A</ext-link> and <ext-link ext-link-type="pdb" ext-link-id="1O4X-A1">1O4X-A1</ext-link> differ not only in the DNA sequences they bind but also in their DNA-binding sites. These results show that members of the same SCOP family may have different DNA-binding models and that our method is able to detect the different protein-DNA interactions based on the evolutionary conservation of DNA-contact residues.</p>
         <p>We produced a multiple protein sequence alignment of 13 homeodomains (Figure <figr fid="F5">5</figr>) selected from SCOP 1.71 using a multiple structure alignment tool, MUSTANG <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. These domains were ranked by the z-scores calculated with our scoring method using the sequence of <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> as the query. According to the z-scores, these 13 domains can be roughly divided into two groups: the Ubx-like homeodomains colored in blue (e.g. PDB entries <ext-link ext-link-type="pdb" ext-link-id="9ANT-A">9ANT-A</ext-link> (12.77), <ext-link ext-link-type="pdb" ext-link-id="1AHD-P">1AHD-P</ext-link> (12.19), and <ext-link ext-link-type="pdb" ext-link-id="1SAN">1SAN</ext-link> (11.96)) and the Oct-1 POU homeodomains colored in red (e.g. PDB entries <ext-link ext-link-type="pdb" ext-link-id="1E3O-C1">1E3O-C1</ext-link> (6.40), <ext-link ext-link-type="pdb" ext-link-id="1GT0-C1">1GT0-C1</ext-link> (6.38), and <ext-link ext-link-type="pdb" ext-link-id="1O4X-A1">1O4X-A1</ext-link> (4.40)). Figure <figr fid="F5">5</figr> shows that all Ubx-like homeodomains are significantly more conserved than the Oct-1 POU homeodomains at the contact residues (green). The Ubx homeodomain binds together with the extradenticle homeodomain (Exd) to recognize four DNA bases (ATAA) <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> through four residues, Ile47, Gln50, Asn51, and Met54, located on the <it>&#945;</it>3 helix of Ubx (gray columns in Figure <figr fid="F5">5</figr>). The z-scores of the domains are higher if these four residues are conserved, as in the three antennapedia homeodomains and the two Hox homeobox proteins. These results show that contact residues interacting with bases in the DNA sequences are often conserved. This result is consistent with previous results <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> in which the homeodomain family was considered a multi-specific family that consists of several subfamilies. That work concluded that members of the same subfamily bind DNA specifically, whereas members of different subfamilies recognize different DNA targets. In summary, we demonstrated the conservation of DNA-contact residues in DNA-binding domains.</p>
         <fig id="F5">
            <title>
               <p>Figure 5</p>
            </title>
            <caption>
               <p>Multiple structure alignment of 13 homeodomain structures</p>
            </caption>
            <text>
               <p><b>Multiple structure alignment of 13 homeodomain structures</b>. The domains with DNA-binding specificities similar to that of <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> are shown in blue and the others in red. The contact residues of <ext-link ext-link-type="pdb" ext-link-id="1B8I-A">1B8I-A</ext-link> are marked in green. The contact residues interacting with the bases of the core binding site in the DNA (ATAA) major groove are indicated in gray.</p>
            </text>
            <graphic file="1471-2105-9-S6-S3-5"/>
         </fig>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The contact residues of DNA-binding domains are useful for discriminating DNA-binding domains from non-DNA-binding domains in a novel protein sequence. Our method, which considers the evolutionary conservation of DNA-binding residues, achieves high precision and recall for 66 families of DNA-binding domains, with a false positive rate of less than 5% for 250 non-DNA-binding proteins. In addition, our method is able to identify the different DNA-binding behaviors of proteins in the same SCOP family based on the evolutionary conservation of DNA-contact residues. We also discussed how mutations of the contact residues of DNA-binding domains may change the bound DNA sequences, which implies that DNA-contact residues and the DNA bases they bind change together.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>Figure <figr fid="F6">6</figr> shows the flowchart of our proposed method. We quantitatively evaluated whether a given protein domain <it>M </it>has a DNA-binding function similar to that of a known protein-DNA crystal structure. For each crystal structure of a protein-DNA complex in the Protein Data Bank (PDB), we first identified the DNA-contact domain (<it>D</it>) using geometric information and the domain definitions from the Structural Classification of Proteins (SCOP, version 1.71) <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. The structures and sequences of both the protein-DNA complexes and their DNA-contact domains were collected in the template library. For a given protein sequence/structure <it>M</it>, we used sequence/structure alignment tools to find the homologous DNA-contact domain <it>D </it>in the template library. Finally, we proposed a scoring method based on the BLOSUM matrix to evaluate the similarity between <it>M </it>and <it>D</it>. Detailed descriptions are as follows.</p>
         <fig id="F6">
            <title>
               <p>Figure 6</p>
            </title>
            <caption>
               <p>Flowchart of proposed method</p>
            </caption>
            <text>
               <p><b>Flowchart of proposed method</b>. See text.</p>
            </text>
            <graphic file="1471-2105-9-S6-S3-6"/>
         </fig>
         <sec>
            <st>
               <p>Template library</p>
            </st>
            <p>We first collected protein-DNA complexes from the PDB; each complex had to contain at least one protein chain and a double-stranded DNA. As in Luscombe et al. <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, a complex was excluded if its DNA was single-stranded or shorter than four bases. For each protein-DNA complex, we then identified the contact residues and contact domains of the protein. Contact residues, defined as residues with a heavy atom within 4.5 &#197; (distance &#8804; 4.5 &#197;) of any heavy atom of the bound DNA, were considered the core parts of the contact domain in a complex <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. The DNA-contact domains were assigned according to these contact residues and the domain definitions of the SCOP database. To make sure that the contact between the protein and the DNA was reasonably extensive, each domain was required to have more than 5 contact residues, and the protein more than 50 residues. Finally, 230 DNA-contact domains were identified and collected in the template library.</p>
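            <p>The contact-residue criterion above can be illustrated with a short Python sketch. It is not the authors' implementation: the atom tuples, the function names and the toy coordinates are hypothetical, and any non-hydrogen atom is assumed to count as a heavy atom.</p>

```python
import math

CUTOFF = 4.5  # angstroms, the distance cutoff stated in the text


def is_heavy(element):
    """Assumption: every non-hydrogen atom counts as a heavy atom."""
    return element.upper() != "H"


def contact_residues(protein_atoms, dna_atoms, cutoff=CUTOFF):
    """Return sorted ids of residues with a heavy atom within the cutoff
    of any DNA heavy atom.

    Each atom is a hypothetical tuple: (residue_id, element, (x, y, z)).
    """
    dna_heavy = [xyz for _, element, xyz in dna_atoms if is_heavy(element)]
    contacts = set()
    for residue_id, element, xyz in protein_atoms:
        if residue_id in contacts or not is_heavy(element):
            continue
        if any(cutoff >= math.dist(xyz, d) for d in dna_heavy):
            contacts.add(residue_id)
    return sorted(contacts)


def passes_domain_filter(n_contact_residues, n_residues):
    """Size filter from the text: more than 5 contact residues and
    more than 50 residues in total."""
    return n_contact_residues > 5 and n_residues > 50


# Toy coordinates, invented for illustration only.
protein_atoms = [
    (47, "C", (0.0, 0.0, 0.0)),
    (47, "H", (0.0, 0.0, 1.0)),
    (50, "N", (2.0, 0.0, 0.0)),
    (54, "H", (0.0, 0.0, 3.0)),  # hydrogen: ignored even though it is close
    (60, "C", (3.0, 0.0, 0.0)),  # 5.0 angstroms from the DNA heavy atom
]
dna_atoms = [
    ("DA1", "N", (0.0, 0.0, 4.0)),
    ("DA1", "H", (3.0, 0.0, 1.0)),  # hydrogen: ignored
]
print(contact_residues(protein_atoms, dna_atoms))  # [47, 50]
```

            <p>The all-pairs distance loop is adequate for a single complex; a spatial index would be the natural replacement when many large complexes are processed.</p>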
         </sec>
         <sec>
            <st>
               <p>Homologous proteins searching</p>
            </st>
            <p>For a given protein sequence/structure <it>M</it>, we searched the template library for homologous DNA-binding proteins using alignment tools. If <it>M </it>is a 3D structure, we used a structure alignment tool (CE <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>) to align <it>M </it>to all contact domains. CE returns a <it>Z </it>score for each alignment, representing the structural similarity of the two aligned structures. A DNA-binding protein was considered homologous to the query <it>M </it>if its CE <it>Z </it>score exceeded 3.7, a threshold based on CE's statistical model. If, on the other hand, <it>M </it>is a protein sequence, we used a sequence alignment tool (FASTA <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>) to search the template library. Here, a DNA-binding protein was considered homologous to <it>M </it>if the sequence identity exceeded 25%, according to the observations of previous studies <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>.</p>
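            <p>The two homology criteria can be summarized in a minimal Python sketch. The function names are hypothetical, and the definition of sequence identity used here (identical residues divided by the number of columns in which neither sequence has a gap) is an assumption; the text does not state which definition was used, and running CE or FASTA is outside the scope of the sketch.</p>

```python
CE_Z_THRESHOLD = 3.7       # structure queries: CE Z score must exceed this
IDENTITY_THRESHOLD = 0.25  # sequence queries: identity must exceed 25%


def sequence_identity(aligned_query, aligned_template):
    """Identity over columns where neither sequence has a gap.

    Assumption: this denominator is one common convention, not
    necessarily the one used in the study.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != "-" and t != "-"
    ]
    if not pairs:
        return 0.0
    return sum(q == t for q, t in pairs) / len(pairs)


def is_homolog(query_is_structure, ce_z=None, identity=None):
    """Apply the threshold that matches the query type."""
    if query_is_structure:
        return ce_z is not None and ce_z > CE_Z_THRESHOLD
    return identity is not None and identity > IDENTITY_THRESHOLD


# Toy alignment, invented for illustration only.
identity = sequence_identity("AC-DE", "ACGDF")
print(identity)                               # 0.75
print(is_homolog(False, identity=identity))   # True
print(is_homolog(True, ce_z=3.7))             # False: must exceed 3.7
```

            <p>Both comparisons are strict, following the wording "exceeds" in the text.</p>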
         </sec>
         <sec>
            <st>
               <p>Scoring method</p>
            </st>
            <p>For an alignment of the query domain (<it>M</it>) and a contact domain (<it>D</it>) that satisfies the above criterion, we calculate the alignment score for the aligned contact residues by using the BLOSUM62 matrix. The scoring method is defined as:</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-S6-S3-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>S</m:mi>
                              <m:mi>M</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:munder>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>&#8712;</m:mo>
                                          <m:mi>C</m:mi>
                                          <m:mi>R</m:mi>
                                       </m:mrow>
                                    </m:munder>
                                    <m:mrow>
                                       <m:mi>B</m:mi>
                                       <m:mi>L</m:mi>
                                       <m:mi>O</m:mi>
                                       <m:mi>S</m:mi>
                                       <m:mi>U</m:mi>
                                       <m:mi>M</m:mi>
                                       <m:mn>62</m:mn>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>d</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>,</m:mo>
                                       <m:msub>
                                          <m:mi>m</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                              <m:mrow>
                                 <m:mtext>#contact&#160;residues</m:mtext>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo>,</m:mo>
                        </m:mrow>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>CR </it>is the set of aligned contact-residue positions between <it>D </it>and <it>M</it>, and <it>d</it><sub><it>i </it></sub>and <it>m</it><sub><it>i </it></sub>denote the <it>i</it><sup><it>th </it></sup>contact residue of <it>D </it>and the corresponding residue of <it>M</it>, respectively. Here, a misaligned residue is assigned a score of -4, which is the smallest value in the BLOSUM62 matrix.</p>
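            <p>As a worked illustration of this score, the following Python sketch evaluates <it>S</it><sub><it>M</it></sub> for the four base-contacting Ubx residues named in the Results (Ile47, Gln50, Asn51 and Met54). It is a sketch under stated assumptions, not the authors' code: only a small excerpt of the BLOSUM62 matrix is included, the query residues are invented, a misaligned position is represented by "-", and the real calculation runs over all contact residues of the template domain rather than these four.</p>

```python
GAP_SCORE = -4  # score of a misaligned contact residue, as stated in the text

# Excerpt of BLOSUM62 covering only the pairs used below.
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("Q", "Q"): 5, ("N", "N"): 6, ("M", "M"): 5,
    ("I", "V"): 3, ("Q", "K"): 1, ("N", "D"): 1, ("M", "L"): 2,
}


def pair_score(d, m, matrix):
    """BLOSUM62 score of template residue d against query residue m."""
    if d == "-" or m == "-":
        return GAP_SCORE
    if (d, m) in matrix:
        return matrix[(d, m)]
    return matrix[(m, d)]  # symmetric lookup; KeyError if not in the excerpt


def contact_score(template_contacts, query_contacts, matrix=BLOSUM62_EXCERPT):
    """Mean BLOSUM62 score over the contact positions.

    Both arguments list one residue per contact position of the template,
    in the same order (hypothetical input format).
    """
    if len(template_contacts) != len(query_contacts):
        raise ValueError("one query residue is needed per contact position")
    if not template_contacts:
        raise ValueError("at least one contact residue is required")
    total = sum(
        pair_score(d, m, matrix)
        for d, m in zip(template_contacts, query_contacts)
    )
    return total / len(template_contacts)


ubx_contacts = "IQNM"  # Ile47, Gln50, Asn51, Met54
print(contact_score(ubx_contacts, "IQNM"))  # (4 + 5 + 6 + 5) / 4 = 5.0
print(contact_score(ubx_contacts, "VKN-"))  # (3 + 1 + 6 - 4) / 4 = 1.5
```

            <p>A query that keeps all four residues scores 5.0, whereas the invented query with two substitutions and one misaligned position scores 1.5.</p>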
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>YLC and HKT carried out the design of the scoring functions and the data set preparation, participated in the experimental designs and drafted the manuscript. CYK provided the design of this study. YCC and YJH provided the domain knowledge and useful comments. JMY provided the original idea, participated in the design and coordination of this study and helped to draft the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>J.-M. Yang was supported by the National Science Council and, in part, by the ATU plan of the MOE. The authors are grateful for the hardware and software support of the Structural Bioinformatics Core Facility at National Chiao Tung University.</p>
            <p>This article has been published as part of <it>BMC Bioinformatics </it>Volume 9 Supplement 6, 2008: Symposium of Computations in Bioinformatics and Bioscience (SCBB07). The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1471-2105/9?issue=S6</url>.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Intermolecular and intramolecular readout mechanisms in protein-DNA recognition</p>
            </title>
            <aug>
               <au>
                  <snm>Gromiha</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Siebers</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Selvaraj</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kono</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Sarai</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>337</volume>
            <issue>2</issue>
            <fpage>285</fpage>
            <lpage>294</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.01.033</pubid>
                  <pubid idtype="pmpid" link="fulltext">15003447</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Scissors-grip model for DNA recognition by a family of leucine zipper proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Vinson</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Sigler</snm>
                  <fnm>PB</fnm>
               </au>
               <au>
                  <snm>McKnight</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1989</pubdate>
            <volume>246</volume>
            <issue>4932</issue>
            <fpage>911</fpage>
            <lpage>916</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.2683088</pubid>
                  <pubid idtype="pmpid" link="fulltext">2683088</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>A structural taxonomy of DNA-binding domains</p>
            </title>
            <aug>
               <au>
                  <snm>Harrison</snm>
                  <fnm>SC</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1991</pubdate>
            <volume>353</volume>
            <issue>6346</issue>
            <fpage>715</fpage>
            <lpage>719</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/353715a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">1944532</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>An overview of the structures of protein-DNA complexes</p>
            </title>
            <aug>
               <au>
                  <snm>Luscombe</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Austin</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Berman</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2000</pubdate>
            <volume>1</volume>
            <issue>1</issue>
            <fpage>REVIEWS001</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">138832</pubid>
                  <pubid idtype="pmpid" link="fulltext">11104519</pubid>
                  <pubid idtype="doi">10.1186/gb-2000-1-1-reviews001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Eukaryotic transcriptional regulatory proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Johnson</snm>
                  <fnm>PF</fnm>
               </au>
               <au>
                  <snm>McKnight</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Annu Rev Biochem</source>
            <pubdate>1989</pubdate>
            <volume>58</volume>
            <fpage>799</fpage>
            <lpage>839</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.bi.58.070189.004055</pubid>
                  <pubid idtype="pmpid" link="fulltext">2673023</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Moment-based prediction of DNA-binding proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Ahmad</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sarai</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>341</volume>
            <issue>1</issue>
            <fpage>65</fpage>
            <lpage>71</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.05.058</pubid>
                  <pubid idtype="pmpid" link="fulltext">15312763</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information</p>
            </title>
            <aug>
               <au>
                  <snm>Ahmad</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gromiha</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Sarai</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>4</issue>
            <fpage>477</fpage>
            <lpage>486</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg432</pubid>
                  <pubid idtype="pmpid" link="fulltext">14990443</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Structure-based prediction of DNA-binding sites on proteins using the empirical preference of electrostatic potential and the shape of molecular surfaces</p>
            </title>
            <aug>
               <au>
                  <snm>Tsuchiya</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kinoshita</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2004</pubdate>
            <volume>55</volume>
            <issue>4</issue>
            <fpage>885</fpage>
            <lpage>894</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.20111</pubid>
                  <pubid idtype="pmpid" link="fulltext">15146487</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Kernel-based machine learning protocol for predicting DNA-binding proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Bhardwaj</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Langlois</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>20</issue>
            <fpage>6486</fpage>
            <lpage>6493</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1283538</pubid>
                  <pubid idtype="pmpid" link="fulltext">16284202</pubid>
                  <pubid idtype="doi">10.1093/nar/gki949</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Residue-level prediction of DNA-binding sites and its application on DNA-binding protein predictions</p>
            </title>
            <aug>
               <au>
                  <snm>Bhardwaj</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>2007</pubdate>
            <volume>581</volume>
            <issue>5</issue>
            <fpage>1058</fpage>
            <lpage>1066</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1993824</pubid>
                  <pubid idtype="pmpid" link="fulltext">17316627</pubid>
                  <pubid idtype="doi">10.1016/j.febslet.2007.01.086</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Predicting rRNA-, RNA-, and DNA-binding proteins from primary structure with support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Yu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Cao</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cai</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Shi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Theor Biol</source>
            <pubdate>2006</pubdate>
            <volume>240</volume>
            <issue>2</issue>
            <fpage>175</fpage>
            <lpage>184</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jtbi.2005.09.018</pubid>
                  <pubid idtype="pmpid" link="fulltext">16274699</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Efficient prediction of nucleic acid binding function from low-resolution protein structures</p>
            </title>
            <aug>
               <au>
                  <snm>Szilagyi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Skolnick</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2006</pubdate>
            <volume>358</volume>
            <issue>3</issue>
            <fpage>922</fpage>
            <lpage>933</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2006.02.053</pubid>
                  <pubid idtype="pmpid" link="fulltext">16551468</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>PSSM-based prediction of DNA binding sites in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Ahmad</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sarai</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>33</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">550660</pubid>
                  <pubid idtype="pmpid" link="fulltext">15720719</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-33</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Kuznetsov</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Gou</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hwang</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2006</pubdate>
            <volume>64</volume>
            <issue>1</issue>
            <fpage>19</fpage>
            <lpage>27</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.20977</pubid>
                  <pubid idtype="pmpid" link="fulltext">16568445</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>DISPLAR: an accurate method for predicting DNA-binding sites on protein surfaces</p>
            </title>
            <aug>
               <au>
                  <snm>Tjong</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>HX</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <issue>5</issue>
            <fpage>1465</fpage>
            <lpage>1477</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1865077</pubid>
                  <pubid idtype="pmpid" link="fulltext">17284455</pubid>
                  <pubid idtype="doi">10.1093/nar/gkm008</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity</p>
            </title>
            <aug>
               <au>
                  <snm>Luscombe</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>320</volume>
            <issue>5</issue>
            <fpage>991</fpage>
            <lpage>1009</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(02)00571-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">12126620</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Annotating nucleic acid-binding function based on protein structure</p>
            </title>
            <aug>
               <au>
                  <snm>Stawiski</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Gregoret</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Mandel-Gutfreund</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2003</pubdate>
            <volume>326</volume>
            <issue>4</issue>
            <fpage>1065</fpage>
            <lpage>1079</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(03)00031-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">12589754</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Amino acid substitution matrices from protein blocks</p>
            </title>
            <aug>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>JG</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1992</pubdate>
            <volume>89</volume>
            <issue>22</issue>
            <fpage>10915</fpage>
            <lpage>10919</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">50453</pubid>
                  <pubid idtype="pmpid" link="fulltext">1438297</pubid>
                  <pubid idtype="doi">10.1073/pnas.89.22.10915</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Enlarged representative set of protein structures</p>
            </title>
            <aug>
               <au>
                  <snm>Hobohm</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1994</pubdate>
            <volume>3</volume>
            <issue>3</issue>
            <fpage>522</fpage>
            <lpage>524</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2142698</pubid>
                  <pubid idtype="pmpid" link="fulltext">8019422</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Structure of a DNA-bound Ultrabithorax-Extradenticle homeodomain complex</p>
            </title>
            <aug>
               <au>
                  <snm>Passner</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Ryoo</snm>
                  <fnm>HD</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mann</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Aggarwal</snm>
                  <fnm>AK</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>397</volume>
            <issue>6721</issue>
            <fpage>714</fpage>
            <lpage>719</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/17833</pubid>
                  <pubid idtype="pmpid" link="fulltext">10067897</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Structure of HoxA9 and Pbx1 bound to DNA: Hox hexapeptide and DNA recognition anterior to posterior</p>
            </title>
            <aug>
               <au>
                  <snm>LaRonde-LeBlanc</snm>
                  <fnm>NA</fnm>
               </au>
               <au>
                  <snm>Wolberger</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2003</pubdate>
            <volume>17</volume>
            <issue>16</issue>
            <fpage>2060</fpage>
            <lpage>2072</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">196259</pubid>
                  <pubid idtype="pmpid" link="fulltext">12923056</pubid>
                  <pubid idtype="doi">10.1101/gad.1103303</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Structure of the histone acetyltransferase Hat1: a paradigm for the GCN5-related N-acetyltransferase superfamily</p>
            </title>
            <aug>
               <au>
                  <snm>Dutnall</snm>
                  <fnm>RN</fnm>
               </au>
               <au>
                  <snm>Tafrov</snm>
                  <fnm>ST</fnm>
               </au>
               <au>
                  <snm>Sternglanz</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ramakrishnan</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1998</pubdate>
            <volume>94</volume>
            <issue>4</issue>
            <fpage>427</fpage>
            <lpage>438</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(00)81584-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">9727486</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Molecular basis for synergistic transcriptional activation by Oct1 and Sox2 revealed from the solution structure of the 42-kDa Oct1·Sox2·Hoxb1-DNA ternary transcription factor complex</p>
            </title>
            <aug>
               <au>
                  <snm>Williams</snm>
                  <fnm>DC</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Cai</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Clore</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2004</pubdate>
            <volume>279</volume>
            <issue>2</issue>
            <fpage>1449</fpage>
            <lpage>1457</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M309790200</pubid>
                  <pubid idtype="pmpid" link="fulltext">14559893</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>MUSTANG: a multiple structural alignment algorithm</p>
            </title>
            <aug>
               <au>
                  <snm>Konagurthu</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Whisstock</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Stuckey</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Lesk</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2006</pubdate>
            <volume>64</volume>
            <issue>3</issue>
            <fpage>559</fpage>
            <lpage>574</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.20921</pubid>
                  <pubid idtype="pmpid" link="fulltext">16736488</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>SCOP: a structural classification of proteins database for the investigation of sequences and structures</p>
            </title>
            <aug>
               <au>
                  <snm>Murzin</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>247</volume>
            <issue>4</issue>
            <fpage>536</fpage>
            <lpage>540</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1995.0159</pubid>
                  <pubid idtype="pmpid" link="fulltext">7723011</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Amino acid-base interactions: a three-dimensional analysis of protein-DNA interactions at an atomic level</p>
            </title>
            <aug>
               <au>
                  <snm>Luscombe</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Laskowski</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>13</issue>
            <fpage>2860</fpage>
            <lpage>2874</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55782</pubid>
                  <pubid idtype="pmpid" link="fulltext">11433033</pubid>
                  <pubid idtype="doi">10.1093/nar/29.13.2860</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Protein-DNA binding specificity predictions with structural models</p>
            </title>
            <aug>
               <au>
                  <snm>Morozov</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Havranek</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Siggia</snm>
                  <fnm>ED</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>18</issue>
            <fpage>5781</fpage>
            <lpage>5798</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1270944</pubid>
                  <pubid idtype="pmpid" link="fulltext">16246914</pubid>
                  <pubid idtype="doi">10.1093/nar/gki875</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Protein structure comparison by alignment of distance matrices</p>
            </title>
            <aug>
               <au>
                  <snm>Holm</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1993</pubdate>
            <volume>233</volume>
            <issue>1</issue>
            <fpage>123</fpage>
            <lpage>138</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1993.1489</pubid>
                  <pubid idtype="pmpid" link="fulltext">8377180</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Improved tools for biological sequence comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Pearson</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <issue>8</issue>
            <fpage>2444</fpage>
            <lpage>2448</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">280013</pubid>
                  <pubid idtype="pmpid" link="fulltext">3162770</pubid>
                  <pubid idtype="doi">10.1073/pnas.85.8.2444</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Effective protein sequence comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Pearson</snm>
                  <fnm>WR</fnm>
               </au>
            </aug>
            <source>Methods Enzymol</source>
            <pubdate>1996</pubdate>
            <volume>266</volume>
            <fpage>227</fpage>
            <lpage>258</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8743688</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Flexible sequence similarity searching with the FASTA3 program package</p>
            </title>
            <aug>
               <au>
                  <snm>Pearson</snm>
                  <fnm>WR</fnm>
               </au>
            </aug>
            <source>Methods Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>132</volume>
            <fpage>185</fpage>
            <lpage>219</lpage>
            <xrefbib>
               <pubid idtype="pmpid">10547837</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>The art of matchmaking: sequence alignment methods and their structural implications</p>
            </title>
            <aug>
               <au>
                  <snm>Smith</snm>
                  <fnm>TF</fnm>
               </au>
            </aug>
            <source>Structure</source>
            <pubdate>1999</pubdate>
            <volume>7</volume>
            <issue>1</issue>
            <fpage>R7</fpage>
            <lpage>R12</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0969-2126(99)80003-3</pubid>
                  <pubid idtype="pmpid">10368278</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>From genes to protein structure and function: novel applications of computational approaches in the genomic era</p>
            </title>
            <aug>
               <au>
                  <snm>Skolnick</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fetrow</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Trends Biotechnol</source>
            <pubdate>2000</pubdate>
            <volume>18</volume>
            <issue>1</issue>
            <fpage>34</fpage>
            <lpage>39</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0167-7799(99)01398-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">10631780</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>A general method applicable to the search for similarities in the amino acid sequence of two proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Needleman</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Wunsch</snm>
                  <fnm>CD</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1970</pubdate>
            <volume>48</volume>
            <issue>3</issue>
            <fpage>443</fpage>
            <lpage>453</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(70)90057-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">5420325</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Identification of common molecular subsequences</p>
            </title>
            <aug>
               <au>
                  <snm>Smith</snm>
                  <fnm>TF</fnm>
               </au>
               <au>
                  <snm>Waterman</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1981</pubdate>
            <volume>147</volume>
            <issue>1</issue>
            <fpage>195</fpage>
            <lpage>197</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(81)90087-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">7265238</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Hidden Markov models for detecting remote protein homologies</p>
            </title>
            <aug>
               <au>
                  <snm>Karplus</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Barrett</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hughey</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <issue>10</issue>
            <fpage>846</fpage>
            <lpage>856</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/14.10.846</pubid>
                  <pubid idtype="pmpid" link="fulltext">9927713</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <issue>3</issue>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2231712</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
