<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-226</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Recruitment of rare <it>3</it>-grams at functional sites: Is this a mechanism for increasing enzyme specificity?</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Tobi</snm>
               <fnm>Dror</fnm>
               <insr iid="I1"/>
               <email>drt6@pitt.edu</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Bahar</snm>
               <fnm>Ivet</fnm>
               <insr iid="I1"/>
               <email>bahar@ccbb.pitt.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh PA 15261, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>226</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/226</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17598909</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-226</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>16</day>
               <month>2</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>28</day>
               <month>6</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>28</day>
               <month>6</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Tobi and Bahar; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>A wealth of unannotated protein sequences of unknown function has accumulated in recent years with rapid progress in sequence genomics, creating an ever-increasing demand for methods that efficiently identify functional sites. Sequence and structure conservation have traditionally been the major criteria adopted in various algorithms to identify functional sites. Here, we examine the distributions of the 20<sup>3 </sup>different types of <it>3</it>-grams (triplets of sequentially contiguous amino acids) in the entire space of sequences accumulated to date in the UniProt database, and focus in particular on the rare <it>3</it>-grams distinguished by their high entropy-based information content.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Comparison of the UniProt distributions with those observed at or near the active sites in a non-redundant dataset of 59 enzyme/ligand complexes shows that the active sites preferentially recruit <it>3</it>-grams distinguished by their low frequency in UniProt. Three cases, Src kinase, hemoglobin, and tyrosyl-tRNA synthetase, are discussed in detail to illustrate the biological significance of the results.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The results suggest that recruitment of rare <it>3</it>-grams may be an efficient mechanism for increasing specificity at functional sites. Rareness/scarcity emerges as a feature that may assist in identifying key sites for protein function, providing information complementary to that derived from sequence alignments. In addition, it provides, for the first time, a means of identifying potentially functional sites from sequence information alone, when sequence conservation properties are not available.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="refman"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Proteins perform a variety of biological functions, the efficiency and specificity of which are determined to a large extent by their intrinsic sequence-structure properties. Many biophysical and biochemical activities, such as binding of specific substrates, catalysis, or allosteric responses, involve particular sequence motifs at functional sites. Identification of functional sites, or more generally understanding the sequence-to-function mapping, has been a major goal in computational molecular biology and bioinformatics. With the rapid accumulation of genome-scale sequence data, there is an ever-increasing need for efficient assessment of potential functional sites.</p>
         <p>Among the different criteria adopted to identify functional sites, sequence conservation has probably been the most widely used, based on the fact that functional residues are often conserved across all or the majority of members in a given family of proteins. Algorithms developed for detecting conservation patterns are traditionally based on multiple sequence alignments <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. Other algorithms combine sequence and structure comparisons when structural information is available, which usually yields higher predictive ability <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. A set of criteria was proposed by Thornton and coworkers for detecting catalytic residues, which was successfully implemented in an algorithm for identifying active sites <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Other strategies include pattern-matching methods such as TESS <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, FFF <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and SPASM <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, which locate functional sites by detecting small three-dimensional motifs. Protein dynamics has been shown to be another property that can be advantageously examined to detect functional sites <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. We have recently shown that the global hinge centers of enzymes co-localize with their catalytic sites, pointing to the importance of coupling between mechanics and chemistry for enzymatic activity <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. In addition, analyzing patterns of communication and shortest paths in protein structures modeled as networks has proven useful in identifying allosteric sites <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>.</p>
         <p>In the present study, we propose another property that can serve as a criterion for detecting functionally important sites: <it>scarcity </it>or <it>rareness </it>in the space of sequence motifs. Motifs composed of three sequential amino acids (<it>3</it>-grams) are considered here as the shortest, yet distinctive, sequence fragments that can provide statistically significant information. The choice of <it>3</it>-grams is based on preliminary studies as well as our previous work <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, which showed that <it>1</it>-grams and <it>2</it>-grams do not confer enough specificity, whereas <it>3</it>-grams and <it>4</it>-grams perform comparably (some strong signals characteristic of <it>3</it>-grams may be overlooked when <it>4</it>-grams are examined). In addition, <it>3</it>-grams lend themselves to more accurate statistics. A given <it>3</it>-gram will be termed '<it>rare</it>' if its frequency of occurrence in UniProt (Universal Protein Resource) <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> is significantly lower than that expected from an unbiased distribution, as will be explained in quantitative detail below.</p>
         <p>A notable feature emerging from this study is the propensity of proteins' active sites to harbor rare <it>sequence </it>motifs, akin to the notion that rare <it>structural </it>motifs co-localize with functionally important sites <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. This selectivity suggests that enzymes tend to recruit such distinctive/rare sequences at their active sites to increase their specificity, consistent with the higher information content associated with rare events. We illustrate the functional importance of such rare <it>3</it>-grams in three proteins: C-Src kinase, hemoglobin, and tyrosyl-tRNA synthetase.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Distributions of <it>3</it>-grams</p>
            </st>
            <p>A given protein sequence of <it>N </it>residues is viewed as a collection of <it>N &#8211; n </it>+ 1 overlapping words of <it>n </it>letters, termed <it>n</it>-grams (also called <it>n</it>-tuples), composed of <it>n </it>contiguous amino acids along the sequence. For each protein, we consider all <it>3</it>-grams (<it>n </it>= 3) or triplets, by sliding a window of three amino acids along the sequence, thus leading to a total of <it>(N &#8211; 2) </it>triplets. The natural frequency of a given <it>3</it>-gram of type <it>i </it>(1 &#8804; <it>i </it>&#8804; 20<sup>3</sup>) is defined as</p>
            <p>
               <display-formula id="M1">
                  <m:math name="1471-2105-8-226-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msubsup>
                              <m:mi>p</m:mi>
                              <m:mi>i</m:mi>
                              <m:mrow>
                                 <m:mi>u</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mi>i</m:mi>
                              </m:mrow>
                           </m:msubsup>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>C</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                              </m:mrow>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>C</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:msub>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGWbaCdaqhaaWcbaGaemyAaKgabaGaemyDauNaemOBa4MaemyAaKgaaOGaeyypa0ZaaSaaaeaacqWGdbWqdaWgaaWcbaGaemyAaKgabeaaaOqaamaaqaeabaGaem4qam0aaSbaaSqaaiabdQgaQbqabaaabeqab0GaeyyeIuoaaaaaaa@3C31@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>C</it><sub><it>i </it></sub>is the number of occurrences of <it>3</it>-gram <it>i </it>in UniProt, and the summation is performed over all 20<sup>3 </sup>types of <it>3</it>-grams in the 172,233 sequences compiled as of March 1, 2005, which contain a total of <it>C</it><sub><it>tot </it></sub>= 62,256,868 <it>3</it>-grams. Each of the 20<sup>3 </sup>distinct types is represented in UniProt. Their counts vary in the range 170 &#8804; <it>C</it><sub><it>i </it></sub>&#8804; 67,264, with the lower and upper bounds corresponding to the <it>3</it>-grams MCW and AAA, respectively. Figure <figr fid="F1">1</figr> displays the distribution of the counts <it>C</it><sub><it>i</it></sub>. The inset shows a closer view of the percentage of <it>3</it>-gram types whose UniProt counts fall in successive grids of size &#916;<it>C</it><sub><it>i </it></sub>= 500, for the range <it>C</it><sub><it>i </it></sub>&#8804; 1.5 &#215; 10<sup>4</sup>. We note that less than 1% of the 20<sup>3 </sup>types have UniProt counts <it>C</it><sub><it>i </it></sub>lower than 500, and 14.97% have <it>C</it><sub><it>i </it></sub>&lt; 2,000. Rare <it>3</it>-grams are defined as those at the lower end of the histogram whose counts lie more than one standard deviation below the mean. They comprise 6.0% of all <it>3</it>-gram <it>types</it>, and 0.595% of all <it>3</it>-gram <it>counts </it>in UniProt (see Figure <figr fid="F1">1</figr> caption).</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>
                     <it>Distribution of 3-grams in UniProt</it>
                  </p>
               </caption>
               <text>
                  <p><b><it>Distribution of 3-grams in UniProt</it></b>. Histogram of the counts <it>C</it><sub><it>i </it></sub>for all types (1 &#8804; <it>i </it>&#8804; 20<sup>3</sup>) of <it>3</it>-grams, shown for grids of size &#916;<it>C</it><sub><it>i </it></sub>= 10<sup>3</sup>. The mean &lt;<it>C<sub>i</sub></it>> and standard deviation &#963;<sub>C </sub>are 7,822 and 6,682, respectively. The inset shows the portion of the curve for <it>C</it><sub><it>i </it></sub>&#8804; 15,000 in more detail. The ordinate is the percentage of <it>3</it>-grams within intervals of &#916;<it>C</it><sub><it>i </it></sub>= 500. The peak (6.07%) occurs in the interval 2,500 &#8804; <it>C<sub>i</sub></it> &#8804; 3,000. <it>3</it>-grams whose counts lie more than one standard deviation below the mean are termed 'rare' <it>3</it>-grams; that is, their counts are lower than [&lt;<it>C<sub>i</sub></it>> - &#963;<sub>C</sub>] = 1,140. There are 480 such <it>3</it>-grams in total (i.e. 6% of all the 8,000 types of <it>3</it>-grams), and their cumulative frequency of occurrence, evaluated from the ratio of their total count to <it>C</it><sub><it>tot</it></sub>, is 0.595%.</p>
               </text>
               <graphic file="1471-2105-8-226-1"/>
            </fig>
            <p>The unbiased probability of a given <it>3</it>-gram is <it>p</it><sub>0 </sub>= 1/20<sup>3 </sup>= 0.000125, and the corresponding count is <it>C</it><sub>0 </sub>= <it>C</it><sub><it>tot</it></sub>/20<sup>3 </sup>= 7782 in the UniProt. The <it>3</it>-grams whose count is lower than this value by a factor of <it>f </it>= 2 compose 10% of all the counts <it>C</it><sub><it>tot </it></sub>in UniProt, and those lower by <it>f </it>= 3 amount to 4% of <it>C</it><sub><it>tot</it></sub>.</p>
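            <p>The sliding-window counting and the natural frequency of eq (1) can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names (count_ngrams, natural_frequencies, rare_types) and the two toy sequences are hypothetical, and the use of the population standard deviation for the rarity threshold is an assumption, since the definition above specifies only one standard deviation below the mean.</p>
            <p>
```python
from collections import Counter
from statistics import mean, pstdev

C_TOT = 62_256_868          # total number of 3-grams quoted in the text
C_0 = C_TOT / 20**3         # unbiased count per 3-gram type (about 7782)


def count_ngrams(sequences, n=3):
    """Count overlapping n-grams; a sequence of N residues yields N - n + 1."""
    counts = Counter()
    for seq in sequences:
        for start in range(len(seq) - n + 1):
            counts[seq[start:start + n]] += 1
    return counts


def natural_frequencies(counts):
    """Eq (1): count of each type divided by the sum of all counts."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def rare_types(counts):
    """Types whose count lies more than one standard deviation below the mean.

    Assumption: population standard deviation over the types that occur.
    Types absent from the input are not considered here.
    """
    values = list(counts.values())
    threshold = mean(values) - pstdev(values)
    return {gram for gram, c in counts.items() if threshold > c}


# Hypothetical toy input, not UniProt data.
toy = ["MKVLAAAGG", "AAAGGKVLW"]
counts = count_ngrams(toy)
freqs = natural_frequencies(counts)
```
            </p>
            <p>Each nine-residue toy sequence contributes seven overlapping <it>3</it>-grams. With so small an input no type falls below the threshold, so the rarity criterion becomes informative only on a database-scale set of counts.</p>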
         </sec>
         <sec>
            <st>
               <p>Scarcity scores and enhancement factors</p>
            </st>
            <p>The frequency <inline-formula><m:math name="1471-2105-8-226-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>p</m:mi><m:mi>i</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGWbaCdaqhaaWcbaGaemyAaKgabaGaemyDauNaemOBa4MaemyAaKgaaaaa@33D0@</m:annotation></m:semantics></m:math></inline-formula> given by eq (1) will also be termed the 'observed' or 'natural' probability. We will associate with each <it>3</it>-gram type a <it>scarcity score </it>given by</p>
            <p>
               <display-formula id="M2">
                  <m:math name="1471-2105-8-226-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msubsup>
                              <m:mi>s</m:mi>
                              <m:mi>i</m:mi>
                              <m:mrow>
                                 <m:mi>u</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mi>i</m:mi>
                              </m:mrow>
                           </m:msubsup>
                           <m:mo>=</m:mo>
                           <m:mo>&#8722;</m:mo>
                           <m:mi>ln</m:mi>
                           <m:mo>&#8289;</m:mo>
                           <m:mo stretchy="false">[</m:mo>
                           <m:msubsup>
                              <m:mi>p</m:mi>
                              <m:mi>i</m:mi>
                              <m:mrow>
                                 <m:mi>u</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mi>i</m:mi>
                              </m:mrow>
                           </m:msubsup>
                           <m:mo stretchy="false">]</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemyAaKgabaGaemyDauNaemOBa4MaemyAaKgaaOGaeyypa0JaeyOeI0IagiiBaWMaeiOBa4Maei4waSLaemiCaa3aa0baaSqaaiabdMgaPbqaaiabdwha1jabd6gaUjabdMgaPbaakiabc2faDbaa@4247@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>which provides a measure of the information content each <it>3</it>-gram carries. The distribution of scarcity scores is given in Figure <figr fid="F2">2</figr>. They vary in the range 6.83 &#8804; <inline-formula><m:math name="1471-2105-8-226-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>i</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemyAaKgabaGaemyDauNaemOBa4MaemyAaKgaaaaa@33D6@</m:annotation></m:semantics></m:math></inline-formula> &#8804; 12.81. Panel A displays the scores sorted in descending order, plotted against <it>3</it>-gram type index, and panel B displays their histogram (distribution among the 20<sup>3 </sup>types), based on grids of size &#916;<inline-formula><m:math name="1471-2105-8-226-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>i</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemyAaKgabaGaemyDauNaemOBa4MaemyAaKgaaaaa@33D6@</m:annotation></m:semantics></m:math></inline-formula> = 0.25. The arrows display the threshold adopted for defining rare 3-grams.</p>
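            <p>As an arithmetic check of eq (2), the short Python sketch below evaluates the scarcity score for the counts quoted in the text. The function name scarcity_score is hypothetical; the counts (170 for MCW, 67,264 for AAA, the threshold of 1,140 and the total of 62,256,868) are those stated above and in the Figure 1 caption.</p>
            <p>
```python
import math

C_TOT = 62_256_868  # total number of 3-grams quoted in the text


def scarcity_score(count, total=C_TOT):
    """Eq (2): minus the natural logarithm of the observed frequency."""
    if not count > 0:
        raise ValueError("the score is undefined for a type that never occurs")
    return -math.log(count / total)


s_mcw = scarcity_score(170)      # rarest type quoted in the text (MCW)
s_aaa = scarcity_score(67_264)   # most frequent type quoted in the text (AAA)
s_cut = scarcity_score(1_140)    # count threshold used to define rare 3-grams

print(f"{s_mcw:.2f} {s_aaa:.2f} {s_cut:.3f}")
```
            </p>
            <p>The three values agree, to the quoted precision, with the range 6.83 to 12.81 given above and with the threshold score of 10.908 in the Figure 2 caption.</p>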
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>
                     <it>Scarcity Scores</it>
                  </p>
               </caption>
               <text>
                  <p><b><it>Scarcity Scores</it></b>. <b>(A) </b>Scarcity scores (eq 2) plotted as a function of <it>3</it>-gram index, sorted in descending order. <b>(B) </b>Histogram of scarcity scores. The abscissa shows the number of <it>3</it>-grams whose scarcity scores lie in successive ranges of size 0.25, shown along the ordinate. The peak occurs at <inline-formula><m:math name="1471-2105-8-226-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>i</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemyAaKgabaGaemyDauNaemOBa4MaemyAaKgaaaaa@33D6@</m:annotation></m:semantics></m:math></inline-formula> = 9.0 &#177; 0.125. Unique <it>3</it>-grams (Figure 1) lie in the range <inline-formula><m:math name="1471-2105-8-226-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>i</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemyAaKgabaGaemyDauNaemOBa4MaemyAaKgaaaaa@33D6@</m:annotation></m:semantics></m:math></inline-formula> > 10.908, or <it>i </it>&lt; 480, indicated by the arrow, and their observed cumulative frequency is <inline-formula><m:math name="1471-2105-8-226-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>p</m:mi><m:mi>i</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGWbaCdaqhaaWcbaGaemyAaKgabaGaemyDauNaemOBa4MaemyAaKgaaaaa@33D0@</m:annotation></m:semantics></m:math></inline-formula> &lt; 1.83 &#215; 10<sup>-5</sup>.</p>
               </text>
               <graphic file="1471-2105-8-226-2"/>
            </fig>
            <p>The 'expected' probability of <it>3</it>-gram <it>i </it>= {XYZ}, on the other hand, is based on the natural occurrences of individual amino acids. Assuming that each amino acid occurs independently of its sequential neighbors, it is</p>
            <p>
               <display-formula id="M3">
                  <m:math name="1471-2105-8-226-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msubsup>
                              <m:mi>p</m:mi>
                              <m:mi>i</m:mi>
                              <m:mrow>
                                 <m:mi>e</m:mi>
                                 <m:mi>x</m:mi>
                                 <m:mi>p</m:mi>
                              </m:mrow>
                           </m:msubsup>
                           <m:mo>=</m:mo>
                           <m:msup>
                              <m:mi>p</m:mi>
                              <m:mn>0</m:mn>
                           </m:msup>
                           <m:mi>X</m:mi>
                           <m:msup>
                              <m:mi>p</m:mi>
                              <m:mn>0</m:mn>
                           </m:msup>
                           <m:mi>Y</m:mi>
                           <m:msup>
                              <m:mi>p</m:mi>
                              <m:mn>0</m:mn>
                           </m:msup>
                           <m:mi>Z</m:mi>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGWbaCdaqhaaWcbaGaemyAaKgabaacbiGae8xzauMae8hEaGNae8hCaahaaOGaeyypa0JaemiCaa3aaWbaaSqabeaacqaIWaamaaGccqWGybawcqWGWbaCdaahaaWcbeqaaiabicdaWaaakiabdMfazjabdchaWnaaCaaaleqabaGaeGimaadaaOGaemOwaOfaaa@403C@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>p</it><sup>0 </sup><it>X </it>is the natural frequency of amino acid of type X (for details see Table S1 in Additional file <supplr sid="S1">1</supplr>). The enrichment in a given <it>3</it>-gram count will be described by the enhancement factor</p>
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p><b>Supplementary tables S1-S7</b>. Figures and material.</p>
               </text>
               <file name="1471-2105-8-226-S1.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>
               <display-formula id="M4">
                  <m:math name="1471-2105-8-226-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>a</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:msubsup>
                              <m:mi>p</m:mi>
                              <m:mi>i</m:mi>
                              <m:mrow>
                                 <m:mi>u</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mi>i</m:mi>
                              </m:mrow>
                           </m:msubsup>
                           <m:mo>/</m:mo>
                           <m:msubsup>
                              <m:mi>p</m:mi>
                              <m:mi>i</m:mi>
                              <m:mrow>
                                 <m:mi>e</m:mi>
                                 <m:mi>x</m:mi>
                                 <m:mi>p</m:mi>
                              </m:mrow>
                           </m:msubsup>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGHbqydaWgaaWcbaGaemyAaKgabeaakiabg2da9iabdchaWnaaDaaaleaacqWGPbqAaeaacqWG1bqDcqWGUbGBcqWGPbqAaaGccqGGVaWlcqWGWbaCdaqhaaWcbaGaemyAaKgabaacbiGae8xzauMae8hEaGNae8hCaahaaaaa@3FC7@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Figure <figr fid="F3">3A</figr> compares the observed and expected counts of <it>3</it>-grams. Although the slope gives an average enhancement factor &lt;<it>a</it><sub><it>i</it></sub>> = 1.02 with a correlation coefficient of 0.96, a more detailed examination of the expected and observed <it>3</it>-gram histograms (Figure <figr fid="F3">3C</figr>) shows that the two distributions differ significantly (&#967;<sup>2</sup>, <it>P </it>&lt; 0.001). The expected histogram is more broadly distributed than the observed one. Some <it>3</it>-grams that depart substantially from their expected frequencies (i.e. enhancement factor <it>a</it><sub><it>i </it></sub>&#8800; 1) are labeled in panel B. Among the over-represented rare <it>3</it>-grams we observe CCC, WWN, WYW, CWC, CHC and WWW. We also note that AAA and LLL are distinguished by their high <inline-formula><m:math name="1471-2105-8-226-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>p</m:mi><m:mi>i</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGWbaCdaqhaaWcbaGaemyAaKgabaGaemyDauNaemOBa4MaemyAaKgaaaaa@33D0@</m:annotation></m:semantics></m:math></inline-formula>, and HHH and QQQ by their enhancement <it>a</it><sub><it>i </it></sub>&#8805; 6. The HHH enrichment may, however, arise from tagged histidines, rather than reflecting a natural preference.</p>
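            <p>The expected probability of eq (3) and the enhancement factor of eq (4) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used in this study: all function names and the two toy sequences are hypothetical, and the single-residue frequencies are estimated from the toy sequences themselves, whereas the values used in the text are those listed in Table S1.</p>
            <p>
```python
import math
from collections import Counter


def ngram_frequencies(sequences, n=3):
    """Observed frequency of each overlapping n-gram, as in eq (1)."""
    counts = Counter(
        seq[i:i + n] for seq in sequences for i in range(len(seq) - n + 1)
    )
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def residue_frequencies(sequences):
    """Frequency of each single residue (assumption: estimated from the input)."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def expected_probability(gram, aa_freq):
    """Eq (3): product of residue frequencies, assuming independent positions."""
    return math.prod(aa_freq[aa] for aa in gram)


def enhancement_factor(gram, gram_freq, aa_freq):
    """Eq (4): observed frequency divided by expected probability."""
    return gram_freq[gram] / expected_probability(gram, aa_freq)


# Hypothetical toy input, not UniProt data.
toy = ["AAAA", "CACA"]
gram_freq = ngram_frequencies(toy)
aa_freq = residue_frequencies(toy)
a_aaa = enhancement_factor("AAA", gram_freq, aa_freq)
a_cac = enhancement_factor("CAC", gram_freq, aa_freq)
```
            </p>
            <p>In this toy input both AAA and CAC occur more often than the product of their residue frequencies predicts, so both enhancement factors exceed 1; a value below 1 would indicate depletion relative to the independence assumption.</p>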
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>
                     <it>Observed vs expected counts of 3-grams</it>
                  </p>
               </caption>
               <text>
                  <p><b><it>Observed vs expected counts of 3-grams</it></b>. <b>(A) </b>Observed counts (<it>C</it><sub><it>i</it></sub>) are directly retrieved for all 20<sup>3 </sup>types of <it>3</it>-grams from UniProt. Expected counts are based on the natural frequencies of individual amino acids (eq 3). Observed counts (<it>C</it><sub><it>i</it></sub>) usually match the expected ones (the slope of the best-fitting line is 1.02), although particular <it>3</it>-grams (labeled) are enhanced. The correlation coefficient between observed and expected counts is 0.96. The enrichment <it>a</it><sub><it>i </it></sub>in particular <it>3</it>-grams is shown in panel B. The observed counts are plotted on a log-log scale, to provide a clearer view of the subset of rare <it>3</it>-grams (to the left of the arrow). <b>(C) </b>Comparison of the observed histogram of <it>3</it>-grams (Figure 1) with that expected from the independent frequencies of occurrence of the amino acids in each triplet. The two histograms are significantly different (&#967;<sup>2</sup>, <it>P </it>&lt; 0.001).</p>
               </text>
               <graphic file="1471-2105-8-226-3"/>
            </fig>
            <p>The present approach permits us to assign a scarcity score to each amino acid in a given sequence without recourse to sequence alignment. In the calculations below, we have also performed similar analyses for protein families, in which the occurrence of rare <it>3</it>-grams at specific locations was examined with regard to their conservation properties. Given an alignment of homologous sequences, an average scarcity score was assigned to each position <it>j </it>along a sequence <it>k</it>, based on the minimum value, <inline-formula><m:math name="1471-2105-8-226-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msup><m:mi>p</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msup><m:msub><m:mo>|</m:mo><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow></m:msub><m:mo>=</m:mo><m:mi>m</m:mi><m:mi>i</m:mi><m:mi>n</m:mi><m:msub><m:mrow><m:mo>{</m:mo><m:msubsup><m:mi>p</m:mi><m:mrow><m:mtext>&#160;&#160;&#160;&#160;a</m:mtext></m:mrow><m:mrow><m:mtext>uni</m:mtext></m:mrow></m:msubsup><m:mo>,</m:mo><m:msubsup><m:mi>p</m:mi><m:mrow><m:mtext>&#160;&#160;&#160;&#160;b</m:mtext></m:mrow><m:mrow><m:mtext>uni</m:mtext></m:mrow></m:msubsup><m:mo>,</m:mo><m:msubsup><m:mi>p</m:mi><m:mrow><m:mtext>&#160;&#160;&#160;&#160;c</m:mtext></m:mrow><m:mrow><m:mtext>uni</m:mtext></m:mrow></m:msubsup><m:mo>}</m:mo></m:mrow><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegyvzYrwyUfgarqqtubsr4rNCHbGeaGqiA8vkIkVAFgIELiFeLkFeLk=iY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfeaY=biLkVcLq=JHqVepeea0=as0db9vqpepesP0xe9Fve9Fve9GapdbaqaaeGacaGaaiaabeqaamqadiabaaGcbaGaemiCaa3aaWbaaSqabeaacqWG1bqDcqWGUbGBcqWGPbqAaaGccqGG8baFdaWgaaWcbaGaemOAaOMaem4AaSgabeaakiabg2da9GWaciaa=1gacaWFPbGaa8NBaiabcUha7jabdchaWnaaDaaaleaacqqGGaaicqqGGaaicqqGGaaicqqGGaaicqqGHbqyaeaacqqG1bqDcqqGUbGBcqqGPbqAaaGccqGGSaalcqWGWbaCdaqhaaWcbaGaeeiiaaIaeeiiaaIaeeiiaaIaeeiiaaIaeeOyaigabaGaeeyDauNaeeOBa4MaeeyAaKgaaOGaemiCaa3aa0baaSqaaiabbccaGiabbccaGiabbccaGiabbccaGiabbogaJbqaaiabbwha1jabb6gaUjabbMgaPbaakiabc2ha9naaBaaaleaacqWGQbGAcqWGRbWAaeqaaaaa@706E@</m:annotation></m:semantics></m:math></inline-formula>, of the three consecutive <it>3</it>-grams of types (<it>a</it>, <it>b</it>, <it>c</it>) that contain residue <it>j</it>, using</p>
            <p>
               <display-formula id="M5">
                  <m:math name="1471-2105-8-226-i8" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mo>&lt;</m:mo>
                           <m:msubsup>
                              <m:mi>s</m:mi>
                              <m:mi>j</m:mi>
                              <m:mrow>
                                 <m:mi>u</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mi>i</m:mi>
                              </m:mrow>
                           </m:msubsup>
                           <m:mo>&gt;</m:mo>
                           <m:mo>=</m:mo>
                           <m:mo>&#8722;</m:mo>
                           <m:mi>ln</m:mi>
                           <m:mo>&#8289;</m:mo>
                           <m:mo stretchy="false">[</m:mo>
                           <m:mo>&lt;</m:mo>
                           <m:msubsup>
                              <m:mi>p</m:mi>
                              <m:mi>j</m:mi>
                              <m:mrow>
                                 <m:mi>u</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mi>i</m:mi>
                              </m:mrow>
                           </m:msubsup>
                           <m:mo>&gt;</m:mo>
                           <m:mo stretchy="false">]</m:mo>
                           <m:mo>=</m:mo>
                           <m:mo>&#8722;</m:mo>
                           <m:mi>ln</m:mi>
                           <m:mo>&#8289;</m:mo>
                           <m:mo stretchy="false">[</m:mo>
                           <m:msup>
                              <m:mi>m</m:mi>
                              <m:mrow>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:msup>
                           <m:msub>
                              <m:mi>&#931;</m:mi>
                              <m:mi>k</m:mi>
                           </m:msub>
                           <m:msup>
                              <m:mi>p</m:mi>
                              <m:mrow>
                                 <m:mi>u</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mi>i</m:mi>
                              </m:mrow>
                           </m:msup>
                           <m:msub>
                              <m:mo>|</m:mo>
                              <m:mrow>
                                 <m:mi>j</m:mi>
                                 <m:mi>k</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">]</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqGH8aapcqWGZbWCdaqhaaWcbaGaemOAaOgabaGaemyDauNaemOBa4MaemyAaKgaaOGaeyOpa4Jaeyypa0JaeyOeI0IagiiBaWMaeiOBa4Maei4waSLaeyipaWJaemiCaa3aa0baaSqaaiabdQgaQbqaaiabdwha1jabd6gaUjabdMgaPbaakiabg6da+iabc2faDjabg2da9iabgkHiTiGbcYgaSjabc6gaUjabcUfaBjabd2gaTnaaCaaaleqabaGaeyOeI0IaeGymaedaaOGaeu4Odm1aaSbaaSqaaiabdUgaRbqabaGccqWGWbaCdaahaaWcbeqaaiabdwha1jabd6gaUjabdMgaPbaakiabcYha8naaBaaaleaacqWGQbGAcqWGRbWAaeqaaOGaeiyxa0faaa@5E71@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where the summation is performed over the homologous sequences (1 &#8804; <it>k </it>&#8804; <it>m</it>) in the alignment.</p>
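            <p>As an illustration of eq 5, the following minimal Python sketch computes the average scarcity score for every column of an alignment. It is not the original implementation. It assumes that the UniProt 3-gram probabilities are supplied in a dictionary (called p_uni here, a hypothetical name), that all aligned sequences have the same length and contain no gaps, and that every 3-gram encountered has an entry in the dictionary. Residues near the termini are covered by fewer than three 3-grams, and the minimum is then taken over those that exist.</p>
            <p>
```python
import math


def min_probability(seq, j, p_uni):
    """Smallest UniProt probability among the 3-grams of seq that contain residue j."""
    # A 3-gram starting at s covers positions s, s+1, s+2 (0-based indices).
    first = max(0, j - 2)
    last = min(j, len(seq) - 3)
    return min(p_uni[seq[s:s + 3]] for s in range(first, last + 1))


def average_scarcity(alignment, p_uni):
    """Eq 5: minus the log of the mean, over m sequences, of the minimum probability."""
    m = len(alignment)
    length = len(alignment[0])  # assumption: ungapped sequences of equal length
    scores = []
    for j in range(length):
        mean_p = sum(min_probability(seq, j, p_uni) for seq in alignment) / m
        scores.append(-math.log(mean_p))
    return scores


# Toy data (illustrative probabilities, not taken from UniProt)
toy_p_uni = {"ACD": 0.01, "CDE": 0.001, "CDF": 0.003}
toy_alignment = ["ACDE", "ACDF"]
toy_scores = average_scarcity(toy_alignment, toy_p_uni)
```
            </p>
            <p>In this toy example the first column is covered only by ACD, so its score is -ln(0.01), while the remaining columns are dominated by the rarer 3-grams CDE and CDF, giving -ln(0.002).</p>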
         </sec>
         <sec>
            <st>
               <p>Proteins tend to recruit rare <it>3</it>-grams at their active sites</p>
            </st>
            <p>The biological significance of rare <it>3</it>-grams has been examined by analyzing the results with regard to the dataset of 59 non-redundant enzyme/ligand complexes compiled by Gutteridge and Thornton <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, hereafter referred to as the GT dataset (Table <tblr tid="T1">1</tblr>). Two non-overlapping subsets of <it>3</it>-grams were retrieved from the GT dataset. The first is composed of all <it>3</it>-grams located at the active sites, and the second includes all other <it>3</it>-grams. The two subsets comprise 3,514 and 17,450 <it>3</it>-grams, respectively. Active site <it>3</it>-grams are defined as those containing at least one heavy atom located within 5 &#197; of the ligand.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>GT (Gutteridge-Thornton, 2005)<sup>24 </sup>dataset of 59 enzymes*</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>1cib</p>
                     </c>
                     <c ca="left">
                        <p>1kbqAC</p>
                     </c>
                     <c ca="left">
                        <p>6aldA</p>
                     </c>
                     <c ca="left">
                        <p>1n2uA</p>
                     </c>
                     <c ca="left">
                        <p>1o7n</p>
                     </c>
                     <c ca="left">
                        <p>3apr</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>8acn</p>
                     </c>
                     <c ca="left">
                        <p>1de6A</p>
                     </c>
                     <c ca="left">
                        <p>1d6sA</p>
                     </c>
                     <c ca="left">
                        <p>1cml</p>
                     </c>
                     <c ca="left">
                        <p>1oaf</p>
                     </c>
                     <c ca="left">
                        <p>6cel</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1a16</p>
                     </c>
                     <c ca="left">
                        <p>3daa</p>
                     </c>
                     <c ca="left">
                        <p>1fs5A</p>
                     </c>
                     <c ca="left">
                        <p>1eexABC</p>
                     </c>
                     <c ca="left">
                        <p>1pnf</p>
                     </c>
                     <c ca="left">
                        <p>1jkxA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1b66A</p>
                     </c>
                     <c ca="left">
                        <p>1b57</p>
                     </c>
                     <c ca="left">
                        <p>1jcr</p>
                     </c>
                     <c ca="left">
                        <p>2tlx</p>
                     </c>
                     <c ca="left">
                        <p>1dz8A</p>
                     </c>
                     <c ca="left">
                        <p>2dhc</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1b74</p>
                     </c>
                     <c ca="left">
                        <p>1e7y</p>
                     </c>
                     <c ca="left">
                        <p>1gai</p>
                     </c>
                     <c ca="left">
                        <p>1eixAB</p>
                     </c>
                     <c ca="left">
                        <p>1ibv</p>
                     </c>
                     <c ca="left">
                        <p>4reqAB</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1m7yA</p>
                     </c>
                     <c ca="left">
                        <p>1dzt</p>
                     </c>
                     <c ca="left">
                        <p>1ggf</p>
                     </c>
                     <c ca="left">
                        <p>1lee</p>
                     </c>
                     <c ca="left">
                        <p>1cq1A</p>
                     </c>
                     <c ca="left">
                        <p>2tsc</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1lruA</p>
                     </c>
                     <c ca="left">
                        <p>1k9sAD</p>
                     </c>
                     <c ca="left">
                        <p>1gm7AB</p>
                     </c>
                     <c ca="left">
                        <p>1erm</p>
                     </c>
                     <c ca="left">
                        <p>1e8gA</p>
                     </c>
                     <c ca="left">
                        <p>1oneA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1dag</p>
                     </c>
                     <c ca="left">
                        <p>1uae</p>
                     </c>
                     <c ca="left">
                        <p>1gkm</p>
                     </c>
                     <c ca="left">
                        <p>1hduA</p>
                     </c>
                     <c ca="left">
                        <p>1lmt</p>
                     </c>
                     <c ca="left">
                        <p>1ra2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1hm2A</p>
                     </c>
                     <c ca="left">
                        <p>1esd</p>
                     </c>
                     <c ca="left">
                        <p>1o6g</p>
                     </c>
                     <c ca="left">
                        <p>1ma0</p>
                     </c>
                     <c ca="left">
                        <p>1bd0</p>
                     </c>
                     <c ca="left">
                        <p>1ldm</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1esw</p>
                     </c>
                     <c ca="left">
                        <p>1dud</p>
                     </c>
                     <c ca="left">
                        <p>1e9gA</p>
                     </c>
                     <c ca="left">
                        <p>1mka</p>
                     </c>
                     <c ca="left">
                        <p>1g49B</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>* The table lists the PDB code and chains used. Care was taken not to double count <it>3</it>-grams in cases where the active site is located at the interface of two identical subunits.</p>
               </tblfn>
            </tbl>
            <p>The UniProt frequencies of these two groups of <it>3</it>-grams yielded the histograms displayed in Figure <figr fid="F4">4A</figr>. To obtain these histograms, the <it>3</it>-grams in subset 1 (active site residues, gray) and subset 2 (other residues, black) were organized into groups of different UniProt frequencies, and the counts of <it>3</it>-grams in each group were determined. The abscissa shows the ranges of UniProt counts, divided into intervals of &#916;<it>C</it><sub><it>i </it></sub>= 1,000, starting from <it>C</it><sub><it>i </it></sub>&#8804; 1,000 (labeled 1000), and the ordinate represents the population (or probability) of each group. Not all 20<sup>3 </sup>types of <it>3</it>-grams were represented in this set of 59 enzymes.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>
                     <it>Comparison of natural occurrences of 3-grams</it>
                  </p>
               </caption>
               <text>
                  <p><b><it>Comparison of natural occurrences of 3-grams</it></b>. Two subsets of <it>3</it>-grams are compared: <b>(A) </b>those located at (1) active sites of enzymes vs. (2) other positions and <b>(B) </b>those located at (1) active sites of enzymes vs. (2) exposed ones (see Methods). The 3-grams in an ensemble of 59 non-redundant enzyme/ligand complexes [24] were analyzed with regard to their natural frequencies of occurrence in UniProt. The abscissa shows the number of occurrences of the examined <it>3</it>-grams in UniProt, divided into intervals of size &#916;<it>C</it><sub><it>i </it></sub>= 1,000, and the ordinate shows the fraction of <it>3</it>-grams that fall in the successive count intervals, for the two subsets of 3-grams: active sites (gray) and others/exposed (black). Comparison of the two histograms reveals the higher proportion of rare <it>3</it>-grams in the active sites. In particular, the range <it>C</it><sub><it>i </it></sub>&lt; 1,140 (rare <it>3</it>-grams) shows a propensity for active site <it>3</it>-grams that is enhanced by a factor of 2.67 compared to other <it>3</it>-grams and by a factor of 5.4 compared to exposed <it>3</it>-grams. A chi-square test yielded P &lt;&lt; 0.001 for both pairs of histograms, indicating that the differences are statistically significant.</p>
               </text>
               <graphic file="1471-2105-8-226-4"/>
            </fig>
            <p>Comparison of the two histograms immediately reveals the tendency of the active site <it>3</it>-grams to populate the lower UniProt counts. For example, let us consider the rare <it>3</it>-grams (with <it>C</it><sub><it>i </it></sub>&lt; 1,140 or <inline-formula><m:math name="1471-2105-8-226-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>i</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemyAaKgabaGaemyDauNaemOBa4MaemyAaKgaaaaa@33D6@</m:annotation></m:semantics></m:math></inline-formula> &gt; 10.908; see Methods and Figure <figr fid="F2">2</figr>). Of the active site <it>3</it>-grams, 11.67% fall in this range, whereas the percentage drops to 4.35% when <it>3</it>-grams at other regions are considered. Therefore, rare <it>3</it>-grams are over-represented in the active sites by a factor of 2.67. The difference between the two distributions shown in panel A is statistically significant (&#967;<sup>2</sup>, <it>P </it>&lt;&lt; 0.001).</p>
            <p>Similarly, we compared the histogram obtained for the subset of active site <it>3</it>-grams with that calculated for solvent-exposed <it>3</it>-grams (see Methods). Figure <figr fid="F4">4B</figr> shows the histograms for active site (gray) and exposed (black) residues. The active sites are again distinguished by their propensity for rare <it>3</it>-grams. Rare 3-grams are 5.4 times more frequent at active sites than in exposed regions (&#967;<sup>2</sup>, <it>P </it>&lt;&lt; 0.001). The difference between the two histograms is more pronounced than that observed in panel A.</p>
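            <p>The over-representation factors quoted above are ratios between the fractions of rare <it>3</it>-grams in two subsets. A minimal Python sketch of this calculation is given below. The function names are hypothetical, the input lists hold the UniProt count of each <it>3</it>-gram occurrence in a subset, and the default cutoff of 1,140 is the rare-<it>3</it>-gram threshold used in the text. The numbers in the example are toy values, not data from the GT dataset.</p>
            <p>
```python
def rare_fraction(uniprot_counts, threshold=1140):
    """Fraction of 3-grams in a subset whose UniProt count lies below the threshold."""
    rare = sum(1 for count in uniprot_counts if threshold > count)
    return rare / len(uniprot_counts)


def over_representation(subset_counts, reference_counts, threshold=1140):
    """Ratio of the rare fractions of two subsets (e.g. active site vs. other)."""
    reference = rare_fraction(reference_counts, threshold)
    return rare_fraction(subset_counts, threshold) / reference


# Toy UniProt counts for two small subsets (illustrative only)
toy_active = [500, 2000, 800, 3000]
toy_other = [500, 2000, 5000, 9000]
toy_factor = over_representation(toy_active, toy_other)
```
            </p>
            <p>Here half of the toy active site <it>3</it>-grams and a quarter of the others are rare, so the factor is 2.</p>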
            <p>Next, we examined the identity of <it>3</it>-grams at the active sites, and focused in particular on the rare <it>3</it>-grams that are recruited. Given that not all of the 20<sup>3 </sup>types of <it>3</it>-grams were represented in the GT dataset, not all types of <it>3</it>-grams can be observed at the active sites either. We first analyzed the amino acid composition of active sites. For each amino acid, the probabilities of being located at the active site versus elsewhere were calculated. The ratio between these two probabilities, termed the <it>active site propensity</it>, is shown in Figure <figr fid="F5">5A</figr>. Four amino acids, Cys, His, Gly, and Trp, are distinguished by their high active site propensities. This is in agreement with previous work <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> where His and Cys were shown to have the highest propensity for serving as catalytic residues via their side chains, while Gly has the highest propensity for catalyzing reactions via its main chain. We also note that, in the present study, Arg and Glu exhibit relatively low tendencies to locate at active sites, while these two amino acids were reported by Bartlett <it>et al </it><abbrgrp><abbr bid="B25">25</abbr></abbrgrp> to have relatively high catalytic propensities. This difference presumably arises from the different definitions adopted for active sites in the two studies: we refer to all amino acids located in the neighborhood of the ligand, while Bartlett <it>et al </it><abbrgrp><abbr bid="B25">25</abbr></abbrgrp> referred to residues directly involved in <it>catalytic </it>activity.</p>
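            <p>The active site propensity defined above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used in the study: the function name is hypothetical, the two inputs are the one-letter residue strings of the active site and non-active-site subsets, and amino acids that never occur outside the active sites are skipped to avoid division by zero.</p>
            <p>
```python
from collections import Counter


def active_site_propensity(active_residues, other_residues):
    """Ratio of each amino acid's frequency at active sites to its frequency elsewhere."""
    active = Counter(active_residues)
    other = Counter(other_residues)
    n_active = len(active_residues)
    n_other = len(other_residues)
    return {
        aa: (active[aa] / n_active) / (other[aa] / n_other)
        for aa in active
        if other[aa] > 0  # assumption: skip residues absent from the reference set
    }


# Toy residue strings (illustrative only)
toy_propensity = active_site_propensity("HHCG", "GGGGHAAA")
```
            </p>
            <p>In this toy example His has a propensity of 4, Gly of 0.5, and Cys is omitted because it does not occur in the reference string.</p>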
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>
                     <it>Active site propensity of amino acids and comparison with their natural frequencies</it>
                  </p>
               </caption>
               <text>
                  <p><b><it>Active site propensity of amino acids and comparison with their natural frequencies</it></b>. <b>(A) </b>For each type of amino acid, the frequencies of occurrence in the active and non-active sites were calculated. The ratio between them is defined as the active site propensity (ordinate). <b>(B) </b>Natural (UniProt) frequency of occurrence vs. active site propensity for each type of amino acid. Less frequent amino acids tend to have a higher active site propensity, and <it>vice versa</it>, in accord with the histograms in Figure 4. This effect becomes more pronounced at the level of <it>3</it>-grams.</p>
               </text>
               <graphic file="1471-2105-8-226-5"/>
            </fig>
            <p>Cys and His, the two residues that have the highest active site propensity (both in terms of their ligand-binding and catalytic activities), are among the least frequent amino acids in terms of their natural occurrence. Figure <figr fid="F5">5B</figr> shows the relation between the frequencies of amino acids in UniProt and their active site propensities. While the data are scattered, there is a discernible anti-correlation between the two sets of data, consistent with Figure <figr fid="F4">4</figr>; rare amino acids have a higher tendency to be in active sites. This preference becomes more pronounced when <it>3</it>-grams, as opposed to monograms, are examined, as can be seen from the skewed distribution of the frequency of active site residues towards lower counts in Figure <figr fid="F4">4</figr>.</p>
            <p>Rare <it>3</it>-grams may arise from three possibilities: (1) these may be words in which the individual letters have low <it>natural </it>frequencies themselves (i.e. <it>a</it><sub><it>i </it></sub>&#8776; 1); (2) these may be rare 'words', i.e. rare combinations of particular amino acids (<it>a</it><sub><it>i </it></sub>&lt;&lt; 1); or (3) these may be rare despite their enhancement (<it>a</it><sub><it>i </it></sub>&gt;&gt; 1) due to the extremely low natural frequencies of the individual amino acids that compose them. Counterparts of Figure <figr fid="F3">3</figr> plotted for the active site <it>3</it>-grams and other <it>3</it>-grams showed that both groups fall in the first category (<it>a</it><sub><it>i </it></sub>&#8776; 1). Their observed counts plotted against the expected ones yield a slope of 1.075 &#177; 0.008. The rare <it>3</it>-grams belonging to the two groups, on the other hand, exhibited a slight enhancement of <it>a</it><sub><it>i </it></sub>= 1.11 and 1.07, for active site and other residues, respectively. Thus, <it>3</it>-grams recruited at active sites are usually combinations of residues which already have relatively low natural frequencies of occurrence. Yet, particular <it>3</it>-grams exhibit significant departures from their expected occurrence probabilities, e.g. SSS, GGG, RRR, GSG (over-represented), and LEL (under-represented). See Tables S2 and S3 in Additional files for details. Among rare <it>3</it>-grams at active sites that also exhibit a large enhancement in UniProt (case 3, above), we distinguish WWH and HCW.</p>
         </sec>
         <sec>
            <st>
               <p>Scarce <it>3</it>-grams in the GT dataset</p>
            </st>
            <p>Scarcity scores were calculated for all <it>3</it>-grams in each of the 59 proteins in the GT dataset. The postulate that rare <it>3</it>-grams (above a threshold scarcity score) preferentially locate in active sites was examined for a series of threshold scores. For each enzyme, the rates of true positives (TP) and false positives (FP) were evaluated as a function of the threshold scarcity score, the TP rate being defined as the fraction of active site 3-grams that exhibit a scarcity score above the threshold value. Table <tblr tid="T2">2</tblr> shows the average ratios between the rates of TPs and FPs for seven different thresholds. The uppermost threshold (s<sub>max </sub>= 11) shows a more than threefold enhancement in the rate of TPs relative to FPs, while the discriminative power of the scarcity score vanishes as we lower the threshold. A detailed list of TP and FP rates, along with the top-ranking <it>3</it>-grams, is provided for each enzyme in Additional files, Table S4.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Results for the GT dataset</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Scarcity Score Threshold</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>TP/FP*</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>11.0</p>
                     </c>
                     <c ca="left">
                        <p>3.152</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>10.5</p>
                     </c>
                     <c ca="left">
                        <p>1.936</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>10.0</p>
                     </c>
                     <c ca="left">
                        <p>1.543</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>9.5</p>
                     </c>
                     <c ca="left">
                        <p>1.432</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>9.0</p>
                     </c>
                     <c ca="left">
                        <p>1.353</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>8.5</p>
                     </c>
                     <c ca="left">
                        <p>1.139</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>8.0</p>
                     </c>
                     <c ca="left">
                        <p>1.041</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>* The average ratio between true positive (TP) rates and false positive (FP) rates for the dataset of 59 enzymes</p>
               </tblfn>
            </tbl>
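            <p>The TP/FP ratios of the kind reported in Table 2 can be reproduced in outline with the short Python sketch below. It is not the authors' code. The TP rate follows the definition given in the text; the FP rate is assumed here to be the corresponding fraction of non-active-site <it>3</it>-grams above the threshold, and enzymes with an FP rate of zero are assumed to be left out of the average. The function names and the toy scores are hypothetical.</p>
            <p>
```python
def tp_fp_rates(active_scores, other_scores, threshold):
    """Return (TP rate, FP rate) for one enzyme at a given scarcity threshold."""
    # TP rate: fraction of active site 3-grams scoring above the threshold.
    tp_rate = sum(s > threshold for s in active_scores) / len(active_scores)
    # FP rate (assumed definition): same fraction among the other 3-grams.
    fp_rate = sum(s > threshold for s in other_scores) / len(other_scores)
    return tp_rate, fp_rate


def mean_tp_fp_ratio(enzymes, threshold):
    """Average TP/FP ratio over enzymes, skipping those with a zero FP rate."""
    ratios = []
    for active_scores, other_scores in enzymes:
        tp_rate, fp_rate = tp_fp_rates(active_scores, other_scores, threshold)
        if fp_rate > 0:
            ratios.append(tp_rate / fp_rate)
    return sum(ratios) / len(ratios)


# Toy scarcity scores for two enzymes (illustrative only)
toy_enzymes = [
    ([11.5, 9.0, 12.0, 8.0], [11.2, 7.0, 8.0, 9.0, 6.0, 5.0, 10.0, 9.5]),
    ([11.5, 9.0], [11.2, 7.0, 8.0, 9.0]),
]
toy_ratio = mean_tp_fp_ratio(toy_enzymes, threshold=11.0)
```
            </p>
            <p>For the toy data the per-enzyme ratios are 4 and 2, giving an average of 3 at a threshold of 11.</p>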
         </sec>
         <sec>
            <st>
               <p>Illustrative results for three cases</p>
            </st>
            <p>The functional role of rare <it>3</it>-grams at proteins' functional sites is illustrated here for three test cases, a Src tyrosine kinase, hemoglobin and a tyrosyl-tRNA synthetase. For each case, a set of homologous proteins has been compiled and aligned, and the scarcity score &lt;<inline-formula><m:math name="1471-2105-8-226-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>j</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemOAaOgabaGaemyDauNaemOBa4MaemyAaKgaaaaa@33D8@</m:annotation></m:semantics></m:math></inline-formula>> of amino acid(s) at each position has been calculated. The analysis reveals that stretches of rare amino acids are involved in functional roles even beyond ligand binding sites.</p>
            <sec>
               <st>
                  <p>Src kinase</p>
               </st>
               <p>The Src family of non-receptor tyrosine kinases is a member of the tyrosine kinase superfamily and plays an important role in cell differentiation, proliferation and survival <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. c-Src, a member of the Src protein kinase family, contains two peptide binding modules (SH3 and SH2 domains), a catalytic tyrosine kinase domain (composed of N- and C-lobes) and a C-terminal regulatory tail (for a review see <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>) (Figure <figr fid="F6">6A</figr>). Phosphorylation of the C-terminal tail residue Y527 results in an intramolecular interaction between the SH2 domain and the phosphorylated C-terminal tail, triggering a transition into an inactive conformation with the SH2 and SH3 domains moving away from the catalytic site, clamping the latter in a strained conformation. Dephosphorylation of Y527 and phosphorylation of Y416, on the other hand, activate the kinase via movement of the activation loop and opening of the catalytic cleft between the N- and the C-lobes of the kinase domain.</p>
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>
                        <it>(A) Identification of c-Src unique n-grams</it>
                     </p>
                  </caption>
                  <text>
                     <p><b><it>(A) Identification of c-Src unique n-grams</it></b>. The inactive conformation of c-Src (PDB:2SRC [58]) is shown as a ribbon diagram (white), the AMP-PNP as a stick model (orange), the phosphotyrosines Y416 and Y527 in space-filling representation, and the SH2-kinase domain linker in wheat. The most unique residues are shown as stick models along with their side chains. All ribbon diagrams are drawn using Pymol [59]. <b>(B) C-Src scarcity scores</b>. 111 human Src homologous proteins (E value &lt; 6 &#215; 10<sup>-57</sup>) were used in the calculations. The abscissa represents the residue index and the ordinate the scarcity score. Filled circles refer to the 32 top-ranking amino acids colored in panel A, most of which are functional and conserved despite being very rare in UniProt. <b>(C) PROSITE motifs</b>. Scanning the c-Src sequence against the PROSITE database reveals two recognized motifs (magenta) corresponding to the ATP binding site and the catalytic loop.</p>
                  </text>
                  <graphic file="1471-2105-8-226-6"/>
               </fig>
               <p>Figure <figr fid="F6">6B</figr> shows the scarcity scores &lt;<inline-formula><m:math name="1471-2105-8-226-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>j</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemOAaOgabaGaemyDauNaemOBa4MaemyAaKgaaaaa@33D8@</m:annotation></m:semantics></m:math></inline-formula>> as a function of residue number along the c-Src sequence, based on UniProt counts of <it>3</it>-grams, averaged over the set of <it>m </it>aligned homologous sequences (eq 5). Six <it>n</it>-grams (3 &#8804; <it>n </it>&#8804; 6) composed of contiguous <it>3</it>-grams are distinguished by their high scores, D<sub>117</sub>WWL<sub>120</sub>, E<sub>146</sub>EWYF<sub>150</sub>, Y<sub>382</sub>VH<sub>384</sub>, I<sub>426</sub>KWTA<sub>430</sub>, D<sub>444</sub>VWSF<sub>448</sub>, and C<sub>496</sub>QCWRK<sub>501</sub>, in addition to individual residues H319, K200, C478 and P485.</p>
               <p>D<sub>117</sub>WWL<sub>120 </sub>is located in the SH3 domain, and interacts with the linker that connects the SH2 and kinase domains when the kinase adopts its inactive conformation (Figure <figr fid="F6">6A</figr>). The SH3-linker interactions have been pointed out to serve as an independent mode of regulation for Hck, a Src family member <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. In addition, Trp118, at the ligand-binding site of SH3 <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, was found to be essential to the stability of the SH3-VSL12 complex <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
               <p>The second stretch of amino acids, E<sub>146</sub>EWYF<sub>150</sub>, coincides with the SH2-SH3 linker, which was shown by Kuriyan and coworkers <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> to work as an inducible "snap-lock" that clamps the SH2 and SH3 domains upon phosphorylation of Y527. Mutations of residues in the linker have been shown to lead to constitutive activation of c-Src <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
               <p>The third stretch, Y<sub>382</sub>VH<sub>384</sub>, forms the Src catalytic loop. Y<sub>382</sub> takes part in key interactions that stabilize the kinase domain in the inactive conformation <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, and H<sub>384</sub> belongs to the conserved HRD motif of the catalytic loop. Among the individual residues, Lys200, at the SH2 domain interface, interacts with the C-terminal regulatory tail.</p>
               <p>The other three stretches of amino acids I<sub>426</sub>KWTA<sub>430</sub>, D<sub>444</sub>VWSF<sub>448</sub>, and C<sub>496</sub>QCWRK<sub>501 </sub>form a cluster located at and beneath the substrate binding region (Figure <figr fid="F6">6A</figr>). The portion Q<sub>497</sub>CWR<sub>500 </sub>is composed of two overlapping <it>3</it>-grams that have the highest scarcity scores in the protein. Notably, the central residues in these stretches, W499, C498, <b>W428</b>, <b>W446</b>, are highly or fully (bold) conserved, lending support to their critical role.</p>
               <p>A systematic analysis of amino acid conservation among c-Src family members demonstrated that amino acids distinguished by their high scarcity scores tend to be conserved within the family (see Conclusion), despite being rare in UniProt. Conserved sites distinguished by their scarcity include, in addition to the four residues listed above, W148, K427, T429, F448, S447, A430, H319, <b>H384</b>, C487, <b>D444 </b>and V445. See Table S5 in the Additional files for details.</p>
               <p>Figure <figr fid="F6">6C</figr> shows the two motifs derived for Src kinase from the PROSITE database <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. The first is the stretch of twenty-two amino acids L<sub>273</sub>GQGCFG ... GTTRVAIK<sub>295</sub>, corresponding to the ATP binding region, and the second is the stretch Y<sub>382</sub>VHRDLRAANILV<sub>394</sub>, corresponding to the catalytic loop. Except for the <it>3</it>-gram Y<sub>382</sub>VH<sub>384</sub>, identified in both panels A and C, the <it>3</it>-grams deduced from the scarcity score and PROSITE analyses appear to provide complementary information.</p>
            </sec>
            <sec>
               <st>
                  <p>Hemoglobin</p>
               </st>
               <p>Hemoglobin (Hb) is a tetramer composed of two &#945;- and two &#946;-subunits (&#945;<sub>1</sub>, &#945;<sub>2</sub>, &#946;<sub>1</sub> and &#946;<sub>2</sub>) organized in two dimers, &#945;<sub>1</sub>&#946;<sub>1</sub> and &#945;<sub>2</sub>&#946;<sub>2</sub>, symmetrically positioned around a central water-filled cavity (Figure <figr fid="F7">7A</figr>). Each subunit has a heme that binds oxygen. The oxygenation of Hb is cooperative, i.e. binding of the first O<sub>2</sub> enhances the O<sub>2</sub> affinity of the other subunits. According to the MWC model <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, Hb exists in two conformations in rapid equilibrium: the T state, with low affinity for oxygen, and the R state, with high affinity. The intrinsic oxygen binding affinity of the tetramer is nearly 300-fold lower than that of its free &#945;<sub>1</sub>&#946;<sub>1</sub> dimer <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, implying that the interactions at the dimer-dimer interface interfere with substrate binding unless relieved by a conformational switch.</p>
               <fig id="F7">
                  <title>
                     <p>Figure 7</p>
                  </title>
                  <caption>
                     <p>
                        <it>(A) Identification of unique <it>n</it>-grams in hemoglobin</it>
                     </p>
                  </caption>
                  <text>
                     <p><b><it>(A) Identification of unique <it>n</it>-grams in hemoglobin</it></b>. The deoxy hemoglobin structure (PDB:1A3N [60]) is shown in Panel A. The tetramer is composed of two &#945;-subunits (white), two &#946;-subunits (wheat) and four heme groups (red). The most unique amino acids are colored magenta (&#945;-subunit) and cyan (&#946;-subunit). (<b>B, C</b>)<b> Scarcity scores for the respective &#945;- and &#946;-subunits</b>. The scores are based on 138 and 224 homologous sequences (E value &lt; 10<sup>-57</sup>) retrieved for the &#945;- and &#946;-subunits, respectively. The residues colored in panel A have scarcity scores (based on the 3-grams to which they belong; see eq 5) above the threshold indicated by the orange dashed line. <b>(D) PROSITE motifs</b>. Two histidines (magenta) are identified at the heme binding sites using PROSITE for each subunit.</p>
                  </text>
                  <graphic file="1471-2105-8-226-7"/>
               </fig>
               <p>The &lt;<inline-formula><m:math name="1471-2105-8-226-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>j</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemOAaOgabaGaemyDauNaemOBa4MaemyAaKgaaaaa@33D8@</m:annotation></m:semantics></m:math></inline-formula>> profiles for subunits &#945; and &#946; are presented in Figure <figr fid="F7">7B</figr> and <figr fid="F7">7C</figr>, respectively. Five stretches of residues are distinguished by their high scarcity scores in subunit &#945; : &#945; A<sub>12</sub>AWGK<sub>16</sub>, &#945; R<sub>31</sub>MF<sub>33</sub>, &#945; Y<sub>42</sub>FPHF<sub>47</sub>, &#945; H<sub>87</sub>AH<sub>89</sub>, and &#945; S<sub>102</sub>HCL<sub>105</sub>, colored magenta in panel A. In subunit &#946;, &#946; W<sub>15</sub>GK<sub>17</sub>, &#946; Y<sub>35</sub>PWTQ<sub>39</sub>, &#946; L<sub>91</sub>HCD<sub>94 </sub>and &#946; H<sub>143</sub>KYH<sub>146 </sub>(cyan) emerge as rare sequences.</p>
               <p>Notably, all of these stretches of amino acids distinguished by their high scarcity scores assume functional roles in Hb. &#945; R<sub>31</sub>MF<sub>33</sub> and &#945; Y<sub>42</sub>FPHFD<sub>47</sub> are directly involved in binding the heme group. The R31S mutation results in an abnormal hemoglobin, Hb Prato <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. &#945; Tyr42 forms an inter-subunit hydrogen bond with &#946; Asp99 across the dimer-dimer interface in the T state of hemoglobin but not in the R state. Kavanaugh <it>et al</it>. <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> showed that this hydrogen bond and other interactions associated with the side chain of &#945; Tyr42 make a major contribution to the stability of the T state. &#945; H<sub>87</sub>AH<sub>89</sub> corresponds to the heme binding site, with &#945; His87 coordinating the Fe atom. The Fe-N&#949; bonds between the heme group and &#945; His87 or &#946; His92 are essential for the cooperativity of hemoglobin <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. &#945; His103 is located inside a central cavity lined with an excess of positively charged ionizable groups. It has been suggested <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> and experimentally validated <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> that the mutual repulsion of these ionizable groups increases the oxygen affinity by raising the free energy of the T state. Using network analysis, del Sol <it>et al</it>. confirmed that &#945; His103 plays an important role in allosteric communication <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. We also note the critical position of &#945; His103 at the &#945;<sub>1</sub>&#946;<sub>1</sub> interface, with &#945; Ser102 pointing toward the heme binding pocket.</p>
               <p>Among the residues participating in rare sequences in subunit &#946;, in addition to the heme-binding &#946; H92, we note &#946; K17, the substitution of which by Asn has been reported to lead to the abnormal Hb J Amiens <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. The segment &#946; Y<sub>35</sub>PWTQ<sub>39</sub> is located at the convergence of the &#945;<sub>1</sub>&#946;<sub>1</sub> and &#945;<sub>2</sub>&#946;<sub>2</sub> interfaces and is important for allostery. Mutations of &#946; W37 (e.g. &#946; W37E and &#946; W37G) have been shown to lead to tetrameric Hb with functional properties similar to those of the &#945;&#946; dimers; that is, high oxygen affinity with no cooperativity <abbrgrp><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>. Finally, the <it>4</it>-gram &#946; H<sub>143</sub>KYH<sub>146</sub> at the C-terminal end of subunit &#946;, and in particular H<sub>143</sub>, is a binding site for allosteric effectors <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>.</p>
               <p>In addition to the above experimental evidence, we have previously shown, using the Gaussian network model, that two segments, &#945; F36-H45 and &#946; T87-&#946; N102, play a key role in the allosteric transition of Hb from the constrained (T form) to the flexible, or relaxed (R2 form), state <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>; these two regions include the so-called switch region <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. The first stretch partially overlaps with &#945; Y<sub>42</sub>FPHF<sub>46</sub> and the second contains &#946; L<sub>91</sub>HCD<sub>94</sub>. These results lend further support to the critical role of the rare <it>n</it>-grams, and in particular &#946; His92, in propagating allosteric signals. Finally, we note that out of 37 rare residues identified in Hb, 14 are fully conserved (entropy S = 0) and 17 are highly conserved (S &lt; 0.75; see Table S6 in the Additional files for more details).</p>
               <p>Upon scanning the hemoglobin sequence against the PROSITE database (Figure <figr fid="F7">7D</figr>), four histidines are retrieved: two (&#945; His58 and &#945; His87) on the &#945;-subunits, and two (&#946; His63 and &#946; His92) on the &#946;-subunits. As mentioned above, &#945; His87 and &#946; His92, also detected among the rare <it>3</it>-grams, coordinate the Fe atom of the heme group, while &#945; His58 and &#946; His63 are located in the same region, across the heme plane.</p>
            </sec>
            <sec>
               <st>
                  <p>Tyrosyl-tRNA synthetase</p>
               </st>
               <p>Tyrosyl-tRNA synthetase catalyzes the attachment of tyrosine to its cognate tRNA after activation of the tyrosine via formation of tyrosyl adenylate. Its structure has been solved in complexed form with an inhibitor (Figure <figr fid="F8">8A</figr>) that mimics the binding of the activated tyrosine (tyrosyl adenylate) in the active site <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. The calculated &lt;<inline-formula><m:math name="1471-2105-8-226-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>j</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemOAaOgabaGaemyDauNaemOBa4MaemyAaKgaaaaa@33D8@</m:annotation></m:semantics></m:math></inline-formula>> profile shown in Figure <figr fid="F8">8B</figr> reveals three <it>n</it>-grams distinguished by their uniqueness: Y<sub>124</sub>DWIG<sub>128</sub>, D<sub>194</sub>QWGN<sub>198</sub>, and Y<sub>252</sub>QFW<sub>255 </sub>(Figure <figr fid="F8">8A</figr>, magenta). The former two are located at the substrate binding site. The latter forms contacts with the conserved motif HIGH that comprises the catalytic His45 <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> and with the amino acids F231, G232, and T234 of the K<sub>230</sub>FGKT<sub>234 </sub>mobile loop (Figure <figr fid="F8">8A</figr>, cyan). This loop was found to stabilize the transition state in the activation reaction of tyrosyl by ATP <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. We note that <b>W126</b>, <b>D194, Q195</b>, W196, G197, <b>N198</b>, Y252 and <b>Q253 </b>are fully (bold) or highly conserved. See Table S7 in the Additional files for more details. PROSITE scanning reveals the P<sub>39</sub>TADSLHIGHL<sub>49 </sub>stretch of amino acids at the active site, which includes the catalytic His45 (Figure <figr fid="F8">8C</figr>).</p>
               <fig id="F8">
                  <title>
                     <p>Figure 8</p>
                  </title>
                  <caption>
                     <p>
                        <it>(A) Identification of tyrosyl tRNA synthetase unique n-grams</it>
                     </p>
                  </caption>
                  <text>
                     <p><b><it>(A) Identification of tyrosyl tRNA synthetase unique n-grams</it></b>. Tyrosyl-tRNA synthetase (PDB:3TS1 [48]) is shown (white) in complex with the tyrosyl adenylate intermediate (orange). The pair H<sub>45</sub>I<sub>46</sub> (cyan), which includes the catalytic His, the residues K<sub>230</sub>FGKT<sub>234</sub> of the mobile loop (cyan) and the three most unique <it>n</it>-grams (magenta) are shown as sticks. <b>(B) Scarcity score </b>of amino acids. 24 homologous sequences (E value &lt; 10<sup>-57</sup>) of the <it>Bacillus stearothermophilus </it>enzyme were used to calculate the average uniqueness score. Three unique <it>n</it>-grams (filled circles) are distinguished. <b>(C) PROSITE motifs</b>. The loop containing the catalytic His at the binding site (magenta) is identified using PROSITE.</p>
                  </text>
                  <graphic file="1471-2105-8-226-8"/>
               </fig>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Results in the absence of sequence alignment</p>
            </st>
            <p>In the above three examples, the scarcity score was calculated by averaging <it>p</it><sup><it>uni</it></sup>|<sub><it>jk </it></sub>over a set of homologous proteins. In principle, however, scarcity scores can equally be calculated for a single sequence, as the only ingredient is the set of UniProt frequencies of the triplets along the sequence. The alignments simply serve to amplify (increase the accuracy of) the signals that could otherwise be extracted from single-sequence analysis. As shown in Figures S1&#8211;S3 in the Additional files, the <inline-formula><m:math name="1471-2105-8-226-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>j</m:mi><m:mrow><m:mi>u</m:mi><m:mi>n</m:mi><m:mi>i</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemOAaOgabaGaemyDauNaemOBa4MaemyAaKgaaaaa@33D8@</m:annotation></m:semantics></m:math></inline-formula> profiles based on single sequences closely approximate those derived from multiple sequence alignments. For hemoglobin subunit &#945;, for example, six stretches of amino acids are distinguished by their high scarcity scores: W<sub>14</sub>GK<sub>16</sub>, R<sub>31</sub>MF<sub>33</sub>, F<sub>43</sub>PHF<sub>46</sub>, D<sub>75</sub>MPN<sub>78</sub>, H<sub>87</sub>AH<sub>89</sub>, and S<sub>102</sub>HCL<sub>105</sub>, in accord with the results presented above. The only stretch of amino acids that differs from the above analysis is D<sub>75</sub>MPN<sub>78</sub>. Thus, when sequence conservation algorithms fail, the <it>3</it>-gram analysis of single chains can be used to identify potentially functional sites.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The present study shows that relatively rare sequence motifs are preferentially recruited at enzymes' active sites. The probability of occurrence of rare <it>3</it>-grams at active sites is higher than that of other <it>3</it>-grams by a factor of more than 2.6, and higher than that of exposed <it>3</it>-grams by a factor of 5.4, according to the results obtained for the GT dataset of 59 enzymes (Figure <figr fid="F4">4</figr>). Detailed analysis of each enzyme in this dataset demonstrated that a scarcity score threshold of 11.0 permits us to identify <it>3</it>-grams located at or near active sites, with a true-positive (TP) rate more than three times larger than the false-positive (FP) rate. These results were obtained from the sequences of the individual enzymes without recourse to sequence alignment or any amino acid conservation information.</p>
         <p>The preferential selection of rare <it>3</it>-grams at functional sites may not be a property of enzymes exclusively, but of proteins in general, as the current application to hemoglobin suggests. We have illustrated and discussed here the functional importance of rare <it>3</it>-gram motifs for three proteins: c-Src, hemoglobin and tyrosyl-tRNA synthetase.</p>
         <p>As a further assessment of the functional importance of rare <it>n</it>-grams, we examined the conservation of these <it>n</it>-grams among the members of the three families of proteins on which we focused. Figure <figr fid="F9">9</figr> displays the results, reported in terms of the Shannon entropies <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, for two groups of residues: (i) the rare residues distinguished by their high scarcity scores (above the threshold values indicated in Figures <figr fid="F6">6B</figr>, <figr fid="F7">7B, C</figr> and <figr fid="F8">8B</figr>), and (ii) all other residues. The respective numbers of residues in the two sets are 83 and 1240. The histograms (percentages) are displayed for bins of size 0.25. We note that unique residues exhibit a stronger tendency to be conserved among the members of the examined families, despite their low counts in UniProt, lending further support to their functional importance (see Additional files for details).</p>
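         <p>The conservation comparison described above can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, the entropy is assumed to be computed with the natural logarithm over the residues observed in each alignment column, and gaps are assumed to be ignored. None of these choices is stated in this section.</p>

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str) -> float:
    """Shannon entropy of one alignment column (natural log; gaps skipped).

    Both the log base and the gap handling are assumptions of this sketch.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # p * log(1/p), summed over the residue types seen in the column
    return sum((n / total) * math.log(total / n) for n in counts.values())


def entropy_histogram(entropies: List[float], bin_size: float = 0.25) -> List[float]:
    """Percentage of positions falling in each entropy bin of width bin_size."""
    if not entropies:
        return []
    n_bins = int(max(entropies) // bin_size) + 1
    counts = [0] * n_bins
    for s in entropies:
        counts[int(s // bin_size)] += 1
    return [100.0 * c / len(entropies) for c in counts]


# Hypothetical alignment columns: fully conserved, two residue types, four types.
columns = ["WWWW", "AAGG", "ACDE"]
entropies = [column_entropy(col) for col in columns]
print(entropy_histogram(entropies))
```

         <p>Computing one such histogram for the rare residues and another for all remaining residues gives the two distributions being compared; a fully conserved column has S = 0 and therefore falls in the first bin.</p>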
         <fig id="F9">
            <title>
               <p>Figure 9</p>
            </title>
            <caption>
               <p>
                  <it>Histogram of Shannon entropies</it>
               </p>
            </caption>
            <text>
               <p><b><it>Histogram of Shannon entropies</it></b>. The histograms are shown for amino acids participating in rare 3-grams (black) and for other amino acids (dashed), based on the family memberships in the three examined cases (Src, Hb and Tyr tRNA synthetase). The abscissa shows the percentages of amino acids corresponding to bins of size &#916; S = 0.25. We note the high propensity of rare residues in the range S &lt; 0.25, corresponding to the highest conservation.</p>
            </text>
            <graphic file="1471-2105-8-226-9"/>
         </fig>
         <p>While our analysis is purely <it>sequence-based</it>, it appears to be conceptually in accord with two <it>structure-based </it>studies indicating that rare structural motifs correlate with proteins' functional sites. Petock <it>et al</it>. <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> showed that 9-residue fragments of rare backbone conformations often form parts of ligand-binding sites, protein-protein interactions, and domain-domain contacts. Likewise, Novotny and Kleywegt <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> found that, even though left-handed helices are rare, when they do occur, they are structurally or functionally significant. The present analysis shows that not only structural motifs, but sequence motifs as well, tend to be highly distinctive at active sites. This selection of rare sequence motifs seems to be driven by specificity requirements. Notably, the rarest amino acids (Cys, His, Trp and Met; Figure <figr fid="F5">5B</figr>) have unique functional properties: Cys and Met are sulfur-containing amino acids and are sensitive to oxidation. Likewise, His is unique in that its protonation depends on pH, and Trp is distinguished by its bulky size and significant contribution to the stabilization of hydrophobic cores or interfaces. It has been suggested that proteins may even sustain a decrease in their stability in order to form effective functional sites <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. In accord with this notion, these highly specific residues, Trp, His, Met, and Cys, may be used by enzymes for precise functional purposes, their specificity being amplified in <it>3</it>-grams where two or more of these amino acids are juxtaposed.</p>
         <p>Identification of proteins' functional sites using computational tools is a major challenge in computational biology in the post-genomic era <abbrgrp><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr></abbrgrp>. Sequence and structure conservation are the major criteria used by current algorithms to identify functionally important sites. Identification of functional sites for orphan proteins (for which no homologous protein exists) is beyond the scope of algorithms that use conservation as the major criterion for identifying these sites. Notably, we repeated the scarcity score analysis using only a single chain and were still able to identify functionally important sites (see Additional files), demonstrating that the potentially functional sites of orphan proteins can also be assessed by our methodology, and that conservation and scarcity are two different features. We note that the average scarcity score of scarce <it>and </it>conserved residues will be higher than that of scarce and non-conserved residues, as the scarcity score is evaluated as an average over all aligned sequences. However, calculations repeated for single query sequences, as opposed to those averaged over multiple aligned sequences, showed that such effects are negligibly small and that the profiles of scarcity scores for the examined proteins are robust. In addition, we have compared the scarcity scores (based on UniProt frequencies) and the Shannon entropies at the corresponding amino acid positions. Computations for the three test cases revealed that residues participating in rare 3-grams (Tables S5&#8211;S7) are not necessarily conserved, or <it>vice versa</it>. However, they do exhibit a higher tendency to be conserved compared to other amino acids, as illustrated in Figure <figr fid="F9">9</figr>.
Therefore, it is conceivable that a combined approach exploiting both conservation and scarcity may enhance the score/signal associated with particular sites, thus increasing the ability to accurately identify functional motifs.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Datasets</p>
            </st>
            <p>The present calculations are based on data extracted from UniProt <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> version 4.2. This database was chosen because it is manually curated, with a minimal level of redundancy, yet still comprehensive. Furthermore, UniProt contains entries from multiple organisms and is therefore less susceptible to biases in the <it>3</it>-gram distributions of specific species, as noted by Ganapathiraju <it>et al</it>. <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>.</p>
            <p>The biological implications of the <it>3</it>-grams distinguished by high scarcity scores were examined using the dataset of 59 enzyme/ligand complexes compiled by Gutteridge and Thornton <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> (GT dataset; see Table <tblr tid="T1">1</tblr>). We performed three case studies: c-Src kinase, hemoglobin subunits &#945; and &#946;, and tyrosyl-tRNA synthetase. Homologous sequences were retrieved in each case by a PBLAST search <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> of the SwissProt database. Sequences with <it>E</it>-values lower than 10<sup>-57 </sup>were extracted, leading to <it>m </it>= 111 homologous sequences for c-Src, 138 and 224 sequences for hemoglobin &#945;- and &#946;-subunits, respectively, and 24 sequences for tyrosyl-tRNA synthetase. The sequences were aligned with ClustalX using default parameters <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Exposed <it>3</it>-grams</p>
            </st>
            <p>The accessible surface area (ASA) of each amino acid in the GT dataset was calculated using the program NACCESS <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. Amino acids were defined as exposed if their relative exposed surface area was > 20%. <it>3</it>-grams with two or more exposed amino acids were defined as exposed ones.</p>
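            <p>The exposure criterion above can be illustrated with a short sketch. It assumes that the relative ASA values (in percent) have already been obtained per residue, for example from NACCESS output; the function names and the example values are hypothetical and are not taken from the original analysis.</p>

```python
from typing import List

EXPOSURE_CUTOFF = 20.0  # relative ASA in percent, as stated in the text


def is_exposed(relative_asa: float) -> bool:
    """A residue counts as exposed if its relative ASA exceeds 20%."""
    return relative_asa > EXPOSURE_CUTOFF


def exposed_3grams(relative_asa: List[float]) -> List[bool]:
    """Flag each 3-gram (window starting at index i) as exposed or not.

    A 3-gram is exposed when two or more of its three residues are exposed.
    """
    flags = [is_exposed(a) for a in relative_asa]
    return [sum(flags[i:i + 3]) >= 2 for i in range(len(flags) - 2)]


# Hypothetical relative ASA values (percent) for a six-residue chain.
example = [5.0, 35.0, 60.0, 10.0, 2.0, 45.0]
print(exposed_3grams(example))  # [True, True, False, False]
```

            <p>In this example, the first two overlapping 3-grams each contain two exposed residues and are therefore classified as exposed, whereas the last two contain only one exposed residue each.</p>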
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>DT did the calculations. DT and IB analyzed the results and wrote the manuscript. Both authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Thomas E. Smithgall for helpful discussions on the structure-function relations of the Src protein family. Support by NIH R01 grant # LM007994 is gratefully acknowledged.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>A method to predict functional residues in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Casari</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Valencia</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nat Struct Biol</source>
            <pubdate>1995</pubdate>
            <volume>2</volume>
            <fpage>171</fpage>
            <lpage>178</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nsb0295-171</pubid>
                  <pubid idtype="pmpid">7749921</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>An evolutionary trace method defines binding surfaces common to protein families</p>
            </title>
            <aug>
               <au>
                  <snm>Lichtarge</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Bourne</snm>
                  <fnm>HR</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>FE</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1996</pubdate>
            <volume>257</volume>
            <fpage>342</fpage>
            <lpage>358</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1996.0167</pubid>
                  <pubid idtype="pmpid" link="fulltext">8609628</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Analysis and prediction of functional sub-types from protein sequence alignments</p>
            </title>
            <aug>
               <au>
                  <snm>Hannenhalli</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Russell</snm>
                  <fnm>RB</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>303</volume>
            <fpage>61</fpage>
            <lpage>76</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4036</pubid>
                  <pubid idtype="pmpid" link="fulltext">11021970</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information</p>
            </title>
            <aug>
               <au>
                  <snm>Armon</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Graur</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ben-Tal</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>307</volume>
            <fpage>447</fpage>
            <lpage>463</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4474</pubid>
                  <pubid idtype="pmpid" link="fulltext">11243830</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues</p>
            </title>
            <aug>
               <au>
                  <snm>Pupko</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bell</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Mayrose</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Glaser</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ben-Tal</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18 Suppl 1</volume>
            <fpage>S71</fpage>
            <lpage>S77</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12169533</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Classification of protein families and detection of the determinant residues with an improved self-organizing map</p>
            </title>
            <aug>
               <au>
                  <snm>Andrade</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Casari</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Valencia</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Biol Cybern</source>
            <pubdate>1997</pubdate>
            <volume>76</volume>
            <fpage>441</fpage>
            <lpage>450</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s004220050357</pubid>
                  <pubid idtype="pmpid" link="fulltext">9263431</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Automated structure-based prediction of functional sites in proteins: applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking</p>
            </title>
            <aug>
               <au>
                  <snm>Aloy</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Querol</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Aviles</snm>
                  <fnm>FX</fnm>
               </au>
               <au>
                  <snm>Sternberg</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>311</volume>
            <fpage>395</fpage>
            <lpage>408</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.4870</pubid>
                  <pubid idtype="pmpid" link="fulltext">11478868</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Structural clusters of evolutionary trace residues are statistically significant and common in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Madabushi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yao</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Marsh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kristensen</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Philippi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sowa</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Lichtarge</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>316</volume>
            <fpage>139</fpage>
            <lpage>154</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.5327</pubid>
                  <pubid idtype="pmpid" link="fulltext">11829509</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Using a neural network and spatial clustering to predict the location of active sites in enzymes</p>
            </title>
            <aug>
               <au>
                  <snm>Gutteridge</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bartlett</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2003</pubdate>
            <volume>330</volume>
            <fpage>719</fpage>
            <lpage>734</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(03)00515-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">12850142</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Prediction of functional sites by analysis of sequence and structure conservation</p>
            </title>
            <aug>
               <au>
                  <snm>Panchenko</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2004</pubdate>
            <volume>13</volume>
            <fpage>884</fpage>
            <lpage>892</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1110/ps.03465504</pubid>
                  <pubid idtype="pmpid" link="fulltext">15010543</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Landgraf</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Xenarios</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>307</volume>
            <fpage>1487</fpage>
            <lpage>1502</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.4540</pubid>
                  <pubid idtype="pmpid" link="fulltext">11292355</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites</p>
            </title>
            <aug>
               <au>
                  <snm>Wallace</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Borkakoti</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1997</pubdate>
            <volume>6</volume>
            <fpage>2308</fpage>
            <lpage>2323</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9385633</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity</p>
            </title>
            <aug>
               <au>
                  <snm>Fetrow</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Godzik</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Skolnick</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1998</pubdate>
            <volume>282</volume>
            <fpage>703</fpage>
            <lpage>711</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1998.2061</pubid>
                  <pubid idtype="pmpid" link="fulltext">9743619</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Recognition of spatial motifs in protein structures</p>
            </title>
            <aug>
               <au>
                  <snm>Kleywegt</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1999</pubdate>
            <volume>285</volume>
            <fpage>1887</fpage>
            <lpage>1897</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1998.2393</pubid>
                  <pubid idtype="pmpid" link="fulltext">9917419</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Mining frequent patterns in protein structures: a study of protease families</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Bahar</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20 Suppl 1</volume>
            <fpage>I77</fpage>
            <lpage>I85</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1201446</pubid>
                  <pubid idtype="pmpid" link="fulltext">15262784</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/bth912</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Coupling between catalytic site and collective dynamics: a requirement for mechanochemical activity of enzymes</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>LW</fnm>
               </au>
               <au>
                  <snm>Bahar</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Structure</source>
            <pubdate>2005</pubdate>
            <volume>13</volume>
            <fpage>893</fpage>
            <lpage>904</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1489920</pubid>
                  <pubid idtype="pmpid" link="fulltext">15939021</pubid>
                  <pubid idtype="doi">10.1016/j.str.2005.03.015</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Residues crucial for maintaining short paths in network communication mediate signaling in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>del Sol</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Fujihashi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Amoros</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Nussinov</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Syst Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>2006.0019</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1681495</pubid>
                  <pubid idtype="pmpid" link="fulltext">16738564</pubid>
                  <pubid idtype="doi">10.1038/msb4100063</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Markov propagation of allosteric effects in biomolecular systems: application to GroEL-GroES</p>
            </title>
            <aug>
               <au>
                  <snm>Chennubhotla</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bahar</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Mol Syst Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>36</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1681507</pubid>
                  <pubid idtype="pmpid" link="fulltext">16820777</pubid>
                  <pubid idtype="doi">10.1038/msb4100075</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>A sequence alignment-independent method for protein classification</p>
            </title>
            <aug>
               <au>
                  <snm>Vries</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Munshi</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Tobi</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Klein-Seetharaman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Benos</snm>
                  <fnm>PV</fnm>
               </au>
               <au>
                  <snm>Bahar</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Appl Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>3</volume>
            <fpage>137</fpage>
            <lpage>148</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.2165/00822942-200403020-00008</pubid>
                  <pubid idtype="pmpid">15693739</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The Universal Protein Resource (UniProt)</p>
            </title>
            <aug>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Boeckmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ferro</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gasteiger</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Magrane</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Natale</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>O'Donovan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Redaschi</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Yeh</snm>
                  <fnm>LS</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D154</fpage>
            <lpage>D159</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540024</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608167</pubid>
                  <pubid idtype="doi">10.1093/nar/gki070</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Analysis of the steric strain in the polypeptide backbone of protein molecules</p>
            </title>
            <aug>
               <au>
                  <snm>Herzberg</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Moult</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>1991</pubdate>
            <volume>11</volume>
            <fpage>223</fpage>
            <lpage>229</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.340110307</pubid>
                  <pubid idtype="pmpid">1749775</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Analysis of protein structures reveals regions of rare backbone conformation at functional sites</p>
            </title>
            <aug>
               <au>
                  <snm>Petock</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Torshin</snm>
                  <fnm>IY</fnm>
               </au>
               <au>
                  <snm>Weber</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Harrison</snm>
                  <fnm>RW</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2003</pubdate>
            <volume>53</volume>
            <fpage>872</fpage>
            <lpage>879</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10484</pubid>
                  <pubid idtype="pmpid" link="fulltext">14635129</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>A survey of left-handed helices in protein structures</p>
            </title>
            <aug>
               <au>
                  <snm>Novotny</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kleywegt</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2005</pubdate>
            <volume>347</volume>
            <fpage>231</fpage>
            <lpage>241</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2005.01.037</pubid>
                  <pubid idtype="pmpid" link="fulltext">15740737</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Conformational changes observed in enzyme crystal structures upon substrate binding</p>
            </title>
            <aug>
               <au>
                  <snm>Gutteridge</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2005</pubdate>
            <volume>346</volume>
            <fpage>21</fpage>
            <lpage>28</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.11.013</pubid>
                  <pubid idtype="pmpid" link="fulltext">15663924</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Analysis of catalytic residues in enzyme active sites</p>
            </title>
            <aug>
               <au>
                  <snm>Bartlett</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Porter</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Borkakoti</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>324</volume>
            <fpage>105</fpage>
            <lpage>121</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(02)01036-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">12421562</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Regulation, substrates and functions of src</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Biochim Biophys Acta</source>
            <pubdate>1996</pubdate>
            <volume>1287</volume>
            <fpage>121</fpage>
            <lpage>149</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8672527</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Src protein-tyrosine kinase structure and regulation</p>
            </title>
            <aug>
               <au>
                  <snm>Roskoski</snm>
                  <fnm>R</fnm>
                  <suf>Jr.</suf>
               </au>
            </aug>
            <source>Biochem Biophys Res Commun</source>
            <pubdate>2004</pubdate>
            <volume>324</volume>
            <fpage>1155</fpage>
            <lpage>1164</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.bbrc.2004.09.171</pubid>
                  <pubid idtype="pmpid" link="fulltext">15504335</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>SH3-dependent stimulation of Src-family kinase autophosphorylation without tail release from the SH2 domain in vivo</p>
            </title>
            <aug>
               <au>
                  <snm>Lerner</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Smithgall</snm>
                  <fnm>TE</fnm>
               </au>
            </aug>
            <source>Nat Struct Biol</source>
            <pubdate>2002</pubdate>
            <volume>9</volume>
            <fpage>365</fpage>
            <lpage>369</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11976726</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Specific interactions outside the proline-rich core of two classes of Src homology 3 ligands</p>
            </title>
            <aug>
               <au>
                  <snm>Feng</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kasahara</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rickles</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Schreiber</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1995</pubdate>
            <volume>92</volume>
            <fpage>12408</fpage>
            <lpage>12415</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">40367</pubid>
                  <pubid idtype="pmpid" link="fulltext">8618911</pubid>
                  <pubid idtype="doi">10.1073/pnas.92.26.12408</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Differential ligand recognition by the Src and phosphatidylinositol 3-kinase Src homology 3 domains: circular dichroism and ultraviolet resonance Raman studies</p>
            </title>
            <aug>
               <au>
                  <snm>Okishio</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Tanaka</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fukuda</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Nagai</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>2003</pubdate>
            <volume>42</volume>
            <fpage>208</fpage>
            <lpage>216</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi020475y</pubid>
                  <pubid idtype="pmpid" link="fulltext">12515556</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Dynamic coupling between the SH2 and SH3 domains of c-Src and Hck underlies their inactivation by C-terminal tyrosine phosphorylation</p>
            </title>
            <aug>
               <au>
                  <snm>Young</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Gonfloni</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Superti-Furga</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Roux</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kuriyan</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2001</pubdate>
            <volume>105</volume>
            <fpage>115</fpage>
            <lpage>126</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(01)00301-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">11301007</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Src kinase activation: A switched electrostatic network</p>
            </title>
            <aug>
               <au>
                  <snm>Ozkirimli</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Post</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2006</pubdate>
            <volume>15</volume>
            <fpage>1051</fpage>
            <lpage>1062</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1110/ps.051999206</pubid>
                  <pubid idtype="pmpid" link="fulltext">16597828</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The PROSITE database</p>
            </title>
            <aug>
               <au>
                  <snm>Hulo</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bulliard</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Cerutti</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>De Castro</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Langendijk-Genevaux</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Pagni</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sigrist</snm>
                  <fnm>CJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D227</fpage>
            <lpage>D230</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347426</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381852</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj063</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Allosteric mechanisms of signal transduction</p>
            </title>
            <aug>
               <au>
                  <snm>Changeux</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Edelstein</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>308</volume>
            <fpage>1424</fpage>
            <lpage>1428</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1108595</pubid>
                  <pubid idtype="pmpid" link="fulltext">15933191</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Oxygenation-linked subunit interactions in human hemoglobin: experimental studies on the concentration dependence of oxygenation curves</p>
            </title>
            <aug>
               <au>
                  <snm>Mills</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Ackers</snm>
                  <fnm>GK</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1976</pubdate>
            <volume>15</volume>
            <fpage>5350</fpage>
            <lpage>5362</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi00669a023</pubid>
                  <pubid idtype="pmpid">999811</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>A new abnormal human hemoglobin: Hb Prato (alpha 2 31 (B12) Arg→Ser beta 2)</p>
            </title>
            <aug>
               <au>
                  <snm>Marinucci</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mavilio</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Massa</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gabbianelli</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fontanarosa</snm>
                  <fnm>PP</fnm>
               </au>
               <au>
                  <snm>Camagna</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ignesti</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tentori</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Biochim Biophys Acta</source>
            <pubdate>1979</pubdate>
            <volume>578</volume>
            <fpage>534</fpage>
            <lpage>540</lpage>
            <xrefbib>
               <pubid idtype="pmpid">486536</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Intersubunit interactions associated with Tyr42 alpha stabilize the quaternary-T tetramer but are not major quaternary constraints in deoxyhemoglobin</p>
            </title>
            <aug>
               <au>
                  <snm>Kavanaugh</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>PH</fnm>
               </au>
               <au>
                  <snm>Arnone</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hui</snm>
                  <fnm>HL</fnm>
               </au>
               <au>
                  <snm>Wierzba</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>DeYoung</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kwiatkowski</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Noble</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Juszczak</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>2005</pubdate>
            <volume>44</volume>
            <fpage>3806</fpage>
            <lpage>3820</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi0484670</pubid>
                  <pubid idtype="pmpid" link="fulltext">15751957</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>A test of the role of the proximal histidines in the Perutz model for cooperativity in haemoglobin</p>
            </title>
            <aug>
               <au>
                  <snm>Barrick</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ho</snm>
                  <fnm>NT</fnm>
               </au>
               <au>
                  <snm>Simplaceanu</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Dahlquist</snm>
                  <fnm>FW</fnm>
               </au>
               <au>
                  <snm>Ho</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nat Struct Biol</source>
            <pubdate>1997</pubdate>
            <volume>4</volume>
            <fpage>78</fpage>
            <lpage>83</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nsb0197-78</pubid>
                  <pubid idtype="pmpid">8989328</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Anionic control of hemoglobin function</p>
            </title>
            <aug>
               <au>
                  <snm>Bonaventura</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bonaventura</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Biochemical and Clinical Aspects of Hemoglobin Abnormalities</source>
            <publisher>New York: Academic</publisher>
            <editor>Caughey WS</editor>
            <pubdate>1978</pubdate>
            <fpage>647</fpage>
            <lpage>663</lpage>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The stereochemical mechanism of the cooperative effects in hemoglobin revisited</p>
            </title>
            <aug>
               <au>
                  <snm>Perutz</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Wilkinson</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Paoli</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dodson</snm>
                  <fnm>GG</fnm>
               </au>
            </aug>
            <source>Annu Rev Biophys Biomol Struct</source>
            <pubdate>1998</pubdate>
            <volume>27</volume>
            <fpage>1</fpage>
            <lpage>34</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.biophys.27.1.1</pubid>
                  <pubid idtype="pmpid" link="fulltext">9646860</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>[Hemoglobin J Amiens beta 17 (A 14) Lys replaced by Asn. Coincidence of a functionally silent new abnormal hemoglobin and a polycythemia vera]</p>
            </title>
            <aug>
               <au>
                  <snm>Elion</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wajcman</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Belkhodja-Dunda</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Lapoumeroulie</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Labie</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Messerschmitt</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Staal</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Desablens</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nouv Rev Fr Hematol</source>
            <pubdate>1979</pubdate>
            <volume>21</volume>
            <fpage>347</fpage>
            <lpage>352</lpage>
            <xrefbib>
               <pubid idtype="pmpid">121938</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Thermodynamic studies on the equilibrium properties of a series of recombinant betaW37 hemoglobin mutants</p>
            </title>
            <aug>
               <au>
                  <snm>Kiger</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Klinger</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Kwiatkowski</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>DeYoung</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Doyle</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Holt</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Noble</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Ackers</snm>
                  <fnm>GK</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1998</pubdate>
            <volume>37</volume>
            <fpage>4336</fpage>
            <lpage>4345</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi970868a</pubid>
                  <pubid idtype="pmpid" link="fulltext">9521754</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Site-directed mutations of human hemoglobin at residue 35beta: a residue at the intersection of the alpha1beta1, alpha1beta2, and alpha1alpha2 interfaces</p>
            </title>
            <aug>
               <au>
                  <snm>Kavanaugh</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Weydert</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>PH</fnm>
               </au>
               <au>
                  <snm>Arnone</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hui</snm>
                  <fnm>HL</fnm>
               </au>
               <au>
                  <snm>Wierzba</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Kwiatkowski</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Paily</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Noble</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Bruno</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mozzarelli</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2001</pubdate>
            <volume>10</volume>
            <fpage>1847</fpage>
            <lpage>1855</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1110/ps.16401</pubid>
                  <pubid idtype="pmpid" link="fulltext">11514675</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Human deoxyhaemoglobin-2,3-diphosphoglycerate complex low-salt structure at 2.5 &#197; resolution</p>
            </title>
            <aug>
               <au>
                  <snm>Richard</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Dodson</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Mauguen</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1993</pubdate>
            <volume>233</volume>
            <fpage>270</fpage>
            <lpage>274</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1993.1505</pubid>
                  <pubid idtype="pmpid" link="fulltext">8377203</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Metal complexes as allosteric effectors of human hemoglobin: an NMR study of the interaction of the gadolinium(III) bis(m-boroxyphenylamide)diethylenetriaminepentaacetic acid complex with human oxygenated and deoxygenated hemoglobin</p>
            </title>
            <aug>
               <au>
                  <snm>Aime</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Digilio</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Fasano</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Paoletti</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Arnelli</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ascenzi</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Biophys J</source>
            <pubdate>1999</pubdate>
            <volume>76</volume>
            <fpage>2735</fpage>
            <lpage>2743</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1300243</pubid>
                  <pubid idtype="pmpid" link="fulltext">10233088</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Allosteric changes in protein structure computed by a simple mechanical model: hemoglobin T&lt;-->R2 transition</p>
            </title>
            <aug>
               <au>
                  <snm>Xu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tobi</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bahar</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2003</pubdate>
            <volume>333</volume>
            <fpage>153</fpage>
            <lpage>168</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2003.08.027</pubid>
                  <pubid idtype="pmpid" link="fulltext">14516750</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Haemoglobin: the structural changes related to ligand binding and its allosteric mechanism</p>
            </title>
            <aug>
               <au>
                  <snm>Baldwin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1979</pubdate>
            <volume>129</volume>
            <fpage>175</fpage>
            <lpage>220</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(79)90277-8</pubid>
                  <pubid idtype="pmpid">39173</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Structure of tyrosyl-tRNA synthetase refined at 2.3 &#197; resolution. Interaction of the enzyme with the tyrosyl adenylate intermediate</p>
            </title>
            <aug>
               <au>
                  <snm>Brick</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bhat</snm>
                  <fnm>TN</fnm>
               </au>
               <au>
                  <snm>Blow</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1989</pubdate>
            <volume>208</volume>
            <fpage>83</fpage>
            <lpage>98</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(89)90090-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">2504923</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Analysis of enzyme structure and activity by protein engineering</p>
            </title>
            <aug>
               <au>
                  <snm>Fersht</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Shi</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Wilkinson</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Blow</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Carter</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Waye</snm>
                  <fnm>MMY</fnm>
               </au>
               <au>
                  <snm>Winter</snm>
                  <fnm>GP</fnm>
               </au>
            </aug>
            <source>Angewandte Chemie</source>
            <pubdate>1984</pubdate>
            <volume>23</volume>
            <fpage>235</fpage>
            <lpage>238</lpage>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Analysis of the role of the KMSKS loop in the catalytic mechanism of the tyrosyl-tRNA synthetase using multimutant cycles</p>
            </title>
            <aug>
               <au>
                  <snm>First</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>Fersht</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1995</pubdate>
            <volume>34</volume>
            <fpage>5030</fpage>
            <lpage>5043</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi00015a014</pubid>
                  <pubid idtype="pmpid">7711024</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>A mathematical theory of communication</p>
            </title>
            <aug>
               <au>
                  <snm>Shannon</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>The Bell System Technical Journal</source>
            <pubdate>1948</pubdate>
            <volume>27</volume>
            <fpage>379&#8211;423 and 623&#8211;656</fpage>
         </bibl>
         <bibl id="B52">
            <title>
               <p>The sequence of the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>PW</fnm>
               </au>
               <au>
                  <snm>Mural</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
               <au>
                  <snm>Yandell</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Evans</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Holt</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Amanatides</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ballew</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Huson</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Wortman</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Kodira</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>XH</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Skupski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gabor Miklos</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Broder</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Nadeau</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>McKusick</snm>
                  <fnm>VA</fnm>
               </au>
               <au>
                  <snm>Zinder</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Levine</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Slayman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hunkapiller</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bolanos</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Delcher</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Dew</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Fasulo</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Flanigan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Florea</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hannenhalli</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kravitz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mobarry</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Reinert</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Remington</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Abu-Threideh</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Beasley</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Biddick</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Bonazzi</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Brandon</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Cargill</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chandramouliswaran</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Charlab</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chaturvedi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Deng</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Di Francesco</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Dunn</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Eilbeck</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Evangelista</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gabrielian</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Gan</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Ge</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Gong</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Guan</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Heiman</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Ji</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Ke</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Ketchum</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Lai</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Lei</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Liang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Merkulov</snm>
                  <fnm>GV</fnm>
               </au>
               <au>
                  <snm>Milshina</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Naik</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Narayan</snm>
                  <fnm>VA</fnm>
               </au>
               <au>
                  <snm>Neelam</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Nusskern</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shao</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Shue</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wides</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Xiao</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Yan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Yao</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Zhong</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Zhong</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Baumhueter</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Spier</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Carter</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cravchik</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Woodage</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ali</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>An</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Awe</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Baldwin</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Baden</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Barnstead</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Barrow</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Beeson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Busam</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Carver</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Center</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cheng</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Curry</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Danaher</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Davenport</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Desilets</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Dietz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dodson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Doup</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ferriera</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Garg</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gluecksmann</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hart</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Haynes</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Haynes</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Heiner</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hladun</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hostin</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Houck</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Howland</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ibegwam</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kalush</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kline</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Koduru</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Love</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mann</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>May</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>McCawley</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>McIntosh</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>McMullen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Moy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Moy</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Murphy</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Pfannkoch</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Pratts</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Puri</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Qureshi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Reardon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rodriguez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>YH</fnm>
               </au>
               <au>
                  <snm>Romblad</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ruhfel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sitter</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Smallwood</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Strong</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Suh</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Tint</snm>
                  <fnm>NN</fnm>
               </au>
               <au>
                  <snm>Tse</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vech</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wetter</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Windsor</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Winn-Deen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Wolfe</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Zaveri</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zaveri</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Sjolander</snm>
                  <fnm>KV</fnm>
               </au>
               <au>
                  <snm>Karlak</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kejariwal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lazareva</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hatton</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Narechania</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Diemer</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Muruganujan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Guo</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sato</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bafna</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Istrail</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lippert</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Walenz</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Yooseph</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Basu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Baxendale</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Blick</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Caminha</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Carnes-Stine</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Caulk</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Chiang</snm>
                  <fnm>YH</fnm>
               </au>
               <au>
                  <snm>Coyne</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dahlke</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Mays</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Dombroski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Donnelly</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ely</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Esparham</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fosler</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gire</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Glanowski</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Glasser</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Glodek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gorokhov</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Graham</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Gropman</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Heil</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Henderson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hoover</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jennings</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Jordan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jordan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kasha</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kagan</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kraft</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Levitsky</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Majoros</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>McDaniel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Murphy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Newman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Nodell</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>291</volume>
            <fpage>1304</fpage>
            <lpage>1351</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1058040</pubid>
                  <pubid idtype="pmpid" link="fulltext">11181995</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>A tour of structural genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>801</fpage>
            <lpage>809</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35093574</pubid>
                  <pubid idtype="pmpid" link="fulltext">11584296</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Comparative n-gram analysis of whole-genome sequences. San Diego.</p>
            </title>
            <aug>
               <au>
                  <snm>Ganapathiraju</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Weisser</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Klein-Seetharaman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rosenfeld</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Carbonell</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Reddy</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <pubdate>2002</pubdate>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Plewniak</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Jeanmougin</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>4876</fpage>
            <lpage>4882</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147148</pubid>
                  <pubid idtype="pmpid" link="fulltext">9396791</pubid>
                  <pubid idtype="doi">10.1093/nar/25.24.4876</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>NACCESS, Computer Program</p>
            </title>
            <aug>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Department of Biochemistry and Molecular Biology, University College, London</source>
            <pubdate>1993</pubdate>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Crystal structures of c-Src reveal features of its autoinhibitory mechanism</p>
            </title>
            <aug>
               <au>
                  <snm>Xu</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Doshi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lei</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Eck</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Harrison</snm>
                  <fnm>SC</fnm>
               </au>
            </aug>
            <source>Mol Cell</source>
            <pubdate>1999</pubdate>
            <volume>3</volume>
            <fpage>629</fpage>
            <lpage>638</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1097-2765(00)80356-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">10360179</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>The PyMOL Molecular Graphics System (2002). DeLano Scientific, Palo Alto, CA, USA.</p>
            </title>
            <aug>
               <au>
                  <snm>DeLano</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B60">
            <title>
               <p>The structures of deoxy human haemoglobin and the mutant Hb Tyrα42His at 120 K</p>
            </title>
            <aug>
               <au>
                  <snm>Tame</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Vallone</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Acta Crystallogr D Biol Crystallogr</source>
            <pubdate>2000</pubdate>
            <volume>56</volume>
            <fpage>805</fpage>
            <lpage>811</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1107/S0907444900006387</pubid>
                  <pubid idtype="pmpid" link="fulltext">10930827</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>

