<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2003-4-11-r72</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>The amino-acid mutational spectrum of human genetic disease</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Vitkup</snm>
               <fnm>Dennis</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A2">
               <snm>Sander</snm>
               <fnm>Chris</fnm>
               <insr iid="I2"/>
               <insr iid="I3"/>
            </au>
            <au id="A3" ca="yes">
               <snm>Church</snm>
               <mi>M</mi>
               <fnm>George</fnm>
               <insr iid="I1"/>
               <email>for_email_look@arep.med.harvard.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Lipper Center for Computational Genetics and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA</p>
            </ins>
            <ins id="I2">
               <p>Whitehead Institute for Biomedical Research, Nine Cambridge Center, Cambridge, MA 02142, USA</p>
            </ins>
            <ins id="I3">
               <p>Current address: Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2003</pubdate>
         <volume>4</volume>
         <issue>11</issue>
         <fpage>R72</fpage>
         <url>http://genomebiology.com/2003/4/11/R72</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">14611658</pubid>
               <pubid idtype="doi">10.1186/gb-2003-4-11-r72</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>3</day>
               <month>7</month>
               <year>2003</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>24</day>
               <month>9</month>
               <year>2003</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>30</day>
               <month>9</month>
               <year>2003</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>30</day>
               <month>10</month>
               <year>2003</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2003</year>
         <collab>Vitkup et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
      </cpyrt>
      <shorttitle>
         <p>The amino-acid mutational spectrum of human genetic disease</p>
      </shorttitle>
      <shortabs>
         <p>The human disease spectrum is compared to the spectra of mutual amino-acid mutation frequencies, non-disease polymorphisms in human genes, and substitutions fixed between species.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Nonsynonymous mutations in the coding regions of human genes are responsible for phenotypic differences between humans and for susceptibility to genetic disease. Computational methods were recently used to predict deleterious effects of nonsynonymous human mutations and polymorphisms. Here we focus on understanding the amino-acid mutation spectrum of human genetic disease. We compare the disease spectrum to the spectra of mutual amino-acid mutation frequencies, non-disease polymorphisms in human genes, and substitutions fixed between species.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We find that the disease spectrum correlates well with the amino-acid mutation frequencies based on the genetic code. Normalized by the mutation frequencies, the spectrum can be rationalized in terms of chemical similarities between amino acids. The disease spectrum is almost identical for membrane and non-membrane proteins. Mutations at arginine and glycine residues are together responsible for about 30% of genetic diseases, whereas random mutations at tryptophan and cysteine have the highest probability of causing disease.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>The overall disease spectrum mainly reflects the mutability of the genetic code. We corroborate earlier results that the probability of a nonsynonymous mutation causing a genetic disease increases monotonically with an increase in the degree of evolutionary conservation of the mutation site and a decrease in the solvent-accessibility of the site; opposite trends are observed for non-disease polymorphisms. We estimate that the rate of nonsynonymous mutations with a negative impact on human health is less than one per diploid genome per generation.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010012">Medicine</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010009">Genetics</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Several recent studies <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp> have applied computational methods to predict potentially deleterious effects of nonsynonymous single-nucleotide polymorphisms (SNPs) in humans. SNPs represent common human alleles, usually with population frequencies greater than 1%. Both structural and evolutionary methods were used to assess potential functional effects of SNPs. It was predicted that a substantial fraction (10-30%) of human SNPs may affect protein function negatively, although the medical consequences of these SNPs remain to be established.</p>
         <p>The main goal of the work reported here is to characterize and rationalize the overall amino-acid spectrum of disease mutations and non-disease SNPs (referred to as 'benign SNPs' below). We obtain the relative probabilities that a random mutation (rather than an existing SNP) will cause a genetic disease while explicitly taking into account the underlying spectrum of nucleotide mutations. Such an approach will allow, in the future, the identification and characterization of highly mutable sites in the human genome which are also functionally important.</p>
         <p>Miller and Kumar <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> performed a detailed analysis of the disease mutations and benign SNP spectra in seven human genes. While some of our results are consistent with their study, we find major differences. For example, we observe a significantly larger contribution of mutations at arginine (Arg) and glycine (Gly) to human genetic disease. We attribute the differences to the substantially larger gene set (436 genes versus 7) used in our analysis.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Overall amino-acid mutational spectrum</p>
            </st>
            <p>We present the amino-acid spectra of disease mutations and polymorphisms in Figure <figr fid="F1">1</figr>. The mutations from the Mendelian Inheritance in Man (MIM) database <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> annotated in SWISS-PROT <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> were used as a source of human disease mutations. In total, 4,236 mutations from 436 genes were considered. The collection of 1,037 synonymous and nonsynonymous SNPs from the extensive analysis of haplotypes in 313 human genes <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> was used as a source of benign SNPs. There was no overlap between the sets of disease mutations and benign SNPs used in the study. The spectrum of interspecies substitutions (Figure <figr fid="F1">1d</figr>) was calculated on the basis of the PAM1 matrix <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> as described in Materials and methods.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Amino-acid mutation frequencies in human genes</p>
               </caption>
               <text>
                  <p>Amino-acid mutation frequencies in human genes. <b>(a) </b>The expected mutation frequencies based on the neighbor-dependent nucleotide mutation rates. The expected mutation matrix represents the frequencies of amino-acid transitions in the absence of selection. <b>(b) </b>The nonsynonymous benign SNP frequencies using the SNP dataset of Stephens <it>et al</it>. <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. <b>(c) </b>The genetic disease mutation frequencies based on the Mendelian Inheritance in Man (MIM) database <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. <b>(d) </b>The interspecies mutation frequencies based on the PAM1 matrix <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. In the matrices, each square represents a particular amino-acid to amino-acid mutation (for example, Val &#8594; Ala). The gray level of the matrix squares is proportional to the number of observed mutations. The matrices were normalized so that the sum over all mutation frequencies for each matrix is equal to 100. The <it>y</it>-axes of the matrices represent the original (wild-type) amino acids; the <it>x</it>-axes represent the mutant amino acids (created as a result of a single-nucleotide mutation). The amino acids (given in single-letter amino-acid code) were ordered along the axes according to the side-chain chemistry <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>: (C) sulfhydryl; (STPAG) small hydrophilic; (NDEQ) acid, acid amide and hydrophilic; (HRK) basic; (MILV) hydrophobic; (FYW) large hydrophobic/aromatic. As a result of the ordering, the mutations close to the matrix diagonal tend to be more conservative.</p>
               </text>
               <graphic file="gb-2003-4-11-r72-1"/>
            </fig>
            <p>Nearly all mutations in the current MIM database represent Mendelian disease (monogenic in etiology). It remains to be seen to what extent our results pertain to disease mutations involved in polygenic disorders. At this point, too little is known about mutations of this type, and more experimental work is required in order to understand their spectrum.</p>
            <p>The mutation matrices in Figure <figr fid="F1">1</figr> are sparse (that is, a large number of the matrix elements are close to or equal to zero) and nonsymmetrical (in many cases the tendency of amino acid I to mutate into amino acid J is different from the tendency of amino acid J to mutate into I). The vast majority of human genetic mutations are caused by single-nucleotide changes <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. Consequently, the matrices in Figure <figr fid="F1">1b,c,d</figr> represent amino-acid transitions resulting predominantly from single-nucleotide mutations in amino-acid codons. To rationalize the observed disease and benign spectra, we generated the expected mutation spectrum (Figure <figr fid="F1">1a</figr>) using the neighbor-dependent matrix of nucleotide mutation rates developed by Hess <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> (see Materials and methods). The expected mutation matrix in Figure <figr fid="F1">1a</figr> represents the spectrum which would be observed if all nonsynonymous mutations were accepted (that is, if there were no selection). The expected spectrum was generated for the disease genes considered and, separately, for a large collection of more than 7,000 human genes available from SWISS-PROT. These two spectra were almost identical (R = 0.98, <it>p </it>&lt; 0.0001), suggesting that the expected spectrum in Figure <figr fid="F1">1a</figr> reflects general properties of all human genes (such as amino-acid codon frequencies and context-dependent nucleotide mutation frequencies). Here and throughout the paper we use the <it>t</it>-test statistic with <it>n</it>-2 degrees of freedom to estimate the significance of linear correlations. Random shuffling simulations confirmed the significance values obtained using the <it>t</it>-test.</p>
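            <p>As an illustration of the significance test described above, the following minimal Python sketch computes a Pearson correlation, the corresponding t statistic with n-2 degrees of freedom, and a shuffling-based p-value. The function names, the number of shuffles and the random seed are assumptions made for this example and are not taken from the original analysis.</p>
            <p>
```python
import math
import random


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), to be compared with a
    t distribution with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def shuffle_p_value(x, y, trials=2000, seed=0):
    """Two-sided p-value estimated by random shuffling: the fraction of
    shuffles whose absolute correlation is at least the observed one.
    The trial count and seed are illustrative choices."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)
```
            </p>
            <p>Here x and y would be the matching elements of two mutation matrices (for example, expected and observed frequencies for each amino-acid pair), flattened into two lists of equal length.</p>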
            <p>The spectrum of disease mutations was calculated separately for membrane proteins. The program TMHMM <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> was used to detect potential transmembrane regions. The disease spectrum for membrane proteins is very similar to the all-protein disease spectrum (R = 0.97, <it>p </it>&lt; 0.0001 for all disease mutations in membrane proteins, 1,598 in total; R = 0.75, <it>p </it>&lt; 0.0001 for disease mutations in transmembrane regions, 372 in total). Evidently, specific properties of membrane proteins and the constraints on them are not able to significantly modify the disease spectrum common to all proteins.</p>
         </sec>
         <sec>
            <st>
               <p>Correlations between the expected and the observed spectra</p>
            </st>
            <p>Close-to-diagonal mutations in Figure <figr fid="F1">1a,b,c,d</figr> represent substitutions between amino acids with similar chemical properties (conservative mutations). The interspecies substitutions (Figure <figr fid="F1">1d</figr>) contain the highest fraction of conservative mutations compared to disease mutations and benign SNPs. The frequencies of the benign SNPs, disease mutations, and interspecies substitutions are plotted versus expected frequencies in Figure <figr fid="F2">2</figr>. Benner <it>et al</it>. <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> showed that the genetic code affects the amino-acid substitution spectrum at early stages of divergence, whereas chemical similarities dominate at longer evolutionary distances. The correlation between the benign and expected spectra observed in our study (R = 0.78, <it>p </it>&lt; 0.0001) is an expected extension of the conclusion of Benner <it>et al</it>. to even shorter evolutionary distances (variations within a population).</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>The expected frequencies of amino-acid to amino-acid mutations versus observed frequencies of the genetic disease mutations, nonsynonymous benign SNPs, and interspecies mutations</p>
               </caption>
               <text>
                  <p>The expected frequencies of amino-acid to amino-acid mutations versus observed frequencies of the genetic disease mutations, nonsynonymous benign SNPs, and interspecies mutations. Comparison with <b>(a) </b>genetic disease mutations from the Mendelian Inheritance in Man (MIM) database <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, <b>(b) </b>nonsynonymous benign SNPs, based on the study by Stephens <it>et al</it>. <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, and <b>(c) </b>interspecies mutations based on the PAM1 matrix <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Each point in the figure represents a certain type of amino-acid to amino-acid mutation. Only amino-acid transitions resulting from single-nucleotide mutations in amino-acid codons were considered. The mutation frequencies in each class (benign, disease and interspecies) were normalized to 100.</p>
               </text>
               <graphic file="gb-2003-4-11-r72-2"/>
            </fig>
            <p>Interestingly, we also find a strong correlation of the disease mutation spectrum with the expected spectrum based on the genetic code (R = 0.71, <it>p </it>&lt; 0.0001). The correlation of disease mutation frequencies with the chemical dissimilarities between original and mutant amino acids is apparent only after normalization by the expected frequencies (Figure <figr fid="F3">3a,b</figr>). Consequently, in the majority of cases the comparison of amino-acid types (wild type versus mutant) will be insufficient to distinguish neutral from disease variants.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Chemical dissimilarities between original and mutant amino acids versus observed mutation frequencies</p>
               </caption>
               <text>
                  <p>Chemical dissimilarities between original and mutant amino acids versus observed mutation frequencies. <b>(a) </b>The amino-acid dissimilarities versus the frequencies of disease mutations. <b>(b) </b>The amino-acid dissimilarities versus the relative frequencies of genetic disease mutations (normalized by site mutabilities). The chemical dissimilarities between amino acids were characterized using the Grantham score <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Each point in the figure represents a certain type of amino-acid to amino-acid mutation. Only amino-acid transitions resulting from single-nucleotide mutations in amino-acid codons were considered. The mutation frequencies in (a) were normalized to 100. The relative frequencies in (b) were determined as the ratio between the disease and the expected mutation frequencies (see Figure <figr fid="F1">1</figr>). The relative frequencies are proportional to the relative probabilities of amino-acid mutations causing a disease. No correlation between the disease mutation frequencies and chemical dissimilarities is evident (a), but there is a significant correlation between the normalized frequencies and the chemical dissimilarities (b).</p>
               </text>
               <graphic file="gb-2003-4-11-r72-3"/>
            </fig>
            <p>The contribution of mutations at different amino acids to the disease spectrum is highly heterogeneous (Figure <figr fid="F4">4</figr>). Interestingly, mutations at Arg residues account for almost 15% of the disease mutations. This is a direct consequence of the well-known high mutability of Arg (due to deamination of 5'-CpG dinucleotides in Arg codons) <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>, the relatively high frequency of Arg in human proteins (&gt;5%), and the fact that Arg mutates to residues with very different chemical properties (cysteine (Cys), glycine (Gly), histidine (His), lysine (Lys), leucine (Leu), methionine (Met), proline (Pro), glutamine (Gln), serine (Ser) and tryptophan (Trp)). The relative probability of a disease mutation at different amino acids (Figure <figr fid="F4">4b</figr>) was calculated by dividing the disease frequencies by the expected frequencies. Accordingly, a random mutation at a Trp or Cys residue has the highest probability of causing a disease. This correlates well with the highest evolutionary conservation of exactly these two residues <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Both Trp and Cys residues play a prominent part in determining protein stability. In addition to Trp and Cys, the high probability of disease mutations at Gly may be related to important structural roles often played by this residue. For example, mutations at Gly, which is frequently present at the turns of alpha-helices, might have a negative impact on protein structural stability. Our definition of the relative probability of disease mutations is similar to the relative clinical observation likelihood (RCOL) used by Cooper <it>et al. </it>in several publications (see, for example <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>). In the next section we extend the relative probabilities to interspecies comparisons.</p>
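            <p>The normalization used for the relative probabilities can be sketched as follows. This is a minimal Python example, assuming per-residue frequencies are held in dictionaries keyed by single-letter amino-acid code; the function name and the toy numbers are hypothetical and are not values from this study.</p>
            <p>
```python
def relative_disease_probability(disease_freq, expected_freq):
    """Ratio of disease frequency to expected (selection-free) frequency
    for each wild-type amino acid. Residues with zero expected frequency
    are skipped because the ratio is undefined for them."""
    ratios = {}
    for amino_acid, expected in expected_freq.items():
        if expected > 0:
            ratios[amino_acid] = disease_freq.get(amino_acid, 0.0) / expected
    return ratios


# Toy frequencies (percent of all mutations); illustrative only.
disease = {"R": 15.0, "W": 3.0, "A": 4.0}
expected = {"R": 10.0, "W": 1.0, "A": 8.0}

ratios = relative_disease_probability(disease, expected)
# Residues ordered from highest to lowest relative probability.
ranked = sorted(ratios, key=ratios.get, reverse=True)
```
            </p>
            <p>With these toy inputs a residue can contribute many disease mutations in absolute terms (R) and still rank below a rarely mutated residue (W) once its mutability is divided out, which is the distinction drawn between panels (a) and (b) of Figure 4.</p>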
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Contribution of mutations at different amino acids to the overall mutation spectrum</p>
               </caption>
               <text>
                  <p>Contribution of mutations at different amino acids to the overall mutation spectrum. <b>(a) </b>The fraction of mutations at different amino acids. The fractions are shown separately for benign, expected and disease mutations (normalized to 100% within each class). The contribution of mutations at different amino acids to the overall spectrum is highly heterogeneous. For example, mutations at arginine (R) constitute approximately 15% of all mutations. This is a direct consequence of the high mutability of the 5'-CpG dinucleotides in the arginine codons. <b>(b) </b>The relative probability that a random mutation at different amino acids will cause a genetic disease. Importantly, because the overall probability that a random mutation will cause a genetic disease is unknown, the probabilities in (b) have only relative meaning (for example, the probability that a random mutation will cause a disease mutation at alanine (A) versus valine (V)). For display purposes it was assumed that 1 in 100 random mutations causes a genetic disease. Mutations at tryptophan (W) and cysteine (C) have the highest probability of causing a disease. This correlates with the fact that these are the most highly conserved amino acids in evolution <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>.</p>
               </text>
               <graphic file="gb-2003-4-11-r72-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Probabilities of mutations or SNPs as a function of the mutation/SNP-site properties</p>
            </st>
            <p>To complement the analysis of the amino-acid mutation matrices, we investigated how the probabilities of benign SNPs and disease mutations depend on the properties of the mutation site. Several recent studies have focused on developing evolutionary and structural approaches to predict potentially deleterious human mutations <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. Here, we focus on understanding the relative mutation probabilities (see Materials and methods). Our results are in general agreement with the previous studies. The relative probabilities of disease mutations and benign SNPs are shown in Figure <figr fid="F5">5a</figr> as a function of the interspecies evolutionary conservation of the mutation site. The conservation was characterized by the relative entropy measure using homologs with more than 30% sequence identity. The probability that a random mutation will cause a genetic disease increases monotonically with an increase in the degree of site conservation, while the probability of observing nonsynonymous benign SNPs shows the opposite trend. The synonymous benign SNPs do not change amino acids and should be predominantly neutral. As a result, their probability is uniform across sites.</p>
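            <p>For readers unfamiliar with the relative entropy measure, the following Python sketch shows one common way to compute it for a single alignment column. The formal definition used in this study is given in Materials and methods; the background distribution, the absence of pseudocounts and the function name here are assumptions made for illustration.</p>
            <p>
```python
import math
from collections import Counter


def relative_entropy(column, background):
    """Relative entropy (in bits) of the residue distribution observed in
    one alignment column with respect to a background distribution.
    Higher values indicate a more conserved site. No pseudocounts are
    applied, which is a simplification."""
    counts = Counter(column)
    total = sum(counts.values())
    score = 0.0
    for amino_acid, count in counts.items():
        p = count / total
        score += p * math.log2(p / background[amino_acid])
    return score


# Hypothetical four-letter alphabet with a uniform background,
# used only to keep the example small.
uniform = {"A": 0.25, "C": 0.25, "D": 0.25, "E": 0.25}

conserved_site = relative_entropy("AAAA", uniform)
variable_site = relative_entropy("ACDE", uniform)
```
            </p>
            <p>A fully conserved column gives the maximum score for the chosen background, whereas a column that mirrors the background gives a score of zero.</p>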
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>The relative mutation probabilities as a function of mutation site conservation and solvent accessibility</p>
               </caption>
               <text>
                  <p>The relative mutation probabilities as a function of mutation site conservation and solvent accessibility. Relative mutation probability as a function of <b>(a) </b>evolutionary conservation of the mutation site (measured using relative entropy), and <b>(b) </b>solvent accessibility of the mutation site in the protein structure. Because the overall probability that a random mutation will cause a genetic disease or be observed as a polymorphism is not known, the probabilities have only relative meaning within each mutation class (disease, synonymous, nonsynonymous). To show different trends clearly, the relative probabilities were normalized to 1 within each class. Conservation of mutation sites in evolution was characterized by the relative entropy using close sequence homologs (see Materials and methods). The solvent accessibility of mutation sites was calculated using the program NACCESS <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. An increase in the degree of evolutionary conservation increases the probability of deleterious mutations and decreases the probability of nonsynonymous benign SNPs (a). An increase in the degree of solvent accessibility decreases the probability of deleterious mutations and increases the probability of nonsynonymous benign SNPs (b). Synonymous mutations do not change amino-acid sequences and are predominantly neutral. Consequently, the probability that a synonymous mutation will be deleterious is relatively constant across sites.</p>
               </text>
               <graphic file="gb-2003-4-11-r72-5"/>
            </fig>
            <p>The solvent accessibility of an amino-acid residue in a protein reflects the degree of the residue's exposure to the surrounding solvent in the protein structure. The relative probability of disease-causing mutations is highest in the protein interior and lowest on the protein surface (Figure <figr fid="F5">5b</figr>). The benign SNPs show the reverse trend, as their relative probability is highest on the surface and lowest in the protein interior. This is consistent with the study by Moult and co-workers <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> (see also Ferrer-Costa <it>et al. </it><abbrgrp><abbr bid="B20">20</abbr></abbrgrp> and Bustamante <it>et al. </it><abbrgrp><abbr bid="B21">21</abbr></abbrgrp>), who suggested that the dominant mechanism by which disease mutations damage protein function is a decrease in protein stability, as opposed to mutations of active-site residues (usually located on the protein surface).</p>
            <p>Both relative entropy and solvent accessibility exclusively characterize the site of a mutation. To estimate the extent to which a given amino acid is incompatible with the residues observed at the same position in close homologs, we introduced the Grantham Ratio (GR) score based on the Grantham dissimilarity matrix <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> (see Materials and methods for a formal definition). Application of other scores, for example those based on the BLOSUM matrices, gave qualitatively similar results <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. The GR score is the ratio of two averages - the numerator being the average dissimilarity between the mutated amino acid and the residues observed at the same site in evolution, and the denominator being the average dissimilarity within the residues observed at the site in homologous proteins. Defined in this way, a GR score smaller than or close to 1 suggests that the amino acid is similar to the residues observed at the site in evolution, whereas a GR score significantly larger than 1 indicates that the amino-acid change is evolutionarily radical.</p>
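            <p>The verbal description of the GR score can be sketched in Python as below. This is only one possible reading of the ratio of two averages: the way pairs are enumerated, the treatment of sites with a single observed residue type, and the placeholder dissimilarity values are all assumptions of this example. The placeholder numbers are not the published Grantham values, and the formal definition is the one given in Materials and methods.</p>
            <p>
```python
from itertools import combinations


def grantham_ratio(residue, observed, dissimilarity):
    """Ratio of (a) the average dissimilarity between `residue` and the
    residues observed at the site in homologs to (b) the average pairwise
    dissimilarity among the observed residues themselves.
    Returns None when (b) is undefined or zero."""

    def distance(a, b):
        # Identical residues have zero dissimilarity by convention.
        if a == b:
            return 0.0
        return dissimilarity[frozenset((a, b))]

    numerator = sum(distance(residue, o) for o in observed) / len(observed)
    pairs = list(combinations(observed, 2))
    if not pairs:
        return None
    denominator = sum(distance(a, b) for a, b in pairs) / len(pairs)
    if denominator == 0:
        return None
    return numerator / denominator


# Placeholder dissimilarities for four residues (hypothetical values).
toy_matrix = {
    frozenset(("L", "I")): 5.0,
    frozenset(("L", "V")): 30.0,
    frozenset(("I", "V")): 30.0,
    frozenset(("L", "R")): 100.0,
    frozenset(("I", "R")): 100.0,
    frozenset(("V", "R")): 100.0,
}

site = ["L", "I", "V"]          # residues seen at the site in homologs
gr_conservative = grantham_ratio("L", site, toy_matrix)
gr_radical = grantham_ratio("R", site, toy_matrix)
```
            </p>
            <p>In this toy case the residue that already occurs at the site scores below 1, while the chemically distant residue scores well above 1, matching the interpretation of small and large GR values given above.</p>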
            <p>The role of purifying selection in shaping the mutation spectra is apparent from the cumulative distribution of the GR scores (Figure <figr fid="F6">6</figr>). Whereas the GR distribution for original (wild type) residues at benign sites (blue curve) is very similar to the distribution for all protein residues (black), the distribution for mutant residues at benign sites (green) clearly shows an excess of radical mutations. Importantly, the GR distribution of mutant residues at benign sites (green) is similar to the distribution for randomly generated mutations (cyan) and is quite different from the disease mutation distribution (red). Consequently, although a significant fraction of randomly arising nonsynonymous mutations are evolutionarily radical (and thus potentially deleterious) they are not, on average, as radical as the disease mutations and still have appreciable frequencies in the human population. Indeed, it was recently estimated <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> that the average reduction in evolutionary fitness due to a mildly deleterious SNP with a significant frequency in the human population is in the range of 0.01-1%. The medical importance of such mildly deleterious human mutations remains to be established <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Cumulative probability of the Grantham ratio (GR) for different classes of residues in proteins</p>
               </caption>
               <text>
                  <p>Cumulative probability of the Grantham ratio (GR) for different classes of residues in proteins. Black, all (wild-type) protein residues; blue, original (wild-type) residues at the sites of benign SNPs; green, mutant residues at the sites of benign SNPs; cyan, residues generated by computer simulation of random mutations based on the amino-acid mutation frequencies; red, disease-causing residues from MIM. The Grantham ratio characterizes the degree of the residue's dissimilarity to the amino acids observed at the same position in evolutionary homologs (see Materials and methods). High GR values indicate radical mutations, whereas GR values that are small or around 1 indicate conservative mutations. The GR distributions demonstrate how purifying selection affects the observed mutation spectra. Comparison of the GR scores for original residues (black and blue) and disease-causing residues shows that more than half of disease mutations are radical (GR > 2) and are almost never observed in evolution.</p>
               </text>
               <graphic file="gb-2003-4-11-r72-6"/>
            </fig>
            <p>The cumulative distribution of the GR scores for disease mutations suggests that more than half of the disease mutations are evolutionarily radical (represented by residues with a GR score greater than 2). Residues with such GR scores are almost never observed in homologous sequences (blue and black curves). It is important to note that medically damaging mutations and SNPs cannot always be rationalized in terms of evolutionary radicality. Medically harmful mutations may cause late-onset human diseases without strong selection in evolution. Alternatively, a particular amino-acid substitution can be damaging to a human protein but be relatively frequent in the homologous family due to compensatory mutations. Such substitutions may account for deleterious mutations with low GR scores.</p>
         </sec>
         <sec>
            <st>
               <p>Estimation of the maximal rate of mutations with impact on human health</p>
            </st>
            <p>From Figure <figr fid="F6">6</figr> we can estimate the maximum rate of random mutations with significant impact on human health (that is, an impact similar to mutations currently annotated in MIM). We note that the mutation rate we estimate (a fraction of newly created deleterious mutations) is different from the fraction of existing SNPs with deleterious effects on protein function (estimated previously <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B23">23</abbr></abbrgrp>). The comparison between the distribution of random SNP mutations (cyan) and disease mutations (red) suggests that about 10% of the randomly generated mutations have GR scores greater than 6. Such a score corresponds to approximately 40% of the disease mutations. As a result, the total rate of the disease mutations cannot be larger than one quarter of the random mutation rate. Thus, one expects, at most, 25% of random nonsynonymous mutations to be as damaging as mutations currently annotated in MIM (similar estimates are obtained using GR cutoffs larger than 6).</p>
            <p>This estimate has a simple biochemical rationale, as mutagenesis experiments on different proteins suggest that less than 30% of random mutations substantially damage biological function or stability of proteins <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>.</p>
            <p>Using the recent estimate of the human mutation rate of 175 mutations per diploid genome per generation <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> (corresponding to approximately two to three nonsynonymous mutations), we conclude that the rate of nonsynonymous mutations with serious impact on human health should be less than one per diploid genome per generation. This is probably a substantial overestimation of the rate because we assume that all human genes are as important for human health as the well-annotated disease genes currently in the MIM database. We emphasize that the rate of health-damaging nonsynonymous mutations is smaller than the total rate of deleterious human mutations, which is estimated to be larger than one <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>.</p>
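            <p>The arithmetic behind this upper bound can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and the percentages quoted in the text (10% and 40% at a GR cutoff of 6, and two to three nonsynonymous mutations per generation) are treated as exact values.</p>
            <p>
```python
def max_disease_fraction(frac_random_above, frac_disease_above):
    """Upper bound on the fraction of random mutations that are disease-like.

    If a fraction f of random mutations were disease-like, at least
    f * frac_disease_above of random mutations would exceed the GR cutoff,
    so f cannot exceed frac_random_above / frac_disease_above.
    """
    if not frac_disease_above > 0:
        raise ValueError("frac_disease_above must be positive")
    return min(1.0, frac_random_above / frac_disease_above)


# Values quoted in the text for a GR cutoff of 6.
bound = max_disease_fraction(0.10, 0.40)

# Two to three nonsynonymous mutations per diploid genome per generation.
low, high = 2 * bound, 3 * bound

print(f"upper bound on damaging fraction: {bound:.2f}")
print(f"damaging nonsynonymous mutations per generation: {low:.2f} to {high:.2f}")
```
            </p>
            <p>Both ends of the resulting range stay below one mutation per diploid genome per generation, in line with the conclusion stated above.</p>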
            <p>The present analysis, together with other recent studies <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B23">23</abbr></abbrgrp>, establishes the basis for understanding the spectrum of deleterious human mutations. The amino-acid substitution matrices, such as PAM <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> and BLOSUM <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, apart from playing a fundamental role in sequence alignment, qualitatively characterize the evolutionary interchangeability of amino acids averaged over many protein families. The disease spectrum, characterized by our analysis, explores another important aspect of evolution, namely the generation of deleterious mutations. Because all mammalian species have a broadly similar physiology, the properties of the disease spectrum should be general, at least for mutations leading to early-onset diseases. We anticipate that understanding the disease spectrum will allow one to predict, in advance, the rates and potential medical consequences of all possible single-nucleotide mutations in the human genome.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Calculation of mutation spectra</p>
            </st>
            <p>The spectrum of expected amino-acid mutation frequencies (Figure <figr fid="F1">1a</figr>) was generated using the matrix of neighbor-dependent nucleotide mutation rates obtained by Hess <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> (Additional data file 1). The neighbor-dependent mutation matrix was calculated by Hess <it>et al</it>. on the basis of 20,200 substitutions in aligned gene/pseudogene human sequences; the relative mutation rates were calculated for the four nucleotides in all 16 possible 5' and 3' neighborhoods. To obtain the expected amino-acid mutation frequencies for a given collection of genes, we simulated all possible single-nucleotide mutations with appropriate rates, and recorded the corresponding amino-acid changes. The nucleotide mutational spectrum of individual genes may be affected by the presence of so-called mutation hot spots <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>. However, on average, there is only a small influence of the surrounding DNA sequence (beyond nearest 5' and 3' neighbors) on the relative nucleotide mutation rates <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
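            <p>The simulation described above can be sketched as follows. This is a minimal illustration and not the code used in the study: <it>uniform_rate</it> is a hypothetical placeholder for a lookup in the neighbor-dependent rate matrix, the first and last nucleotides of the sequence are skipped because one neighbor is missing, and changes to or from stop codons are ignored. All three choices are assumptions.</p>
            <p>
```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def uniform_rate(five_prime, base, three_prime, new_base):
    """Hypothetical placeholder for a neighbor-dependent rate lookup."""
    return 1.0


def expected_spectrum(cds, rate=uniform_rate):
    """Sum the rates of all single-nucleotide changes by amino-acid pair.

    cds is a coding sequence (upper-case ACGT, length divisible by 3).
    Synonymous changes and changes involving stop codons are not recorded.
    """
    spectrum = defaultdict(float)
    for pos in range(1, len(cds) - 1):
        start = pos - pos % 3          # first position of the enclosing codon
        offset = pos - start           # position of the base within the codon
        codon = cds[start:start + 3]
        old_aa = CODON_TABLE[codon]
        for new_base in BASES:
            if new_base == cds[pos]:
                continue
            mutant = codon[:offset] + new_base + codon[offset + 1:]
            new_aa = CODON_TABLE[mutant]
            if new_aa == old_aa or "*" in (old_aa, new_aa):
                continue
            weight = rate(cds[pos - 1], cds[pos], cds[pos + 1], new_base)
            spectrum[(old_aa, new_aa)] += weight
    return dict(spectrum)


# Toy coding sequence, for illustration only.
toy_spectrum = expected_spectrum("ATGGGT")
for pair, weight in sorted(toy_spectrum.items()):
    print(pair, weight)
```
            </p>
            <p>Summing such spectra over a collection of genes, with a real rate lookup in place of the placeholder, would give expected amino-acid mutation frequencies of the kind shown in Figure <figr fid="F1">1a</figr>.</p>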
            <p>The interspecies spectrum of amino-acid mutation frequencies (Figure <figr fid="F1">1d</figr>) was calculated on the basis of Dayhoff's PAM1 matrix. The original PAM1 matrix <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> gives the probabilities of amino-acid substitutions over small evolutionary distances. These probabilities were multiplied by the amino-acid frequencies in human genes for direct comparison with the expected, disease, and benign SNPs matrices.</p>
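            <p>The weighting step can be illustrated with a short Python sketch. The function name, the dictionary layout, and the final normalization to a sum of one are assumptions made for illustration; the numbers in the toy input are placeholders and are not taken from the PAM1 matrix.</p>
            <p>
```python
def substitution_flux(sub_prob, aa_freq):
    """Weight substitution probabilities by the frequency of the original residue.

    sub_prob[(a, b)] is the probability that residue a is replaced by b.
    aa_freq[a] is the frequency of residue a in the gene set.
    Returns relative frequencies of a -> b changes, normalized to sum to 1.
    """
    flux = {(a, b): aa_freq[a] * p for (a, b), p in sub_prob.items() if a != b}
    total = sum(flux.values())
    return {pair: value / total for pair, value in flux.items()}


# Placeholder values for a two-letter toy alphabet.
toy_prob = {("A", "G"): 0.002, ("G", "A"): 0.001, ("A", "A"): 0.998}
toy_freq = {"A": 0.5, "G": 0.5}
toy_flux = substitution_flux(toy_prob, toy_freq)
print(toy_flux)
```
            </p>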
         </sec>
         <sec>
            <st>
               <p>Structural and evolutionary analysis of mutations</p>
            </st>
            <p>The list of disease genes obtained from SWISS-PROT was filtered using the program PSEG <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> to exclude genes with a significant fraction of low-complexity regions. As a result of the filtering, six genes for collagen proteins were excluded from the original set of 436 genes. Mutations at Gly residues constitute more than 50% of the collagen disease mutations (due to the collagen structural motif). Because of this bias, the collagen mutations were excluded from all calculations. If the collagen mutations are included, the total fraction of disease mutations at Gly (Figure <figr fid="F4">4a</figr>) increases from 12% to 15%.</p>
            <p>Membrane proteins and transmembrane protein regions were detected using the program TMHMM <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> with standard parameters. Out of 430 disease genes, 105 (24%) were classified as membrane proteins on the basis of the presence of at least two distinct transmembrane domains. To characterize the evolutionary conservation of mutation sites we used BLASTGP to search the nrdb90 database <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> for homologs with greater than 30% sequence identity. The nrdb90 database constitutes a nonredundant merge of sequence and structural databases, which is filtered so that no pair of sequences has greater than 90% sequence identity. The homologs to each human protein were subsequently aligned using the program CLUSTALW <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> with default parameters. Only mutation sites covered by more than 10 homologous sequences (excluding gaps) were used in the evolutionary analysis. The multiple sequence alignments obtained using CLUSTALW were used to characterize the relative entropy (Kullback-Leibler distance) of the benign and disease mutation sites. The relative entropy was calculated according to the formula:</p>
            <p>
               <graphic file="gb-2003-4-11-r72-i1.gif"/>
            </p>
            <p>where the summation is over all amino-acid types n in the alignment; P(n) is the probability of amino acid n in the alignment column corresponding to the mutation site; and Q(n) is the probability of amino acid n over all columns of the multiple sequence alignment.</p>
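            <p>A minimal Python sketch of this calculation follows, assuming the standard Kullback-Leibler form in which each term is P(n) multiplied by the logarithm of P(n)/Q(n). The function names are hypothetical, gaps are ignored when counting residues, and natural logarithms are used; the logarithm base and the absence of pseudocounts are assumptions.</p>
            <p>
```python
import math
from collections import Counter

GAP = "-"


def residue_frequencies(residues):
    """Return amino-acid frequencies for an iterable of residues, ignoring gaps."""
    counts = Counter(r for r in residues if r != GAP)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues to count")
    return {aa: c / total for aa, c in counts.items()}


def relative_entropy(alignment, column):
    """Kullback-Leibler distance between one column and the whole alignment.

    alignment is a list of equal-length aligned sequences (strings).
    P(n) comes from the chosen column, Q(n) from all columns pooled.
    """
    p = residue_frequencies(seq[column] for seq in alignment)
    q = residue_frequencies(res for seq in alignment for res in seq)
    # Amino acids absent from the column contribute nothing to the sum.
    return sum(p_n * math.log(p_n / q[aa]) for aa, p_n in p.items())


# Toy alignment, for illustration only.
toy_alignment = ["AG", "AG", "AC"]
print(relative_entropy(toy_alignment, 0))
```
            </p>
            <p>A column whose composition matches the alignment as a whole gives a value of zero, whereas a strongly conserved column gives a large value.</p>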
            <p>The multiple sequence alignments were also used to calculate the Grantham ratio (GR) score according to the formula:</p>
            <p>
               <graphic file="gb-2003-4-11-r72-i2.gif"/>
            </p>
            <p>where <it>D</it>(<it>A</it>,<it>B</it>) is the Grantham measure of chemical dissimilarity between amino-acid residues <it>A</it> and <it>B</it>, <it>Human_RES</it> is the human residue at the mutation site, <it>RES(i)</it> is the amino acid from the <it>i</it>th aligned sequence homolog at the mutation site, and <it>n</it> is the number of aligned sequences. Qualitatively, the GR score is a measure of dissimilarity between a human amino acid and the residues seen at the same site in homologs. In total, the relative entropy and Grantham ratio were calculated for 258 benign SNPs and 2,636 disease mutations.</p>
            <p>To characterize the structural location of disease mutations and benign SNPs, BLASTGP <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> was used to search the Protein Data Bank (PDB) <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> for known structures homologous to the human sequences. Only sequences with greater than 30% identity to human sequences over the entire length of the alignment were considered. In total, the solvent accessibilities were calculated for 110 benign SNPs and 840 disease mutations. The solvent accessibility of mutation sites was determined by the program NACCESS <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> using a water-sphere radius of 1.4 &#197;. The solvent accessibility represents the relative exposure of a residue X in a protein structure compared to its exposure in the tripeptide Ala-X-Ala.</p>
         </sec>
         <sec>
            <st>
               <p>Calculation of relative mutation probabilities</p>
            </st>
            <p>The relative mutation probabilities shown in Figures <figr fid="F4">4b</figr>, <figr fid="F5">5a</figr>, and <figr fid="F5">5b</figr> represent conditional probabilities. Specifically, the conditional probability P(disease|descriptor) that a mutation will cause a genetic disease, given a certain property (descriptor) of the mutation site, was calculated according to the formula:</p>
            <p>
               <graphic file="gb-2003-4-11-r72-i3.gif"/>
            </p>
            <p>where 'descriptor' represents solvent accessibility or evolutionary conservation of the mutation site, P(descriptor|disease) is the probability that a disease mutation has a given descriptor value, P(descriptor) is the probability that a random mutation (disease or non-disease) has a given descriptor value, and P(disease) is the probability that a random mutation will cause a genetic disease. Importantly, because P(disease) is unknown, we can only estimate P(disease|descriptor) up to a constant factor (that is, for an assumed value of P(disease)). Consequently, we refer to the values of P(disease|descriptor) as relative mutation probabilities. The probability P(descriptor) that a random mutation has a given descriptor value was estimated by simulating random single-nucleotide mutations using the expected amino-acid mutation frequencies (Figure <figr fid="F1">1a</figr>).</p>
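            <p>The calculation can be sketched in Python as an application of Bayes' rule to binned descriptor values. The function name, the use of count dictionaries keyed by bin label, and the default of 1.0 for the unknown constant P(disease) are assumptions made for illustration; the counts in the toy input are placeholders, not data from this study.</p>
            <p>
```python
def relative_disease_probability(disease_hist, random_hist, p_disease=1.0):
    """Bayes' rule per descriptor bin, known only up to the constant p_disease.

    disease_hist[bin] counts disease mutations in each descriptor bin.
    random_hist[bin] counts simulated random mutations in each bin.
    Bins without simulated random mutations are skipped.
    """
    n_disease = sum(disease_hist.values())
    n_random = sum(random_hist.values())
    result = {}
    for bin_label, random_count in random_hist.items():
        if random_count == 0:
            continue
        p_desc_given_disease = disease_hist.get(bin_label, 0) / n_disease
        p_desc = random_count / n_random
        result[bin_label] = p_desc_given_disease * p_disease / p_desc
    return result


# Placeholder counts for two solvent-accessibility bins.
toy_disease = {"buried": 60, "exposed": 40}
toy_random = {"buried": 30, "exposed": 70}
toy_relative = relative_disease_probability(toy_disease, toy_random)
print(toy_relative)
```
            </p>
            <p>Because P(disease) enters only as a common factor, ratios between bins do not depend on the value assumed for it.</p>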
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are included: a list of relative mutation rates (Additional data file <supplr sid="s1">1</supplr>), a list of disease mutations (Additional data file <supplr sid="s2">2</supplr>), a list of disease mutation genes (Additional data file <supplr sid="s3">3</supplr>), a list of SNPs used in the analysis (Additional data file <supplr sid="s4">4</supplr>), and the Grantham ratio scores (Additional data file <supplr sid="s5">5</supplr>).</p>
         <suppl id="s1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>A list of relative mutation rates</p>
            </caption>
            <text>
               <p>A list of relative mutation rates</p>
            </text>
            <file name="gb-2003-4-11-r72-s1.TXT">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
         <suppl id="s2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>A list of disease mutations</p>
            </caption>
            <text>
               <p>A list of disease mutations</p>
            </text>
            <file name="gb-2003-4-11-r72-s2.TXT">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
         <suppl id="s3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>A list of disease mutation genes</p>
            </caption>
            <text>
               <p>A list of disease mutation genes</p>
            </text>
            <file name="gb-2003-4-11-r72-s3.TXT">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
         <suppl id="s4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>A list of SNPs used in the analysis</p>
            </caption>
            <text>
               <p>A list of SNPs used in the analysis</p>
            </text>
            <file name="gb-2003-4-11-r72-s4.TXT">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
         <suppl id="s5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>The Grantham ratio scores</p>
            </caption>
            <text>
               <p>The Grantham ratio scores</p>
            </text>
            <file name="gb-2003-4-11-r72-s5.TXT">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Jay Shendure, John Aach, Patrik D'haeseleer, Daniel Segre, Peter Kharchenko, and Tzachi Pilpel for discussions. This work was supported in part by the US Department of Energy through the grant DOE DE-FG02-87-ER60565.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>SNPs, protein structure, and disease.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Moult</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Hum Mutat</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>263</fpage>
            <lpage>270</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/humu.22</pubid>
                  <pubid idtype="pmpid" link="fulltext">11295823</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Prediction of deleterious human alleles.</p>
            </title>
            <aug>
               <au>
                  <snm>Sunyaev</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ramensky</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Koch</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Lathe</snm>
                  <fnm>W</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2001</pubdate>
            <volume>10</volume>
            <fpage>591</fpage>
            <lpage>597</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/10.6.591</pubid>
                  <pubid idtype="pmpid" link="fulltext">11230178</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Predicting deleterious amino acid substitutions.</p>
            </title>
            <aug>
               <au>
                  <snm>Ng</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>863</fpage>
            <lpage>874</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.176601</pubid>
                  <pubid idtype="pmpid" link="fulltext">11337480</pubid>
                  <pubid idtype="pmcid">311071</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure-based assessment of amino acid variation.</p>
            </title>
            <aug>
               <au>
                  <snm>Chasman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>307</volume>
            <fpage>683</fpage>
            <lpage>706</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.4510</pubid>
                  <pubid idtype="pmpid" link="fulltext">11254390</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Understanding human disease mutations through the use of interspecific variation.</p>
            </title>
            <aug>
               <au>
                  <snm>Miller</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Kumar</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2001</pubdate>
            <volume>10</volume>
            <fpage>2319</fpage>
            <lpage>2328</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/10.21.2319</pubid>
                  <pubid idtype="pmpid" link="fulltext">11689479</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Assessing the relative importance of the biophysical properties of amino acid substitutions associated with human genetic disease.</p>
            </title>
            <aug>
               <au>
                  <snm>Terp</snm>
                  <fnm>BN</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>DN</fnm>
               </au>
               <au>
                  <snm>Christensen</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Jorgensen</snm>
                  <fnm>FS</fnm>
               </au>
               <au>
                  <snm>Bross</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gregersen</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Krawczak</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Hum Mutat</source>
            <pubdate>2002</pubdate>
            <volume>20</volume>
            <fpage>98</fpage>
            <lpage>109</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/humu.10095</pubid>
                  <pubid idtype="pmpid" link="fulltext">12124990</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <aug>
               <au>
                  <snm>McKusick</snm>
                  <fnm>VA</fnm>
               </au>
            </aug>
            <source>Mendelian Inheritance in Man. Catalogs of Human Genes and Genetic Disorders</source>
            <publisher>Baltimore: Johns Hopkins University Press</publisher>
            <edition>12</edition>
            <pubdate>1998</pubdate>
         </bibl>
         <bibl id="B8">
            <title>
               <p>The SWISS-PROT protein sequence data bank and its new supplement TrEMBL.</p>
            </title>
            <aug>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1996</pubdate>
            <volume>24</volume>
            <fpage>21</fpage>
            <lpage>25</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">145613</pubid>
                  <pubid idtype="pmpid" link="fulltext">8594581</pubid>
                  <pubid idtype="doi">10.1093/nar/24.1.21</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Haplotype variation and linkage disequilibrium in 313 human genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Stephens</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Tanguay</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Choi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Acharya</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Stanley</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Messer</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Chew</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>JH</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>293</volume>
            <fpage>489</fpage>
            <lpage>493</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1059431</pubid>
                  <pubid idtype="pmpid" link="fulltext">11452081</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A model of evolutionary change in proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Dayhoff</snm>
                  <fnm>MO</fnm>
               </au>
            </aug>
            <source>In Atlas of Protein Sequence and Structure</source>
            <publisher>Silver Spring: National Biomedical Research Foundation</publisher>
            <editor>Dayhoff MO</editor>
            <pubdate>1978</pubdate>
            <fpage>345</fpage>
            <lpage>352</lpage>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis.</p>
            </title>
            <aug>
               <au>
                  <snm>Halushka</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Fan</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Bentley</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hsie</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Weder</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lipshutz</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chakravarti</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>1999</pubdate>
            <volume>22</volume>
            <fpage>239</fpage>
            <lpage>247</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/10297</pubid>
                  <pubid idtype="pmpid" link="fulltext">10391210</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Characterization of single-nucleotide polymorphisms in coding regions of human genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Cargill</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Altshuler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ireland</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sklar</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ardlie</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Patil</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Shaw</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Lane</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Lim</snm>
                  <fnm>EP</fnm>
               </au>
               <au>
                  <snm>Kalyanaraman</snm>
                  <fnm>N</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Genet</source>
            <pubdate>1999</pubdate>
            <volume>22</volume>
            <fpage>231</fpage>
            <lpage>238</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/10290</pubid>
                  <pubid idtype="pmpid" link="fulltext">10391209</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Wide variations in neighbor-dependent substitution rates.</p>
            </title>
            <aug>
               <au>
                  <snm>Hess</snm>
                  <fnm>ST</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>RD</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1994</pubdate>
            <volume>236</volume>
            <fpage>1022</fpage>
            <lpage>1033</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(94)90009-4</pubid>
                  <pubid idtype="pmpid">8120884</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>A hidden Markov model for predicting transmembrane helices in protein sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Int Conf Intell Syst Mol Biol</source>
            <pubdate>1998</pubdate>
            <volume>6</volume>
            <fpage>175</fpage>
            <lpage>182</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9783223</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Amino acid substitution during functionally constrained divergent evolution of protein sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Benner</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Gonnet</snm>
                  <fnm>GH</fnm>
               </au>
            </aug>
            <source>Protein Eng</source>
            <pubdate>1994</pubdate>
            <volume>7</volume>
            <fpage>1323</fpage>
            <lpage>1332</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7700864</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>The CpG dinucleotide and human genetic disease.</p>
            </title>
            <aug>
               <au>
                  <snm>Cooper</snm>
                  <fnm>DN</fnm>
               </au>
               <au>
                  <snm>Youssoufian</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Hum Genet</source>
            <pubdate>1988</pubdate>
            <volume>78</volume>
            <fpage>151</fpage>
            <lpage>155</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00278187</pubid>
                  <pubid idtype="pmpid">3338800</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Neighboring-nucleotide effects on the rates of germ-line single base-pair substitution in human genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Krawczak</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>DN</fnm>
               </au>
            </aug>
            <source>Am J Hum Genet</source>
            <pubdate>1998</pubdate>
            <volume>63</volume>
            <fpage>474</fpage>
            <lpage>488</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1086/301965</pubid>
                  <pubid idtype="pmpid" link="fulltext">9683596</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Accounting for human polymorphisms predicted to affect protein function.</p>
            </title>
            <aug>
               <au>
                  <snm>Ng</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>436</fpage>
            <lpage>446</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">155281</pubid>
                  <pubid idtype="pmpid" link="fulltext">11875032</pubid>
                  <pubid idtype="doi">10.1101/gr.212802</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Human non-synonymous SNPs: server and survey.</p>
            </title>
            <aug>
               <au>
                  <snm>Ramensky</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Sunyaev</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>3894</fpage>
            <lpage>3900</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">137415</pubid>
                  <pubid idtype="pmpid" link="fulltext">12202775</pubid>
                  <pubid idtype="doi">10.1093/nar/gkf493</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Characterization of disease-associated single amino acid polymorphisms in terms of sequence and structure properties.</p>
            </title>
            <aug>
               <au>
                  <snm>Ferrer-Costa</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Orozco</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>de la Cruz</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>315</volume>
            <fpage>771</fpage>
            <lpage>786</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.5255</pubid>
                  <pubid idtype="pmpid" link="fulltext">11812146</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Solvent accessibility and purifying selection within proteins of <it>Escherichia coli</it> and <it>Salmonella enterica</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Bustamante</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Townsend</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>301</fpage>
            <lpage>308</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10677853</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Amino acid difference formula to help explain protein evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Grantham</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1974</pubdate>
            <volume>185</volume>
            <fpage>862</fpage>
            <lpage>864</lpage>
            <xrefbib>
               <pubid idtype="pmpid">4843792</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Positive and negative selection on the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Fay</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Wyckoff</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>CI</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2001</pubdate>
            <volume>158</volume>
            <fpage>1227</fpage>
            <lpage>1234</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11454770</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>A biased assessment of the use of SNPs in human complex traits.</p>
            </title>
            <aug>
               <au>
                  <snm>Terwilliger</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Haghighi</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Hiekkalinna</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Goring</snm>
                  <fnm>HH</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>726</fpage>
            <lpage>734</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-437X(02)00357-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">12433588</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease.</p>
            </title>
            <aug>
               <au>
                  <snm>Lohmueller</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Pearce</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Pike</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Hirschhorn</snm>
                  <fnm>JN</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2003</pubdate>
            <volume>33</volume>
            <fpage>177</fpage>
            <lpage>182</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1071</pubid>
                  <pubid idtype="pmpid" link="fulltext">12524541</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Saturation mutagenesis of human interleukin-3.</p>
            </title>
            <aug>
               <au>
                  <snm>Olins</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Bauer</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Braford-Goldberg</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sterbenz</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Polazzi</snm>
                  <fnm>JO</fnm>
               </au>
               <au>
                  <snm>Caparon</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Klein</snm>
                  <fnm>BK</fnm>
               </au>
               <au>
                  <snm>Easton</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Paik</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Klover</snm>
                  <fnm>JA</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>23754</fpage>
            <lpage>23760</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.270.40.23754</pubid>
                  <pubid idtype="pmpid" link="fulltext">7559548</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Amino acid sequence determinants of beta-lactamase structure and activity.</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Petrosino</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hirsch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shenkin</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Palzkill</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1996</pubdate>
            <volume>258</volume>
            <fpage>688</fpage>
            <lpage>703</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1996.0279</pubid>
                  <pubid idtype="pmpid" link="fulltext">8637002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Genetic analysis of protein stability and function.</p>
            </title>
            <aug>
               <au>
                  <snm>Pakula</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Sauer</snm>
                  <fnm>RT</fnm>
               </au>
            </aug>
            <source>Annu Rev Genet</source>
            <pubdate>1989</pubdate>
            <volume>23</volume>
            <fpage>289</fpage>
            <lpage>310</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.ge.23.120189.001445</pubid>
                  <pubid idtype="pmpid" link="fulltext">2694933</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Structural and genetic analysis of the folding and function of T4 lysozyme.</p>
            </title>
            <aug>
               <au>
                  <snm>Matthews</snm>
                  <fnm>BW</fnm>
               </au>
            </aug>
            <source>FASEB J</source>
            <pubdate>1996</pubdate>
            <volume>10</volume>
            <fpage>35</fpage>
            <lpage>41</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8566545</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Estimate of the mutation rate per nucleotide in humans.</p>
            </title>
            <aug>
               <au>
                  <snm>Nachman</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Crowell</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2000</pubdate>
            <volume>156</volume>
            <fpage>297</fpage>
            <lpage>304</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10978293</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>High genomic deleterious mutation rates in hominids.</p>
            </title>
            <aug>
               <au>
                  <snm>Eyre-Walker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Keightley</snm>
                  <fnm>PD</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>397</volume>
            <fpage>344</fpage>
            <lpage>347</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/16915</pubid>
                  <pubid idtype="pmpid" link="fulltext">9950425</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Amino acid substitution matrices from protein blocks.</p>
            </title>
            <aug>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>JG</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1992</pubdate>
            <volume>89</volume>
            <fpage>10915</fpage>
            <lpage>10919</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">50453</pubid>
                  <pubid idtype="pmpid" link="fulltext">1438297</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Recombinational and mutational hotspots within the human lipoprotein lipase gene.</p>
            </title>
            <aug>
               <au>
                  <snm>Templeton</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Weiss</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Nickerson</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Boerwinkle</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Sing</snm>
                  <fnm>CF</fnm>
               </au>
            </aug>
            <source>Am J Hum Genet</source>
            <pubdate>2000</pubdate>
            <volume>66</volume>
            <fpage>69</fpage>
            <lpage>83</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1086/302699</pubid>
                  <pubid idtype="pmpid" link="fulltext">10631137</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Statistical inference of sequence-dependent mutation rates.</p>
            </title>
            <aug>
               <au>
                  <snm>Zavolan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kepler</snm>
                  <fnm>TB</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>612</fpage>
            <lpage>615</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-437X(00)00242-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">11682302</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Use of mutation spectra analysis software.</p>
            </title>
            <aug>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Glazko</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Hum Mutat</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>83</fpage>
            <lpage>102</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/1098-1004(200102)17:2&lt;83::AID-HUMU1&gt;3.0.CO;2-E</pubid>
                  <pubid idtype="pmpid" link="fulltext">11180592</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Analysis of compositionally biased regions in sequence databases.</p>
            </title>
            <aug>
               <au>
                  <snm>Wootton</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Federhen</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Methods Enzymol</source>
            <pubdate>1996</pubdate>
            <volume>266</volume>
            <fpage>554</fpage>
            <lpage>571</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8743706</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Removing near-neighbour redundancy from large protein sequence collections.</p>
            </title>
            <aug>
               <au>
                  <snm>Holm</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>423</fpage>
            <lpage>429</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/14.5.423</pubid>
                  <pubid idtype="pmpid" link="fulltext">9682055</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Using CLUSTAL for multiple sequence alignments.</p>
            </title>
            <aug>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Methods Enzymol</source>
            <pubdate>1996</pubdate>
            <volume>266</volume>
            <fpage>383</fpage>
            <lpage>402</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8743695</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The Protein Data Bank: a computer-based archival file for macromolecular structures.</p>
            </title>
            <aug>
               <au>
                  <snm>Bernstein</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Koetzle</snm>
                  <fnm>TF</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>GJB</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>EF</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Brice</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Rodgers</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Kennard</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Shimanouchi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tasumi</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1977</pubdate>
            <volume>112</volume>
            <fpage>535</fpage>
            <lpage>542</lpage>
            <xrefbib>
               <pubid idtype="pmpid">875032</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <aug>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>NACCESS Computer Program</source>
            <publisher>London: Department of Biochemistry and Molecular Biology, University College London</publisher>
            <pubdate>1993</pubdate>
         </bibl>
         <bibl id="B42">
            <aug>
               <au>
                  <snm>Mount</snm>
                  <fnm>DW</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <publisher>Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press</publisher>
            <pubdate>2001</pubdate>
         </bibl>
      </refgrp>
   </bm>
</art>
