<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-7-329</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>A novel statistical method to estimate the effective SNP size in vertebrate genomes and categorized genomic regions</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Seo</snm>
               <fnm>Daekwan</fnm>
               <insr iid="I1"/>
               <email>dseo@vcu.edu</email>
            </au>
            <au id="A2">
               <snm>Jiang</snm>
               <fnm>Cizhong</fnm>
               <insr iid="I1"/>
               <email>cjiang@vcu.edu</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Zhao</snm>
               <fnm>Zhongming</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>zzhao@vcu.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA</p>
            </ins>
            <ins id="I2">
               <p>Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284, USA</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>329</fpage>
         <url>http://www.biomedcentral.com/1471-2164/7/329</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17196097</pubid>
               <pubid idtype="doi">10.1186/1471-2164-7-329</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>31</day>
               <month>7</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>29</day>
               <month>12</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>29</day>
               <month>12</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Seo et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The local environment of single nucleotide polymorphisms (SNPs) contains abundant genetic information for studying the mechanisms of mutation, genome evolution, and the causes of diseases. Recent studies revealed that neighboring-nucleotide biases around SNPs were strong and that the genome-wide bias patterns could be represented by a small subset of the total SNPs. How to estimate the effective SNP size remains unsolved. The effective SNP size is the number of SNPs sufficient to represent the bias patterns observed in the whole SNP data set.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>To estimate the effective SNP size, we developed a novel statistical method, SNPKS, which considers both statistical and biological significance. SNPKS consists of two major steps. First, it obtains an initial effective size with the Kolmogorov-Smirnov test (KS test). Second, it finds an intermediate effective size by interval evaluation. The SNPKS algorithm was implemented in computer programs and applied to real SNP data. The effective SNP size was estimated to be 38,200, 39,300, 38,000, and 38,700 in the human, chimpanzee, dog, and mouse genomes, respectively, and 39,100, 39,600, 39,200, and 42,200 in human intergenic, genic, intronic, and CpG island regions, respectively.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>SNPKS is the first statistical method to estimate the effective SNP size. It runs efficiently and greatly outperforms the algorithm implemented in SNPNB. Applying SNPKS to real SNP data revealed similarly small effective SNP sizes (38,000 &#8211; 42,200) in the human, chimpanzee, dog, and mouse genomes as well as in human genomic regions. These findings suggest a strong influence of genetic factors across vertebrate genomes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation in vertebrate genomes. They have become important tools in many biological fields. These fields include studies of mutation mechanisms, genome evolution, and disease, as well as pharmacogenomics and fine mapping <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Strong demand for SNP data and rapid technological advances have led to exponential growth in SNP discovery during the past decade. As of October 2006, the largest public SNP database, dbSNP, contained more than 87 million submitted SNPs from 35 organisms. More than 50 million of them have been mapped to their genomes <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Many more SNPs will be identified in the near future.</p>
         <p>Mutation at the nucleotide level does not occur randomly. Recent studies of mutational mechanisms revealed that neighboring nucleotides strongly influenced SNPs in the human and mouse genomes <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. Specifically, strong biases relative to the genome average were observed at the two sites immediately adjacent to the SNPs. Weaker biases could extend much farther, as far as 200 nucleotides on each flanking side. Furthermore, the bias patterns varied among SNP types. For example, the biases for transition SNPs (A/G and C/T) were much stronger than those for transversion SNPs (A/C, G/T, A/T, and C/G). Importantly, the bias patterns observed in the whole genome could be sufficiently represented by only a small subset of SNPs randomly sampled from the genome-wide data <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. The effective SNP size is defined as the minimum number of SNPs that can essentially represent the bias patterns of the whole SNP data set. It was roughly estimated to be 30,000 in the human and mouse genomes <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. The SNPs identified in today's genomes reflect the combined outcome of several evolutionary processes. These include mutation hotspots at methylated CpG sites, a high transition rate, selection on functional elements, and error-prone DNA replication and repair. A small effective SNP size therefore suggests a strong influence of one or a few genetic factors in vertebrate genomes, especially CpG effects <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>.</p>
         <p>So far, how to efficiently estimate the effective SNP size has remained unsolved. SNPNB is a user-friendly application implemented in Java and Perl. It can help users evaluate and obtain a number close to the effective SNP size <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. However, it has three major limitations. First, SNPNB is based on an empirical re-sampling approach. Finding the effective SNP size therefore becomes impractical when the number of SNPs is very large, which is always the case in genome-wide or chromosome-wide analyses. Second, it is challenging to decide whether the bias pattern observed in one data set is (nearly) the same as that observed in another. The comparison must combine all four nucleotides at every site on both the 5' and 3' sides of the SNPs. Third, there is a statistical problem. The null hypothesis is that SNPs in the genome show no neighboring-nucleotide bias. In other words, the nucleotide frequencies at sites neighboring SNPs equal the average nucleotide frequencies in the genome sequences. Under this hypothesis, each observed neighboring-nucleotide bias (%) is compared with its expected value, which is 0 for each nucleotide at each site. However, a hypothesis test on a very large number of SNPs may not lead to a meaningful conclusion. For example, consider the 8,043,656 human SNPs tested in SNPNB <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. A frequency difference as small as 0.00028 (0.028%) for nucleotide C would already be significant in a Z test at the 5% significance level (&#945; = 0.05), so the null hypothesis would be rejected. Such a small difference is clearly not biologically meaningful. Here, we propose an integrated statistical method to estimate the effective SNP size. 
This method (SNPKS) considers both biological and statistical significance. Considering statistical significance alone leads to an unreasonably large effective SNP size, and SNPKS avoids this problem <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. We also developed an efficient pipeline that iteratively evaluates intermediate values of the effective size. SNPKS consists of two steps: (1) evaluation of an initial effective size; and (2) iterative testing of the initial effective size by interval evaluation.</p>
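         <p>The 0.00028 figure quoted above can be checked with a short calculation. This is a minimal sketch. The text does not state the exact form of the Z test, so it assumes a two-sided one-sample Z test for a proportion at &#945; = 0.05 (critical value 1.96). Under that test, the smallest significant difference is 1.96 &#215; sqrt(<it>p</it>(1 &#8722; <it>p</it>)/<it>N</it>). The sketch takes <it>p</it> = 0.2044, the human genome average for C given later in the Results. The function name <it>min_significant_difference</it> is hypothetical.</p>
         <p>
```python
import math

# Values quoted in the article: 8,043,656 human SNPs tested in SNPNB and a
# human genome-average frequency of 20.44% for nucleotide C.
N_SNPS = 8_043_656
P_C = 0.2044
Z_CRIT = 1.96  # assumed two-sided critical value at alpha = 0.05


def min_significant_difference(p: float, n: int, z: float = Z_CRIT) -> float:
    """Smallest frequency difference that a one-sample Z test for a proportion
    declares significant: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1.0 - p) / n)


delta = min_significant_difference(P_C, N_SNPS)
print(f"minimum significant difference: {delta:.5f} ({delta:.3%})")
```
         </p>
         <p>Under these assumptions, the computed threshold rounds to the 0.00028 (0.028%) reported above. This illustrates how a very large <it>N</it> makes negligible differences statistically significant.</p>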
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>KS test and interval evaluation</p>
            </st>
            <p>To estimate the effective SNP size, we designed a two-step procedure and integrated it into our system (Figure <figr fid="F1">1</figr>). In the first step, we apply the Kolmogorov-Smirnov test (KS test) <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> to obtain an initial effective SNP size. The KS test is usually used to evaluate whether a sample comes from a population with a specified distribution by comparing the corresponding cumulative frequencies. Here, we estimate an initial effective SNP size by comparing the cumulative frequencies from the whole SNP data with those from a sample of the SNP data.</p>
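            <p>For context, the conventional two-sample KS test rejects at level &#945; when the maximum difference <it>D</it> between the two cumulative distributions exceeds approximately <it>c</it>(&#945;) &#215; sqrt((<it>n</it> + <it>m</it>)/(<it>nm</it>)). Here <it>c</it>(&#945;) = sqrt(&#8722;ln(&#945;/2)/2) is the standard large-sample approximation. The sketch below evaluates this tolerance for a hypothetical sample of 10,000 SNPs against the 8,043,656 human SNPs cited in the Background. The sample size and the function names are illustrative assumptions. The sketch shows only the conventional test, not part of SNPKS itself.</p>
            <p>
```python
import math


def ks_coefficient(alpha: float) -> float:
    """Large-sample two-sample KS coefficient c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    return math.sqrt(-math.log(alpha / 2.0) / 2.0)


def ks_critical_value(n: int, m: int, alpha: float = 0.05) -> float:
    """Approximate two-sample KS critical value c(alpha) * sqrt((n + m) / (n * m))."""
    return ks_coefficient(alpha) * math.sqrt((n + m) / (n * m))


N_WHOLE = 8_043_656  # human SNP set size quoted in the Background
N_SAMPLE = 10_000    # hypothetical sample size, chosen only for illustration

d_crit = ks_critical_value(N_SAMPLE, N_WHOLE)
print(f"KS critical D at alpha = 0.05: {d_crit:.4f} ({d_crit:.2%})")
print(f"ratio to a 0.2% threshold: {d_crit / 0.002:.1f}")
```
            </p>
            <p>As a rough comparison, the conventional test tolerates a difference of about 1.4% under these assumptions. That is several times the 0.2% biological-significance threshold used later in this section.</p>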
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Flowchart of the SNPKS</p>
               </caption>
               <text>
                  <p><b>Flowchart of the SNPKS</b>. This figure illustrates the integrated two-step procedure of the SNPKS method. KS test: Kolmogorov-Smirnov test; C.I.: confidence interval; <it>N</it><sub><it>e</it>0</sub>: intermediate effective SNP size; <it>N</it><sub><it>e</it></sub>: effective SNP size.</p>
               </text>
               <graphic file="1471-2164-7-329-1"/>
            </fig>
            <p>For the whole SNP data set of size <it>N</it>, we randomly draw a sub-sample of SNPs of size <it>n</it><sub>0</sub> (<it>n</it><sub>0</sub> &lt;&lt; <it>N</it>). First, we calculate the frequency of each nucleotide at each neighboring site in the whole SNP data and in the sample SNP data. Following our previous analyses <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>, we examine the 20 sites immediately adjacent to each SNP, 10 on the 5' side and 10 on the 3' side. We choose these 20 sites because they show the largest neighboring-nucleotide biases. Figure <figr fid="F2">2</figr> illustrates the polymorphic site of a SNP and its 5' and 3' flanking sequences. We then compare the cumulative relative frequency of each nucleotide in the neighboring sequences. Let <it>f</it><sub><it>i</it>,<it>j</it></sub> and <it>g</it><sub><it>i</it>,<it>j</it></sub> be the relative frequencies of nucleotide <it>i</it> at site <it>j</it> in the sample SNP data and the whole SNP data, respectively. Let <it>F</it><sub><it>i</it>,<it>j</it></sub> and <it>G</it><sub><it>i</it>,<it>j</it></sub> be the corresponding cumulative frequencies. Here <it>i</it> denotes one of the four nucleotides (A, C, G, and T). The index <it>j</it> denotes a neighboring site in the SNP flanking sequences, running from -10 to -1 on the 5' side and from +1 to +10 on the 3' side. For the sample SNPs, <it>f</it><sub><it>i</it>,<it>j</it></sub> and <it>F</it><sub><it>i</it>,<it>j</it></sub> are defined by:</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Annotation of a SNP and its flanking sites</p>
               </caption>
               <text>
                  <p><b>Annotation of a SNP and its flanking sites</b>. SNPKS uses the ten sites immediately adjacent to the polymorphic site (A/G) on each of the 5' and 3' sides. A minus sign denotes a flanking site on the 5' side, and a plus sign denotes a site on the 3' side.</p>
               </text>
               <graphic file="1471-2164-7-329-2"/>
            </fig>
            <p>
               <m:math name="1471-2164-7-329-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:msub>
                           <m:mi>f</m:mi>
                           <m:mrow>
                              <m:mi>i</m:mi>
                              <m:mo>,</m:mo>
                              <m:mi>j</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mn>1</m:mn>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>n</m:mi>
                                 <m:mn>0</m:mn>
                              </m:msub>
                           </m:mrow>
                        </m:mfrac>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8721;</m:mo>
                              <m:mn>1</m:mn>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>n</m:mi>
                                    <m:mtext>0</m:mtext>
                                 </m:msub>
                              </m:mrow>
                           </m:munderover>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo>{</m:mo>
                                 <m:mrow>
                                    <m:mtable columnalign="left">
                                       <m:mtr columnalign="left">
                                          <m:mtd columnalign="left">
                                             <m:mn>1</m:mn>
                                          </m:mtd>
                                          <m:mtd columnalign="left">
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>f</m:mi>
                                                <m:mtext>&#160;</m:mtext>
                                                <m:msup>
                                                   <m:mi>j</m:mi>
                                                   <m:mrow>
                                                      <m:mi>t</m:mi>
                                                      <m:mi>h</m:mi>
                                                   </m:mrow>
                                                </m:msup>
                                                <m:mi>n</m:mi>
                                                <m:mi>u</m:mi>
                                                <m:mi>c</m:mi>
                                                <m:mi>l</m:mi>
                                                <m:mi>e</m:mi>
                                                <m:mi>o</m:mi>
                                                <m:mi>t</m:mi>
                                                <m:mi>i</m:mi>
                                                <m:mi>d</m:mi>
                                                <m:mi>e</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mi>i</m:mi>
                                                <m:mo>,</m:mo>
                                             </m:mrow>
                                          </m:mtd>
                                       </m:mtr>
                                       <m:mtr columnalign="left">
                                          <m:mtd columnalign="left">
                                             <m:mn>0</m:mn>
                                          </m:mtd>
                                          <m:mtd columnalign="left">
                                             <m:mrow>
                                                <m:mi>o</m:mi>
                                                <m:mi>t</m:mi>
                                                <m:mi>h</m:mi>
                                                <m:mi>e</m:mi>
                                                <m:mi>r</m:mi>
                                                <m:mi>w</m:mi>
                                                <m:mi>i</m:mi>
                                                <m:mi>s</m:mi>
                                                <m:mi>e</m:mi>
                                             </m:mrow>
                                          </m:mtd>
                                       </m:mtr>
                                    </m:mtable>
                                 </m:mrow>
                              </m:mrow>
                           </m:mrow>
                        </m:mstyle>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaWgaaWcbaGaemyAaKMaeiilaWIaemOAaOgabeaakiabg2da9maalaaabaGaeGymaedabaGaemOBa42aaSbaaSqaaiabicdaWaqabaaaaOWaaabCaeaadaGabeqaauaabaqaciaaaeaacqaIXaqmaeaacqWGPbqAcqWGMbGzcqqGGaaicqWGQbGAdaahaaWcbeqaaiabdsha0jabdIgaObaakiabd6gaUjabdwha1jabdogaJjabdYgaSjabdwgaLjabd+gaVjabdsha0jabdMgaPjabdsgaKjabdwgaLjabg2da9iabdMgaPjabcYcaSaqaaiabicdaWaqaaiabd+gaVjabdsha0jabdIgaOjabdwgaLjabdkhaYjabdEha3jabdMgaPjabdohaZjabdwgaLbaaaiaawUhaaaWcbaGaeGymaedabaGaemOBa42aaSbaaWqaaiabbcdaWaqabaaaniabggHiLdGccaWLjaGaaCzcamaabmaabaGaeGymaedacaGLOaGaayzkaaaaaa@6825@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>
               <m:math name="1471-2164-7-329-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:msub>
                           <m:mi>F</m:mi>
                           <m:mrow>
                              <m:mi>i</m:mi>
                              <m:mo>,</m:mo>
                              <m:mi>j</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:mo>=</m:mo>
                        <m:mrow>
                           <m:mo>{</m:mo>
                           <m:mrow>
                              <m:mtable columnalign="left">
                                 <m:mtr columnalign="left">
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>f</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mn>10</m:mn>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mi>f</m:mi>
                                          <m:mtext>&#160;</m:mtext>
                                          <m:mi>j</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mn>10</m:mn>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr columnalign="left">
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>F</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>j</m:mi>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo>+</m:mo>
                                          <m:msub>
                                             <m:mi>f</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mi>f</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mn>9</m:mn>
                                          <m:mo>&#8804;</m:mo>
                                          <m:mi>j</m:mi>
                                          <m:mo>&#8804;</m:mo>
                                          <m:mn>10</m:mn>
                                          <m:mtext>&#160;</m:mtext>
                                          <m:mi>a</m:mi>
                                          <m:mi>n</m:mi>
                                          <m:mi>d</m:mi>
                                          <m:mtext>&#160;</m:mtext>
                                          <m:mi>j</m:mi>
                                          <m:mo>&#8800;</m:mo>
                                          <m:mn>0</m:mn>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                              </m:mtable>
                              <m:mo>.</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>2</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaWgaaWcbaGaemyAaKMaeiilaWIaemOAaOgabeaakiabg2da9maaceqabaqbaeaabiGaaaqaaiabdAgaMnaaBaaaleaacqWGPbqAcqGGSaalcqGHsislcqaIXaqmcqaIWaamaeqaaaGcbaGaemyAaKMaemOzayMaeeiiaaIaemOAaOMaeyypa0JaeyOeI0IaeGymaeJaeGimaadabaGaemOray0aaSbaaSqaaiabdMgaPjabcYcaSiabdQgaQjabgkHiTiabigdaXaqabaGccqGHRaWkcqWGMbGzdaWgaaWcbaGaemyAaKMaeiilaWIaemOAaOgabeaaaOqaaiabdMgaPjabdAgaMjabgkHiTiabiMda5iabgsMiJkabdQgaQjabgsMiJkabigdaXiabicdaWiabbccaGiabdggaHjabd6gaUjabdsgaKjabbccaGiabdQgaQjabgcMi5kabicdaWaaacqGGUaGlaiaawUhaaiaaxMaacaWLjaWaaeWaaeaacqaIYaGmaiaawIcacaGLPaaaaaa@6940@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
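            <p>Equations (1) and (2) can be sketched as follows. The sketch assumes that each SNP is represented by a 20-character flanking string holding sites -10 to -1 followed by sites +1 to +10, with the SNP allele removed. The names <it>relative_frequencies</it> and <it>cumulative_frequencies</it> are hypothetical. The term <it>F</it><sub><it>i</it>,<it>j</it>&#8722;1</sub> in Eq. (2) is read here as the previous flanking site. On that interpretation, the running sum continues from site -1 directly to site +1 and skips the polymorphic site.</p>
            <p>
```python
import itertools
from typing import Dict, List, Sequence

NUCLEOTIDES = "ACGT"
# Flanking sites in the order used by Eq. (2): -10, ..., -1, then +1, ..., +10.
# The polymorphic site itself (j = 0) is excluded.
SITES: List[int] = list(range(-10, 0)) + list(range(1, 11))


def relative_frequencies(flanks: Sequence[str]) -> Dict[str, List[float]]:
    """Eq. (1): fraction of SNPs carrying nucleotide i at each flanking site.

    Each element of `flanks` is assumed to be a 20-character string: the ten
    5' bases followed by the ten 3' bases, with the SNP allele removed.
    """
    counts = {nt: [0] * len(SITES) for nt in NUCLEOTIDES}
    for flank in flanks:
        if len(flank) != len(SITES):
            raise ValueError(f"expected {len(SITES)} flanking bases, got {len(flank)}")
        for k, base in enumerate(flank.upper()):
            if base in counts:  # ambiguity codes such as N are not counted
                counts[base][k] += 1
    n0 = len(flanks)
    return {nt: [c / n0 for c in row] for nt, row in counts.items()}


def cumulative_frequencies(freqs: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Eq. (2): running sum of f over the ordered sites, F[i][k] = f[i][0] + ... + f[i][k]."""
    return {nt: list(itertools.accumulate(row)) for nt, row in freqs.items()}


sample = ["ACGTACGTAC" + "GTACGTACGT", "AAAAACCCCC" + "GGGGGTTTTT"]
F = cumulative_frequencies(relative_frequencies(sample))
print(F["A"][:3])
```
            </p>
            <p>Applying the same two functions to the whole SNP data set gives <it>g</it><sub><it>i</it>,<it>j</it></sub> and <it>G</it><sub><it>i</it>,<it>j</it></sub>, with <it>N</it> in place of <it>n</it><sub>0</sub>.</p>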
            <p>The <it>g</it><sub><it>i</it>,<it>j</it></sub> and <it>G</it><sub><it>i</it>,<it>j</it></sub> are defined analogously, except that <it>N</it>, the size of the whole SNP data set, is used instead of <it>n</it><sub>0</sub>. The maximum difference in cumulative frequency for each nucleotide <it>i</it> over all sites <it>j</it> is defined by:</p>
            <p>
               <m:math name="1471-2164-7-329-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:msub>
                           <m:mi>D</m:mi>
                           <m:mi>i</m:mi>
                        </m:msub>
                        <m:mo>=</m:mo>
                        <m:munder>
                           <m:mrow>
                              <m:mi>max</m:mi>
                              <m:mo>&#8289;</m:mo>
                           </m:mrow>
                           <m:mi>j</m:mi>
                        </m:munder>
                        <m:mrow>
                           <m:mo>|</m:mo>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>F</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>j</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo>&#8722;</m:mo>
                              <m:msub>
                                 <m:mi>G</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>j</m:mi>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                           <m:mo>|</m:mo>
                        </m:mrow>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>3</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGebardaWgaaWcbaGaemyAaKgabeaakiabg2da9maaxababaGagiyBa0MaeiyyaeMaeiiEaGhaleaacqWGQbGAaeqaaOWaaqWaaeaacqWGgbGrdaWgaaWcbaGaemyAaKMaeiilaWIaemOAaOgabeaakiabgkHiTiabdEeahnaaBaaaleaacqWGPbqAcqGGSaalcqWGQbGAaeqaaaGccaGLhWUaayjcSdGaaCzcaiaaxMaadaqadaqaaiabiodaZaGaayjkaiaawMcaaaaa@47B2@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
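            <p>Equation (3) reduces to a maximum over paired cumulative frequencies. The sketch below assumes that the cumulative frequencies are stored per nucleotide as lists over the same ordered sites. The function name is hypothetical, and the toy numbers are illustrative only, not data from this study.</p>
            <p>
```python
from typing import Dict, List


def max_cumulative_difference(F: Dict[str, List[float]],
                              G: Dict[str, List[float]]) -> Dict[str, float]:
    """Eq. (3): D_i = max over sites j of |F_ij - G_ij|, for each nucleotide i.

    F holds the sample's cumulative frequencies and G those of the whole SNP
    data; both must list the same flanking sites in the same order.
    """
    D = {}
    for nt, f_row in F.items():
        g_row = G[nt]
        if len(f_row) != len(g_row):
            raise ValueError(f"site count mismatch for nucleotide {nt}")
        D[nt] = max(abs(f - g) for f, g in zip(f_row, g_row))
    return D


# Toy cumulative frequencies over three sites (illustrative numbers only).
F_sample = {"A": [0.25, 0.50, 0.75], "C": [0.25, 0.50, 0.75]}
G_whole = {"A": [0.25, 0.50, 1.00], "C": [0.25, 0.50, 0.75]}
print(max_cumulative_difference(F_sample, G_whole))
```
            </p>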
            <p>Next, we compare the maximum difference in cumulative relative frequency of each nucleotide, <it>D</it><sub><it>i</it></sub>, with a biological-significance threshold. We use this threshold instead of the critical value of the KS test statistic, because the difference tolerated by the KS test is too generous to yield a reasonable sample size <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. We set the threshold to 0.2% based on our previous studies <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. Those studies indicate that frequency differences below 0.2% are likely due to stochastic variance and are not biologically meaningful. If <it>D</it><sub><it>i</it></sub> is not less than 0.2%, the sample size is increased by 10,000. The procedure is then repeated until the criterion <it>D</it><sub><it>i</it></sub> &lt; 0.2% is satisfied (Figure <figr fid="F1">1</figr>). This step yields an initial effective size (<it>n</it>) for the next step.</p>
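            <p>The first step can be sketched as a loop that grows the sample until the criterion holds. The sketch rests on several assumptions not stated in the text. First, it requires <it>D</it><sub><it>i</it></sub> &lt; 0.2% for every nucleotide. Second, it starts from 10,000 SNPs. Third, it draws a fresh random sample at each size. Finally, it represents each SNP by a string of flanking bases. The names <it>cumulative</it> and <it>initial_effective_size</it> are hypothetical, and the seed exists only for reproducibility.</p>
            <p>
```python
import itertools
import random
from typing import Dict, List, Sequence

NUCLEOTIDES = "ACGT"
THRESHOLD = 0.002  # the 0.2% biological-significance threshold
STEP = 10_000      # sample-size increment used in the text


def cumulative(flanks: Sequence[str]) -> Dict[str, List[float]]:
    """Cumulative relative frequency of each nucleotide over the ordered flanking sites."""
    width = len(flanks[0])
    counts = {nt: [0] * width for nt in NUCLEOTIDES}
    for flank in flanks:
        for k, base in enumerate(flank):
            if base in counts:
                counts[base][k] += 1
    n = len(flanks)
    return {nt: list(itertools.accumulate(c / n for c in row)) for nt, row in counts.items()}


def initial_effective_size(all_flanks: Sequence[str], start: int = STEP,
                           step: int = STEP, seed: int = 0) -> int:
    """Step 1: enlarge a random sample until D_i is below THRESHOLD for every nucleotide."""
    rng = random.Random(seed)
    G = cumulative(all_flanks)
    size = start
    while len(all_flanks) > size:
        F = cumulative(rng.sample(all_flanks, size))
        D = {nt: max(abs(f - g) for f, g in zip(F[nt], G[nt])) for nt in NUCLEOTIDES}
        if all(THRESHOLD > d for d in D.values()):
            return size
        size += step
    # No proper sub-sample met the criterion: fall back to the whole data set.
    return len(all_flanks)
```
            </p>
            <p>Falling back to the whole data set is a design choice of this sketch. It guarantees termination when no proper sub-sample meets the threshold.</p>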
            <p>After an initial effective SNP size <it>n</it> is obtained, the second step tests whether the bias patterns obtained from samples of this size can effectively represent the bias patterns observed in the whole SNP data. This is done by interval evaluation using 30 different SNP subsets of size <it>n</it>, randomly sampled from the whole SNP data set. We use 30 subsets because, with a sample size of about 30, the central limit theorem allows the bias estimates to be treated as approximately normally distributed for inference purposes. For each nucleotide at each neighboring site, we calculate the bias relative to the genome sequence average in each of the 30 sample sets. In the human genome, for example, these averages are A: 29.55%, C: 20.44%, G: 20.46%, and T: 29.54%. We then compute the average bias (<m:math name="1471-2164-7-329-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mrow><m:msub><m:mi>B</m:mi><m:mrow><m:mi>i</m:mi><m:mo>,</m:mo><m:mi>j</m:mi></m:mrow></m:msub></m:mrow><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqdaaqaaiabdkeacnaaBaaaleaacqWGPbqAcqGGSaalcqWGQbGAaeqaaaaaaaa@318E@</m:annotation></m:semantics></m:math>). When the difference between the average bias in the 30 sample sets (<m:math name="1471-2164-7-329-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mrow><m:msub><m:mi>B</m:mi><m:mrow><m:mi>i</m:mi><m:mo>,</m:mo><m:mi>j</m:mi></m:mrow></m:msub></m:mrow><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqdaaqaaiabdkeacnaaBaaaleaacqWGPbqAcqGGSaalcqWGQbGAaeqaaaaaaaa@318E@</m:annotation></m:semantics></m:math>) and the corresponding bias in the whole SNP data (<it>B</it><sub><it>i</it>,<it>j</it></sub>) is less than its standard deviation for all nucleotides at all neighboring sites, an intermediate effective SNP size (<it>N</it><sub><it>e</it>0</sub>) is found. That is, the proposed method iteratively evaluates the following difference:</p>
            <p>|<it>B</it><sub><it>i</it>,<it>j </it></sub>- <m:math name="1471-2164-7-329-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mrow><m:msub><m:mi>B</m:mi><m:mrow><m:mi>i</m:mi><m:mo>,</m:mo><m:mi>j</m:mi></m:mrow></m:msub></m:mrow><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqdaaqaaiabdkeacnaaBaaaleaacqWGPbqAcqGGSaalcqWGQbGAaeqaaaaaaaa@318E@</m:annotation></m:semantics></m:math>| &lt;<it>s</it><sub><it>i</it>,<it>j</it></sub>, &#8704; <it>i</it>, <it>j </it>&#160;&#160;&#160; (4)</p>
            <p>where <it>s</it><sub><it>i</it>,<it>j</it></sub> is the standard deviation of the bias across the 30 sample sets. If the criterion is not met, the sample size is increased by 10,000 and this step is repeated until the criterion is satisfied.</p>
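            <p>Below is a minimal Python sketch of the interval evaluation and criterion (4). It rests on stated assumptions. The bias <it>B</it><sub><it>i</it>,<it>j</it></sub> is taken to be the frequency of nucleotide <it>i</it> at site <it>j</it> minus the genome average of <it>i</it>; the text does not say whether a difference or a ratio is used. <it>s</it><sub><it>i</it>,<it>j</it></sub> is taken to be the sample standard deviation of the 30 subset biases. All function names are hypothetical.</p>

```python
import random
import statistics

NUCS = "ACGT"


def bias_table(flanks, genome_avg):
    """B[(i, j)]: frequency of nucleotide i at neighboring site j minus the genome
    average of i (assumed to be a difference of frequencies, not a ratio)."""
    n_sites = len(flanks[0])
    counts = {(x, j): 0 for x in NUCS for j in range(n_sites)}
    for seq in flanks:
        for j, base in enumerate(seq):
            if base in NUCS:  # ignore ambiguous bases
                counts[base, j] += 1
    total = len(flanks)
    return {key: c / total - genome_avg[key[0]] for key, c in counts.items()}


def interval_check(whole_bias, sample_biases):
    """Criterion (4): for every (i, j), |B - mean of the sample biases| must be
    strictly smaller than their standard deviation s."""
    for key, b in whole_bias.items():
        values = [sb[key] for sb in sample_biases]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation s_ij
        if abs(b - mean) >= sd:
            return False
    return True


def intermediate_effective_size(flanks, genome_avg, n, step=10_000, n_sets=30, rng=None):
    """Step 2: enlarge n until n_sets random subsets of size n pass the check."""
    rng = rng or random.Random()
    whole = bias_table(flanks, genome_avg)
    while len(flanks) > n:
        samples = [bias_table(rng.sample(flanks, n), genome_avg) for _ in range(n_sets)]
        if interval_check(whole, samples):
            return n
        n += step
    return len(flanks)  # subsets would otherwise equal the whole dataset
```

            <p>The sketch stops before the subsets become the whole dataset. At that point the 30 subsets would be identical, their standard deviation would be zero, and the strict inequality in (4) could never be satisfied.</p>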
            <p>The two steps above are repeated 100 times, yielding 100 <it>N</it><sub><it>e</it>0</sub> estimates. The effective SNP size (<it>N</it><sub><it>e</it></sub>) is the mean of these 100 estimates.</p>
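            <p>The outer loop can be sketched in Python as below. Here run_once is a hypothetical callable that performs both steps once and returns one <it>N</it><sub><it>e</it>0</sub> estimate. The 95% interval uses a normal approximation: the mean &#177; 1.96 standard deviations of the 100 estimates. This is an assumption, because this section does not state how the confidence intervals in Table 1 were computed.</p>

```python
import statistics
from typing import Callable, Tuple


def effective_snp_size(run_once: Callable[[], int], repeats: int = 100) -> Tuple[float, Tuple[float, float]]:
    """Repeat the two-step search and summarize the N_e0 estimates."""
    estimates = [run_once() for _ in range(repeats)]
    mean = statistics.mean(estimates)  # the reported effective SNP size
    sd = statistics.stdev(estimates)
    half = 1.96 * sd  # assumed normal-approximation 95% interval
    return mean, (mean - half, mean + half)
```

            <p>In practice, run_once would chain the two hypothetical steps, passing the result of the first step as the starting size of the second.</p>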
         </sec>
         <sec>
            <st>
               <p>Implementation</p>
            </st>
            <p>We implemented the SNPKS method in C and Perl. The algorithm repeatedly generates random numbers and uses them to extract random SNPs from the whole SNP dataset; because this routine is computationally intensive, it was written in C. The KS test and the interval evaluation were implemented in a Perl script, which calls the C program automatically. The application has been tested on both Microsoft Windows and Linux. The programs, instructions, and test data are available at the website <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
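            <p>The article does not show the C sampling routine. As a hypothetical illustration of what such a routine does, the Python sketch below draws <it>n</it> distinct SNP indices from a dataset of <it>N</it> SNPs with a partial Fisher-Yates shuffle, which consumes only <it>n</it> random numbers per sample. In Python, random.sample gives the same kind of result directly. The explicit loop only shows the logic that a compiled implementation might follow.</p>

```python
import random


def sample_indices(total, n, rng):
    """Draw n distinct indices from range(total) by a partial Fisher-Yates shuffle."""
    if n > total:
        raise ValueError("sample size exceeds the number of SNPs")
    pool = list(range(total))
    for k in range(n):
        j = rng.randrange(k, total)  # choose among positions not yet selected
        pool[k], pool[j] = pool[j], pool[k]
    return pool[:n]
```
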
         </sec>
         <sec>
            <st>
               <p>Applications</p>
            </st>
            <p>We applied SNPKS to estimate the effective SNP size in four vertebrate genomes: human, chimpanzee, dog, and mouse. The genome-wide SNP data were retrieved from the dbSNP database of the National Center for Biotechnology Information (NCBI) (see Methods), and the numbers of test SNPs are shown in Table <tblr tid="T1">1</tblr>. We illustrate the procedure with dog SNPs for two reasons. Point mutation patterns in the dog genome had not previously been investigated using SNPs, and our analysis indicated that the dog genome had the strongest neighboring-nucleotide biases of the four genomes. Starting from a random sample of size 10,000 (<it>n</it><sub>0</sub>), we ran the SNPKS programs iteratively and obtained an initial effective size (<it>n</it>) of 38,000. We then drew 30 random subsets of size 38,000 and performed an interval evaluation. As shown in Figure <figr fid="F3">3</figr>, for all four nucleotides at all 20 neighboring sites, the intervals obtained from the 30 samples covered the biases observed in the whole dog SNP set. Table S1 (see <supplr sid="S1">Additional file 1</supplr>) lists, for each neighboring site, the biases relative to the average nucleotide frequencies in the dog genome. These biases are given both for the whole dog SNP set and for the 30 random subsets of size 38,000, together with their standard deviations.</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Supplementary Table S1 &#8211; Bias comparison using genome-wide dog SNPs.</p>
               </text>
               <file name="1471-2164-7-329-S1.doc">
               </file>
            </suppl>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Estimation of the effective SNP size</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Total no. of test SNPs</p>
                     </c>
                     <c ca="center">
                        <p>Effective SNP size (<it>N</it><sub><it>e</it></sub>)</p>
                     </c>
                     <c ca="center">
                        <p>95% C.I.</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Human</p>
                     </c>
                     <c ca="center">
                        <p>5,200,425</p>
                     </c>
                     <c ca="center">
                        <p>38,200</p>
                     </c>
                     <c ca="center">
                        <p>35,700 &#8211; 40,700</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Chimpanzee</p>
                     </c>
                     <c ca="center">
                        <p>1,470,501</p>
                     </c>
                     <c ca="center">
                        <p>39,300</p>
                     </c>
                     <c ca="center">
                        <p>37,200 &#8211; 41,400</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Dog</p>
                     </c>
                     <c ca="center">
                        <p>2,690,084</p>
                     </c>
                     <c ca="center">
                        <p>38,000</p>
                     </c>
                     <c ca="center">
                        <p>35,700 &#8211; 40,300</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mouse (Build 126)</p>
                     </c>
                     <c ca="center">
                        <p>7,832,159</p>
                     </c>
                     <c ca="center">
                        <p>38,700</p>
                     </c>
                     <c ca="center">
                        <p>36,400 &#8211; 41,000</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mouse (Build 123)</p>
                     </c>
                     <c ca="center">
                        <p>376,146</p>
                     </c>
                     <c ca="center">
                        <p>39,100</p>
                     </c>
                     <c ca="center">
                        <p>36,800 &#8211; 41,400</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Human HapMap phase I</p>
                     </c>
                     <c ca="center">
                        <p>861,498</p>
                     </c>
                     <c ca="center">
                        <p>38,400</p>
                     </c>
                     <c ca="center">
                        <p>35,800 &#8211; 41,000</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Human HapMap phase II</p>
                     </c>
                     <c ca="center">
                        <p>2,435,362</p>
                     </c>
                     <c ca="center">
                        <p>39,100</p>
                     </c>
                     <c ca="center">
                        <p>36,900 &#8211; 41,300</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Human intergenic region<sup>a</sup></p>
                     </c>
                     <c ca="center">
                        <p>2,422,730</p>
                     </c>
                     <c ca="center">
                        <p>39,100</p>
                     </c>
                     <c ca="center">
                        <p>36,800 &#8211; 41,400</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Human gene<sup>a</sup></p>
                     </c>
                     <c ca="center">
                        <p>744,987</p>
                     </c>
                     <c ca="center">
                        <p>39,600</p>
                     </c>
                     <c ca="center">
                        <p>37,400 &#8211; 41,800</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Human intron<sup>a</sup></p>
                     </c>
                     <c ca="center">
                        <p>889,956</p>
                     </c>
                     <c ca="center">
                        <p>39,200</p>
                     </c>
                     <c ca="center">
                        <p>37,000 &#8211; 41,400</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Human CpG island<sup>a</sup></p>
                     </c>
                     <c ca="center">
                        <p>95,561</p>
                     </c>
                     <c ca="center">
                        <p>42,200</p>
                     </c>
                     <c ca="center">
                        <p>39,500 &#8211; 44,900</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>a</sup>The average nucleotide frequencies in the genomic regions were used.</p>
               </tblfn>
            </tbl>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Neighboring-nucleotide bias patterns for dog SNPs and interval evaluation</p>
               </caption>
               <text>
                  <p><b>Neighboring-nucleotide bias patterns for dog SNPs and interval evaluation</b>. The colored lines show the neighboring-nucleotide biases relative to the dog genome sequence average, computed from 2,690,084 dog SNPs. For 30 random sample sets of size 38,000, we obtained the average bias (<m:math name="1471-2164-7-329-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mrow><m:msub><m:mi>B</m:mi><m:mrow><m:mi>i</m:mi><m:mo>,</m:mo><m:mi>j</m:mi></m:mrow></m:msub></m:mrow><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqdaaqaaiabdkeacnaaBaaaleaacqWGPbqAcqGGSaalcqWGQbGAaeqaaaaaaaa@318E@</m:annotation></m:semantics></m:math>) and standard deviation (<it>s</it><sub><it>i</it>,<it>j</it></sub>) for each nucleotide at each position. The vertical bars represent the interval <m:math name="1471-2164-7-329-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mrow><m:msub><m:mi>B</m:mi><m:mrow><m:mi>i</m:mi><m:mo>,</m:mo><m:mi>j</m:mi></m:mrow></m:msub></m:mrow><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqdaaqaaiabdkeacnaaBaaaleaacqWGPbqAcqGGSaalcqWGQbGAaeqaaaaaaaa@318E@</m:annotation></m:semantics></m:math> &#177; <it>s</it><sub><it>i</it>,<it>j</it></sub>. The intervals at all positions cover the corresponding biases observed in the whole dog SNP set. On the x axis, negative positions indicate the 5' side and positive positions the 3' side of the SNPs.</p>
               </text>
               <graphic file="1471-2164-7-329-3"/>
            </fig>
            <p>The effective SNP size was estimated to be 38,200 &#177; 2,500, 39,300 &#177; 2,100, 38,000 &#177; 2,300, and 38,700 &#177; 2,300 for the human, chimpanzee, dog, and mouse genomes, respectively. The 95% confidence intervals were narrow in all four genomes (Table <tblr tid="T1">1</tblr>). Overall, the effective SNP size (1) is similar in these four genomes, and (2) represents only a small proportion of the genome-wide SNP data (<it>N</it>), from about 0.5% (mouse, Build 126) to about 2.7% (chimpanzee). The comparative results suggest strong genetic influences, such as CpG effects, across vertebrate genomes <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>.</p>
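            <p>To make the second point concrete, the short Python check below divides each effective SNP size in Table 1 by the corresponding total number of test SNPs. The values are copied from the table; the script adds only the arithmetic.</p>

```python
# (total test SNPs, effective SNP size) copied from Table 1.
TABLE1 = {
    "Human": (5_200_425, 38_200),
    "Chimpanzee": (1_470_501, 39_300),
    "Dog": (2_690_084, 38_000),
    "Mouse (Build 126)": (7_832_159, 38_700),
}


def effective_share(table):
    """Percentage of the genome-wide SNP set represented by the effective size."""
    return {name: 100 * ne / total for name, (total, ne) in table.items()}


for name, pct in effective_share(TABLE1).items():
    print(f"{name}: {pct:.2f}%")
```

            <p>For these four genomes, the effective size corresponds to roughly 0.5% to 2.7% of the test SNPs.</p>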
            <p>We next estimated the effective SNP size for specific genomic regions in the human genome. For each region type (intergenic regions, genes, introns, and CpG islands), we used the SNPs located in that region type together with its average nucleotide frequencies. We did not analyze exons or untranslated regions (UTRs) because too few SNPs mapped to them. Although the numbers of SNPs and the sequence compositions of these regions varied greatly, their effective SNP sizes were similar: 39,100 &#177; 2,300 (intergenic regions), 39,600 &#177; 2,200 (genes), 39,200 &#177; 2,200 (introns), and 42,200 &#177; 2,700 (CpG islands). The effective SNP size was largest in the CpG islands. This likely reflects the lack of methylation in CpG islands and hence the suppression of 5<sup>m</sup>C deamination there <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Performance comparison</p>
            </st>
            <p>We compared the performance of SNPKS with that of the empirical iterative procedure in SNPNB <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Tests were run on a Dell workstation (2 &#215; 3.0 GHz CPUs, 4 GB memory, Red Hat Enterprise Linux WS 3.0) using human and mouse SNP data. SNPKS greatly outperformed SNPNB. For the human SNP data, SNPNB took ~28 hours for a single round of evaluation and ~151 hours for 10 rounds. SNPKS took only ~7.5 hours because it does not require recursive rounds to estimate the effective SNP size (Table <tblr tid="T2">2</tblr>). Assuming that SNPNB needs 10 rounds of evaluation to reach a value close to the effective SNP size, it required about 20-fold more computation time than SNPKS for the human SNP data and about 35-fold more for the mouse SNP data (Table <tblr tid="T2">2</tblr>).</p>
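            <p>The 20-fold and 35-fold figures can be reproduced from the total elapsed times in Table 2. The small Python helper below converts the table's "h m s" entries to seconds and takes the ratio of SNPNB (10 rounds) to SNPKS. The helper names are hypothetical.</p>

```python
def to_seconds(text):
    """Convert a Table 2 entry such as '151 h 8 m 56 s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    tokens = text.split()
    return sum(int(value) * units[unit] for value, unit in zip(tokens[::2], tokens[1::2]))


def speedup(snpnb_total, snpks_total):
    """How many times longer SNPNB ran than SNPKS."""
    return to_seconds(snpnb_total) / to_seconds(snpks_total)


# Total elapsed times: SNPNB with 10 rounds versus SNPKS (Table 2).
human = speedup("151 h 8 m 56 s", "7 h 32 m 24 s")
mouse = speedup("75 h 21 m 19 s", "2 h 7 m 42 s")
print(f"human: {human:.1f}x, mouse: {mouse:.1f}x")
```

            <p>The ratios come out at about 20.0 for human and 35.4 for mouse.</p>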
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Performance comparison with SNPNB</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>SNP data</p>
                     </c>
                     <c ca="left">
                        <p>Process<sup>a</sup></p>
                     </c>
                     <c ca="center">
                        <p>SNPNB [8]</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>SNPKS</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>1 round</p>
                     </c>
                     <c ca="center">
                        <p>5 rounds</p>
                     </c>
                     <c ca="center">
                        <p>10 rounds</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Human</p>
                     </c>
                     <c ca="left">
                        <p>Preprocessing data</p>
                     </c>
                     <c ca="center">
                        <p>2 h 50 m 25 s</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>2 h 24 m 49 s</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Estimation of <it>N</it><sub><it>e</it></sub></p>
                     </c>
                     <c ca="center">
                        <p>24 h 56 m 1 s</p>
                     </c>
                     <c ca="center">
                        <p>82 h 48 m 1 s</p>
                     </c>
                     <c ca="center">
                        <p>147 h 39 m 40 s</p>
                     </c>
                     <c ca="center">
                        <p>5 h 7 m 35 s</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Total elapsed time</p>
                     </c>
                     <c ca="center">
                        <p>27 h 46 m 26 s</p>
                     </c>
                     <c ca="center">
                        <p>85 h 38 m 26 s</p>
                     </c>
                     <c ca="center">
                        <p>151 h 8 m 56 s</p>
                     </c>
                     <c ca="center">
                        <p>7 h 32 m 24 s</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mouse (Build 123)</p>
                     </c>
                     <c ca="left">
                        <p>Preprocessing data</p>
                     </c>
                     <c ca="center">
                        <p>0 h 2 m 51 s</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0 h 2 m 52 s</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Estimation of <it>N</it><sub><it>e</it></sub></p>
                     </c>
                     <c ca="center">
                        <p>7 h 27 m 55 s</p>
                     </c>
                     <c ca="center">
                        <p>37 h 53 m 53 s</p>
                     </c>
                     <c ca="center">
                        <p>75 h 18 m 28 s</p>
                     </c>
                     <c ca="center">
                        <p>2 h 4 m 50 s</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Total elapsed time</p>
                     </c>
                     <c ca="center">
                        <p>7 h 30 m 46 s</p>
                     </c>
                     <c ca="center">
                        <p>37 h 56 m 44 s</p>
                     </c>
                     <c ca="center">
                        <p>75 h 21 m 19 s</p>
                     </c>
                     <c ca="center">
                        <p>2 h 7 m 42 s</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>a</sup>The tests were performed on a Linux workstation (CPU 2 &#215; 3.0 GHz, memory 4 GB).</p>
               </tblfn>
            </tbl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>In this study, the effective SNP size is defined as the minimum number of SNPs sufficient to represent the bias patterns observed in the whole SNP data; it thus measures the sequence context patterns of the SNPs. To our knowledge, this term has not been used in any report other than our previous study <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. It is analogous to the effective population size and the effective sample size, which are widely used in population genetics and disease studies. For example, the effective population size (<it>N</it><sub><it>e</it></sub>) is the size of an idealized population in which random sampling has the same effect on gene frequencies as in the actual population. It can be estimated as <it>N</it><sub><it>e</it></sub> = <it>&#952;</it>/(4<it>&#956;</it>), where <it>&#952;</it> is the population mutation parameter and <it>&#956;</it> is the mutation rate per sequence per generation <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. One example of an effective sample size in SNP analysis is the SNP-effective sample size, which estimates the number of sequences in an alignment from the observed number of SNPs in those sequences <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
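         <p>As a worked illustration of <it>N</it><sub><it>e</it></sub> = <it>&#952;</it>/(4<it>&#956;</it>), the Python snippet below evaluates the relation for hypothetical values of <it>&#952;</it> and <it>&#956;</it>. The relation is the standard <it>&#952;</it> = 4<it>N</it><sub><it>e</it></sub><it>&#956;</it> for a diploid population, rearranged. The numbers only show the arithmetic and are not estimates from this or any cited study.</p>

```python
def effective_population_size(theta, mu):
    """N_e = theta / (4 * mu); theta and mu must use the same unit (per sequence
    or per site), with mu given per generation."""
    return theta / (4 * mu)


# Hypothetical illustration values, not data from this study.
print(round(effective_population_size(theta=0.001, mu=2.5e-8)))  # 10000
```
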
         <p>Knowing how many SNPs suffice to represent the bias patterns observed in the whole SNP data is important for several reasons. First, it shows whether the observed patterns are representative of the genome or arise by chance <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. Early studies of mutation patterns, limited by small datasets, often reported inconsistent results on questions such as the influence of neighboring nucleotides on SNPs <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B6">6</abbr><abbr bid="B20">20</abbr></abbrgrp>, mutation direction (e.g., G/C &#8594; A/T vs. A/T &#8594; G/C) <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>, and methylation-dependent transition rates <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>. Such an evaluation is required before firm conclusions can be drawn. Second, a small effective size implies high confidence in the observed biases and suggests that certain genetic factors (e.g., CpG effects) contribute substantially to them. Further investigation of these factors will help us understand how mutations arise in specific sequence contexts and how they have been maintained during evolution. Third, comparing bias patterns and effective SNP sizes across genomes and genomic regions may reveal differences in sequence mutability, which is important for the study of genome evolution. More than 300 genome sequences have now been completed and made available at NCBI, and comparative genomics is emerging as an important research field. Comparative analysis of SNP data should provide many insights into sequence mutability, genome evolution, genetic drift, and natural selection across genomes.</p>
         <p>Our analysis revealed similar bias patterns in the chimpanzee, dog, human, and mouse genomes. However, the strength of the neighboring-nucleotide biases differed among these four genomes: it was greatest in the dog genome and weakest in the mouse genome (Figure <figr fid="F3">3</figr>; other data not shown). For example, the bias for nucleotide G at the immediately adjacent 3' site was +4.89% (human), +4.80% (chimpanzee), +2.76% (mouse), and +6.21% (dog) relative to the corresponding genome average. Surprisingly, the effective SNP sizes in these four genomes were similar (Table <tblr tid="T1">1</tblr>) and not significantly different (ANOVA <it>P</it> = 0.83). Although this may suggest a strong influence of genetic factors in these genomes, further investigation of these factors and of SNP ascertainment biases is warranted. Note that, because the sample size was increased by 10,000 at each iteration of SNPKS, it is unlikely that the method itself produced the similar effective sizes. In addition, the effective SNP sizes for the human genomic regions were overall slightly larger than that for the genome-wide SNP set (Table <tblr tid="T1">1</tblr>). The effective SNP size was largest in the CpG islands and smallest in the intergenic and intronic regions. This is consistent with previous findings of a strong CpG effect throughout the genome except in CpG islands <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B15">15</abbr></abbrgrp> and of possible selection in CpG islands <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Further simulation analyses may clarify how each genetic factor affects the effective SNP size. Overall, the effective sizes in Table <tblr tid="T1">1</tblr> fell within a narrow range, 38,000 &#8211; 42,200. This suggests that the effective SNP size in vertebrate genomes and their genomic regions is remarkably similar and close to 40,000. This effective size may have three applications.
First, it provides a new metric for assessing genetic variability in other studies or other genomes. Second, millions of SNPs will be discovered in many vertebrate genomes in the near future, and their effective SNP sizes may turn out to be larger or smaller than 40,000. Comparative genomics studies may then identify the genetic factors that contribute to such differences (e.g., mutability of nucleotides, CpG effect, natural selection, biased gene conversion, recombination, biased DNA mismatch repair). Third, it may be used to compare mutation patterns across specific datasets. Examples include disease-causing mutations, SNPs with rare or common allele frequencies, different SNP types (e.g., C/T polymorphisms, C &#8594; T changes), SNPs at biased codons or fourfold degenerate sites, and mutations with directional asymmetry between the two DNA strands (e.g., A &#8594; G vs. T &#8594; C).</p>
         <p>To examine whether these estimates of the effective SNP size are reliable, we performed additional analyses using different datasets. First, SNP discovery and sampling are often subject to ascertainment bias <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>; some SNPs in the dbSNP database were identified from very few sequences, sometimes from only two. To examine whether such ascertainment bias affects the estimation of the effective SNP size, we used HapMap phase I SNPs, which have strong ascertainment bias, and phase II SNPs, which have less sampling ascertainment bias <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. The estimated effective SNP sizes were similar: 38,400 &#177; 2,600 for the 861,498 phase I SNPs and 39,100 &#177; 2,200 for the 2,435,362 phase II SNPs (Table <tblr tid="T1">1</tblr>). Second, we examined whether random subsets of the total SNPs have effective sizes similar to that of the total SNP set. We generated 9 random subsets of the human SNPs, with sizes ranging from 1.0 to 5.0 million in steps of 0.5 million. Their effective SNP sizes ranged from 36,200 to 39,800 (Table <tblr tid="T3">3</tblr>), similar to the 38,200 obtained for the total human SNP set. Third, we compared the effective sizes of two sets of mouse SNPs: 376,146 SNPs (Build 123) and 7,832,159 SNPs (Build 126). Again, the effective sizes were similar (39,100 and 38,700, respectively; Table <tblr tid="T1">1</tblr>). These results suggest that the estimated effective SNP size is little affected by ascertainment bias or by the size of the SNP dataset, and is therefore robust.</p>
         <tbl id="T3">
            <title>
               <p>Table 3</p>
            </title>
            <caption>
               <p>Estimation of the effective SNP size in random subsets of human SNPs</p>
            </caption>
            <tblbdy cols="3">
               <r>
                  <c ca="left">
                     <p>Sample size</p>
                  </c>
                  <c ca="center">
                     <p>Effective SNP size (<it>N</it><sub><it>e</it></sub>)</p>
                  </c>
                  <c ca="center">
                     <p>95% C.I.</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>1.0 &#215; 10<sup>6</sup></p>
                  </c>
                  <c ca="center">
                     <p>37,000</p>
                  </c>
                  <c ca="center">
                     <p>34,700 &#8211; 39,300</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>1.5 &#215; 10<sup>6</sup></p>
                  </c>
                  <c ca="center">
                     <p>39,600</p>
                  </c>
                  <c ca="center">
                     <p>37,200 &#8211; 42,000</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>2.0 &#215; 10<sup>6</sup></p>
                  </c>
                  <c ca="center">
                     <p>38,600</p>
                  </c>
                  <c ca="center">
                     <p>36,000 &#8211; 41,200</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>2.5 &#215; 10<sup>6</sup></p>
                  </c>
                  <c ca="center">
                     <p>38,100</p>
                  </c>
                  <c ca="center">
                     <p>35,800 &#8211; 40,400</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>3.0 &#215; 10<sup>6</sup></p>
                  </c>
                  <c ca="center">
                     <p>36,200</p>
                  </c>
                  <c ca="center">
                     <p>33,700 &#8211; 38,700</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>3.5 &#215; 10<sup>6</sup></p>
                  </c>
                  <c ca="center">
                     <p>38,600</p>
                  </c>
                  <c ca="center">
                     <p>36,500 &#8211; 40,700</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>4.0 &#215; 10<sup>6</sup></p>
                  </c>
                  <c ca="center">
                     <p>37,300</p>
                  </c>
                  <c ca="center">
                     <p>35,000 &#8211; 39,600</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>4.5 &#215; 10<sup>6</sup></p>
                  </c>
                  <c ca="center">
                     <p>39,200</p>
                  </c>
                  <c ca="center">
                     <p>36,900 &#8211; 41,500</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>5.0 &#215; 10<sup>6</sup></p>
                  </c>
                  <c ca="center">
                     <p>39,800</p>
                  </c>
                  <c ca="center">
                     <p>37,500 &#8211; 42,100</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>The effective SNP sizes of these 9 subsets do not differ significantly (ANOVA, <it>P </it>= 0.40).</p>
            </tblfn>
         </tbl>
         <p>It is difficult to evaluate whether one bias pattern is statistically the same as another, because doing so requires comparing the frequency of each of the four nucleotides at every flanking site of the SNPs. If we consider 10 neighboring sites on each flanking side, we must perform 80 statistical tests (2 sides &#215; 10 sites &#215; 4 nucleotides), which makes the type I error rate (&#945;) hard to control. If we apply the Bonferroni correction for these 80 comparisons, the significance level for each test becomes &#945;/80 <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. As a result, the critical value of the test statistic becomes so large that it leads to an unreasonable estimate of the effective size. Previously, a re-sampling algorithm was implemented to evaluate the effective SNP size <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. That algorithm is empirical and, when the number of SNPs is large, computationally intensive even with only a few rounds of re-sampling. Moreover, it can only provide a number that is close to the effective SNP size. The SNPKS method proposed in this study first applies the KS test to obtain an initial effective SNP size. Instead of relying only on a conventional statistical criterion, it uses a biologically tolerable difference (i.e., 0.2%) <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. This choice appears robust: by our visual examination, the patterns from a sample of size <it>N</it><sub><it>e </it></sub>are essentially the same as those from the whole SNP dataset (Figure <figr fid="F3">3</figr>). However, SNPKS is still heuristic. Although it should yield an adequate approximation of the effective SNP size for practical use, it is not guaranteed to find the absolute minimum effective SNP size.</p>
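         <p>The multiple-testing arithmetic and a tolerance-based comparison of flanking patterns can be sketched in Python as follows. This is not the SNPKS procedure itself. Reading the 0.2% tolerable difference as a maximum absolute difference of 0.002 in per-site nucleotide proportion is our assumption. The flanking sequences are also assumed to be equal-length strings oriented so that index 0 is the site adjacent to the SNP, and all function names are hypothetical. With &#945; = 0.05, the Bonferroni level for 80 tests is 0.05/80 = 0.000625.</p>

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def n_tests(sites_per_side=10, sides=2, nucleotides=4):
    """One comparison per nucleotide at each flanking site on each side."""
    return sites_per_side * sides * nucleotides


def bonferroni_alpha(alpha=0.05, m=80):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / m


def site_frequencies(flanks, sites):
    """Proportion of A, C, G, T at each of the first `sites` positions.

    flanks: equal-length strings with at least one A/C/G/T per site;
    other characters (e.g. N) are ignored.
    """
    freqs = []
    for i in range(sites):
        counts = Counter(f[i] for f in flanks)
        total = sum(counts[b] for b in NUCLEOTIDES)
        freqs.append({b: counts[b] / total for b in NUCLEOTIDES})
    return freqs


def max_abs_difference(freqs_a, freqs_b):
    """Largest absolute difference in any nucleotide proportion at any site."""
    return max(abs(fa[b] - fb[b]) for fa, fb in zip(freqs_a, freqs_b) for b in NUCLEOTIDES)


def within_tolerance(sample_flanks, all_flanks, sites=10, tol=0.002):
    """Assumed criterion: no proportion differs by more than tol (0.2%)."""
    d = max_abs_difference(site_frequencies(sample_flanks, sites),
                           site_frequencies(all_flanks, sites))
    return tol >= d


print(n_tests())  # 80
print(bonferroni_alpha())
```

         <p>In practice such a check would be run separately for the left and right flanks; a different reading of the 0.2% tolerance changes only the <it>tol</it> argument.</p>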
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We proposed an integrated statistical method (SNPKS) to estimate the effective SNP size. SNPKS consists of two major steps: estimation of an initial effective size and iterative testing of that size by interval evaluation. SNPKS considers both biological significance and statistical significance. SNPKS is the first method to estimate the effective SNP size based on statistical tests, and it greatly outperforms SNPNB. Applying SNPKS to real SNP data from the human, chimpanzee, dog, and mouse genomes revealed similarly small effective SNP sizes (38,000 &#8211; 42,200) in these four genomes and in human genomic regions, suggesting a strong influence of genetic factors across vertebrate genomes.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Data preparation</p>
            </st>
             <p>We downloaded SNPs in four vertebrate genomes (chimpanzee, dog, human, and mouse) from the dbSNP database <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. We retrieved 10,430,753 human SNPs, 1,470,601 chimpanzee SNPs, and 3,023,305 dog SNPs from Build 125. We retrieved 499,051 mouse SNPs from Build 123 because we found that more than 1 million of the SNPs newly deposited in Build 125 were from Perlegen, Inc. Our analysis indicated that these SNPs were distributed mainly on three chromosomes (2, 4, and 11), which have higher GC content than the mouse genome average. Because in this analysis our method is applied to bias patterns across the whole genome, these skewed data could distort the interpretation of the results <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. While we were preparing the manuscript, dbSNP released Build 126, which increased the number of mouse SNPs to 8,274,228; we therefore also downloaded these SNPs for our analysis. We downloaded HapMap SNPs from the International HapMap Project web site <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, including 1,156,867 Phase I SNPs and 3,395,857 Phase II SNPs.</p>
             <p>We selected only those SNPs that were biallelic, mapped in non-repetitive sequences, and had at least 20 nucleotides of flanking sequence on each side. A SNP was considered to be in non-repetitive sequence if the 20 nucleotides on each side of the SNP did not overlap any repeat. After this filtering, a total of 5,200,425 (human), 1,470,501 (chimpanzee), 2,690,084 (dog), 7,832,159 (mouse Build 126), 376,146 (mouse Build 123), 861,498 (HapMap phase I), and 2,435,362 (HapMap phase II) SNPs remained (Table <tblr tid="T1">1</tblr>). We used these SNPs for the SNPKS analysis in this study and refer to them as test SNPs.</p>
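             <p>A minimal Python sketch of this filter follows. The <it>SnpRecord</it> layout, the 1-based coordinates, and the representation of repeats as closed (start, end) intervals per chromosome are illustrative assumptions rather than any particular database format. A linear scan over the repeats is used for clarity rather than speed.</p>

```python
from dataclasses import dataclass

FLANK = 20  # nucleotides required on each side of the SNP


@dataclass
class SnpRecord:
    """Hypothetical SNP record (not the dbSNP schema)."""
    chrom: str
    pos: int            # 1-based position of the SNP
    alleles: tuple      # e.g. ("A", "G")
    left_flank: str
    right_flank: str


def overlaps(start1, end1, start2, end2):
    """True if closed intervals [start1, end1] and [start2, end2] share a base."""
    return min(end1, end2) >= max(start1, start2)


def passes_filter(snp, repeats):
    """Apply the three selection rules; repeats maps chrom to a list of (start, end)."""
    if len(set(snp.alleles)) != 2:  # biallelic only
        return False
    if not (len(snp.left_flank) >= FLANK and len(snp.right_flank) >= FLANK):
        return False  # too little flanking sequence
    # Window of 20 nt on each side of the SNP (the SNP position itself is included).
    win_start, win_end = snp.pos - FLANK, snp.pos + FLANK
    return not any(overlaps(win_start, win_end, s, e)
                   for s, e in repeats.get(snp.chrom, []))


repeats = {"1": [(1000, 1300)]}
print(passes_filter(SnpRecord("1", 500, ("A", "G"), "A" * 20, "C" * 20), repeats))  # True
print(passes_filter(SnpRecord("1", 985, ("A", "G"), "A" * 20, "C" * 20), repeats))  # False
```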
             <p>We next identified SNPs in human genomic regions using the procedures described in Jiang and Zhao <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. First, the SNPs in human intergenic, genic, and intronic regions were identified by comparing the SNP locations in the assembled genomic sequences with the coordinates of intergenic regions, genes, and introns. We retrieved the coordinates of each genomic region (e.g., introns) from the ENSEMBL database (version 32.35e, released in July 2005) <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. We included only known genic, intronic, and intergenic regions and excluded any genomic region that was predicted or that might overlap another genomic region (e.g., alternative transcripts). We also excluded SNPs that were not uniquely mapped in the human genome. We identified 2,422,730, 744,987, and 889,956 SNPs in the known intergenic, genic, and intronic regions, respectively (Table <tblr tid="T1">1</tblr>). Second, we identified SNPs in CpG islands. CpG islands were identified using the CpG island searching program (CpGi130) <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. We used stringent search criteria (GC content &#8805; 55%, Obs<sub>CpG</sub>/Exp<sub>CpG </sub>&#8805; 0.65, and length &#8805; 500 bp) to screen for CpG islands in the human genome sequences. These criteria can effectively exclude the ubiquitous Alu repeats, which typically have a sequence length of 300 bp, a GC content of 53%, and an Obs<sub>CpG</sub>/Exp<sub>CpG </sub>ratio of 0.62 <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. We identified the SNPs in CpG islands by matching the coordinates of the SNPs with those of the CpG islands in the reference sequences. Again, we excluded SNPs that were not uniquely mapped in the human genome. A total of 95,561 SNPs were identified in human CpG islands.</p>
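             <p>The CpG island criteria and the coordinate matching can be sketched in Python as follows. The Obs<sub>CpG</sub>/Exp<sub>CpG</sub> ratio is computed with the common definition (number of CpG dinucleotides &#215; length)/(number of C &#215; number of G). The sketch only checks whether a candidate region meets the thresholds and does not reproduce the CpGi130 search itself. The lookup assumes sorted, non-overlapping islands with closed coordinates on a single chromosome, and all function names are hypothetical.</p>

```python
import bisect


def gc_content(seq):
    """Fraction of G and C in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def obs_exp_cpg(seq):
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


def meets_island_criteria(length, gc, oe, min_gc=0.55, min_oe=0.65, min_len=500):
    """Thresholds used in the text: GC >= 55%, Obs/Exp CpG >= 0.65, length >= 500 bp."""
    return gc >= min_gc and oe >= min_oe and length >= min_len


def is_candidate_island(seq):
    return meets_island_criteria(len(seq), gc_content(seq), obs_exp_cpg(seq))


def make_index(islands):
    """islands: sorted, non-overlapping closed (start, end) intervals on one chromosome."""
    return [s for s, _ in islands], islands


def in_island(pos, index):
    """Binary search for the last island starting at or before pos."""
    starts, islands = index
    i = bisect.bisect_right(starts, pos) - 1
    return i >= 0 and islands[i][1] >= pos


print(meets_island_criteria(300, 0.53, 0.62))  # False: typical Alu profile
index = make_index([(100, 700), (2000, 2600)])
print(in_island(650, index), in_island(1500, index))  # True False
```

             <p>Applied to the typical Alu profile quoted above (300 bp, 53% GC, ratio 0.62), <it>meets_island_criteria</it> returns False, consistent with the statement that these criteria exclude Alu repeats.</p>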
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>DS participated in the data preparation and the method development, carried out computational work, and helped to draft the manuscript. CJ participated in the data preparation and analysis. ZZ conceived of the study, participated in its design, method development, and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This project was supported by the Thomas F. and Kate Miller Jeffress Memorial Trust Fund.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>A vision for the future of genomics research</p>
            </title>
            <aug>
               <au>
                  <snm>Collins</snm>
                  <fnm>FS</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Guttmacher</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Guyer</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>422</volume>
            <fpage>835</fpage>
            <lpage>847</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01626</pubid>
                  <pubid idtype="pmpid" link="fulltext">12695777</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>A haplotype map of the human genome</p>
            </title>
            <aug>
               <au>
                  <cnm>The International HapMap Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>437</volume>
            <fpage>1299</fpage>
            <lpage>1320</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04226</pubid>
                  <pubid idtype="pmpid" link="fulltext">16255080</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Initial sequencing and comparative analysis of the mouse genome</p>
            </title>
            <aug>
               <au>
                  <snm>Waterston</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Agarwal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Agarwala</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ainscough</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Alexandersson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>An</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Antonarakis</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Attwood</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Barlow</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Beck</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Berry</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Birren</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bloom</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Botcherby</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bray</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Bult</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Burton</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Butler</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Carninci</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Cawley</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chiaromonte</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Chinwalla</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Clee</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>FS</fnm>
               </au>
               <au>
                  <snm>Cook</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>Copley</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Coulson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Couronne</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Cuff</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Curwen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Cutts</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Daly</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>David</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Davies</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Delehaunty</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Deri</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dermitzakis</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Dewey</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dickens</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dodge</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Dunn</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Emes</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Eswara</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Eyras</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Felsenfeld</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Fewell</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Flicek</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Foley</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Frankel</snm>
                  <fnm>WN</fnm>
               </au>
               <au>
                  <snm>Fulton</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Fulton</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Gage</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Gibbs</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Glusman</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gnerre</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Goldman</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Goodstadt</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Grafham</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Graves</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Gregory</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Guyer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hayashizaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hillier</snm>
                  <fnm>LW</fnm>
               </au>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hlavina</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Holzer</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hsu</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Hua</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hunt</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Jackson</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Jaffe</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>LS</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Joy</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kamal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Karlsson</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kasprzyk</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kawai</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Keibler</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Kells</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Kirby</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kolbe</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Korf</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kucherlapati</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Kulbokas</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Kulp</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Landers</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Leger</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Leonard</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Letunic</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Levine</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lloyd</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lucas</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Mardis</snm>
                  <fnm>ER</fnm>
               </au>
               <au>
                  <snm>Matthews</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mauceli</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Mayer</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>McCarthy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>McCombie</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>McLaren</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>McLay</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>McPherson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Meldrim</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Meredith</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miner</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Mongin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Montgomery</snm>
                  <fnm>KT</fnm>
               </au>
               <au>
                  <snm>Morgan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mott</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Mullikin</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Muzny</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Nash</snm>
                  <fnm>WE</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>JO</fnm>
               </au>
               <au>
                  <snm>Nhan</snm>
                  <fnm>MN</fnm>
               </au>
               <au>
                  <snm>Nicol</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ning</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Nusbaum</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>O'Connor</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Okazaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Oliver</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Overton-Larty</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Parra</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pepin</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pevzner</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Plumb</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Pohl</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Poliakov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ponce</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Potter</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Quail</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Reymond</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Roe</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Roskin</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Rust</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Santos</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sapojnikov</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Seaman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Searle</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sharpe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sheridan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Shownkeen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sims</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Singer</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Slater</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Smit</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Spencer</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Stabenau</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stange-Thomann</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sugnet</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Suyama</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tesler</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Torrents</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Trevaskis</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Tromp</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ucla</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ureta-Vidal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vinson</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Von Niederhausern</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Wade</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Wall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weber</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Weiss</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Wendl</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>West</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Wetterstrand</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Whelan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wierzbowski</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Willey</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wilson</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Winter</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Worley</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Wyman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>SP</fnm>
               </au>
               <au>
                  <snm>Zdobnov</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Zody</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>420</volume>
            <fpage>520</fpage>
            <lpage>562</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01262</pubid>
                  <pubid idtype="pmpid" link="fulltext">12466850</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Neighboring-nucleotide effects on the rates of germ-line single-base-pair substitution in human genes</p>
            </title>
            <aug>
               <au>
                  <snm>Krawczak</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>DN</fnm>
               </au>
            </aug>
            <source>Am J Hum Genet</source>
            <pubdate>1998</pubdate>
            <volume>63</volume>
            <fpage>474</fpage>
            <lpage>488</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1377306</pubid>
                  <pubid idtype="pmpid" link="fulltext">9683596</pubid>
                  <pubid idtype="doi">10.1086/301965</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>NCBI dbSNP database</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/SNP/</url>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Neighboring-nucleotide effects on single nucleotide polymorphisms: a study of 2.6 million polymorphisms across the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Boerwinkle</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1679</fpage>
            <lpage>1686</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187558</pubid>
                  <pubid idtype="pmpid" link="fulltext">12421754</pubid>
                  <pubid idtype="doi">10.1101/gr.287302</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>The influence of neighboring-nucleotide composition on single nucleotide polymorphisms (SNPs) in the mouse genome and its comparison with human SNPs</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2004</pubdate>
            <volume>84</volume>
            <fpage>785</fpage>
            <lpage>795</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ygeno.2004.06.015</pubid>
                  <pubid idtype="pmpid" link="fulltext">15475257</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>SNPNB: analyzing neighboring-nucleotide biases on single nucleotide polymorphisms (SNPs)</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>2517</fpage>
            <lpage>2519</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti377</pubid>
                  <pubid idtype="pmpid" link="fulltext">15769840</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Widespread purifying selection at polymorphic sites in human protein-coding loci</p>
            </title>
            <aug>
               <au>
                  <snm>Hughes</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Packer</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Welch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bergen</snm>
                  <fnm>AW</fnm>
               </au>
               <au>
                  <snm>Chanock</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Yeager</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>15754</fpage>
            <lpage>15757</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">307640</pubid>
                  <pubid idtype="pmpid" link="fulltext">14660790</pubid>
                  <pubid idtype="doi">10.1073/pnas.2536718100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Investigating single nucleotide polymorphism (SNP) density in the human genome and its implications for molecular evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>YX</fnm>
               </au>
               <au>
                  <snm>Hewett-Emmett</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Boerwinkle</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2003</pubdate>
            <volume>312</volume>
            <fpage>207</fpage>
            <lpage>213</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1119(03)00670-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">12909357</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>On the use of the Kolmogorov-Smirnov statistical test for immunofluorescence histogram comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Lampariello</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Cytometry</source>
            <pubdate>2000</pubdate>
            <volume>39</volume>
            <fpage>179</fpage>
            <lpage>188</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/(SICI)1097-0320(20000301)39:3&lt;179::AID-CYTO2>3.0.CO;2-I</pubid>
                  <pubid idtype="pmpid" link="fulltext">10685074</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>SNPKS</p>
            </title>
            <url>http://bioinfo.vipbg.vcu.edu/SNPKS/</url>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The expected equilibrium of the CpG dinucleotide in vertebrate genomes under a mutation model</p>
            </title>
            <aug>
               <au>
                  <snm>Sved</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bird</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1990</pubdate>
            <volume>87</volume>
            <fpage>4692</fpage>
            <lpage>4696</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">54183</pubid>
                  <pubid idtype="pmpid" link="fulltext">2352943</pubid>
                  <pubid idtype="doi">10.1073/pnas.87.12.4692</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>DNA methylation and the frequency of CpG in animal DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Bird</snm>
                  <fnm>AP</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1980</pubdate>
            <volume>8</volume>
            <fpage>1499</fpage>
            <lpage>1504</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">324012</pubid>
                  <pubid idtype="pmpid" link="fulltext">6253938</pubid>
                  <pubid idtype="doi">10.1093/nar/8.7.1499</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Sequence context analysis of 8.2 million single nucleotide polymorphisms in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2006</pubdate>
            <volume>366</volume>
            <fpage>316</fpage>
            <lpage>324</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2005.08.024</pubid>
                  <pubid idtype="pmpid" link="fulltext">16314054</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Worldwide DNA sequence variation in a 10-kilobase noncoding region on human chromosome 22</p>
            </title>
            <aug>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Jin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>YX</fnm>
               </au>
               <au>
                  <snm>Ramsay</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jenkins</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Leskinen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Pamilo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Trexler</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Patthy</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Jorde</snm>
                  <fnm>LB</fnm>
               </au>
               <au>
                  <snm>Ramos-Onsins</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>11354</fpage>
            <lpage>11358</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">17204</pubid>
                  <pubid idtype="pmpid" link="fulltext">11005839</pubid>
                  <pubid idtype="doi">10.1073/pnas.200348197</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Calculating the SNP-effective sample size from an alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Haubold</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Wiehe</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>36</fpage>
            <lpage>38</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/18.1.36</pubid>
                  <pubid idtype="pmpid" link="fulltext">11836209</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>The mosaic structure of variation in the laboratory mouse genome</p>
            </title>
            <aug>
               <au>
                  <snm>Wade</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Kulbokas</snm>
                  <fnm>EJ</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Kirby</snm>
                  <fnm>AW</fnm>
               </au>
               <au>
                  <snm>Zody</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Mullikin</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Daly</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>420</volume>
            <fpage>574</fpage>
            <lpage>578</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01252</pubid>
                  <pubid idtype="pmpid" link="fulltext">12466852</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Regularities of context-dependent codon bias in eukaryotic genes</p>
            </title>
            <aug>
               <au>
                  <snm>Fedorov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Saxonov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>1192</fpage>
            <lpage>1197</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">101244</pubid>
                  <pubid idtype="pmpid" link="fulltext">11861911</pubid>
                  <pubid idtype="doi">10.1093/nar/30.5.1192</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The influence of neighboring base composition on substitutions in plant chloroplast coding sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Morton</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1997</pubdate>
            <volume>14</volume>
            <fpage>189</fpage>
            <lpage>194</lpage>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Mutational spectrum in the recent human genome inferred by single nucleotide polymorphisms</p>
            </title>
            <aug>
               <au>
                  <snm>Jiang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2006</pubdate>
            <volume>88</volume>
            <fpage>527</fpage>
            <lpage>534</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ygeno.2006.06.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">16860534</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Patterns of nucleotide substitution in pseudogenes and functional genes</p>
            </title>
            <aug>
               <au>
                  <snm>Gojobori</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Graur</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1982</pubdate>
            <volume>18</volume>
            <fpage>360</fpage>
            <lpage>369</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF01733904</pubid>
                  <pubid idtype="pmpid">7120431</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Nonrandomness of point mutation as reflected in nucleotide substitutions in pseudogenes and its evolutionary implications</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>CI</fnm>
               </au>
               <au>
                  <snm>Luo</snm>
                  <fnm>CC</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1984</pubdate>
            <volume>21</volume>
            <fpage>58</fpage>
            <lpage>71</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF02100628</pubid>
                  <pubid idtype="pmpid">6442359</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Mutation pattern variation among regions of the primate genome</p>
            </title>
            <aug>
               <au>
                  <snm>Casane</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Boissinot</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chang</snm>
                  <fnm>BH</fnm>
               </au>
               <au>
                  <snm>Shimmin</snm>
                  <fnm>LC</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1997</pubdate>
            <volume>45</volume>
            <fpage>216</fpage>
            <lpage>226</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/PL00006223</pubid>
                  <pubid idtype="pmpid" link="fulltext">9302314</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Methylation-dependent transition rates are dependent on local sequence lengths and genomic regions</p>
            </title>
            <aug>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2007</pubdate>
            <volume>24</volume>
            <fpage>23</fpage>
            <lpage>25</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msl156</pubid>
                  <pubid idtype="pmpid" link="fulltext">17056644</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Directionality of point mutation and 5-methylcytosine deamination rates in the chimpanzee genome</p>
            </title>
            <aug>
               <au>
                  <snm>Jiang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>316</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2164-7-316</pubid>
                  <pubid idtype="pmpid" link="fulltext">17166280</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Single nucleotide variation analysis in 65 candidate genes for CNS disorders in a representative sample of the European population</p>
            </title>
            <aug>
               <au>
                  <snm>Freudenberg-Hua</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Freudenberg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kluck</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Cichon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Propping</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Nöthen</snm>
                  <fnm>MM</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>2271</fpage>
            <lpage>2276</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403700</pubid>
                  <pubid idtype="pmpid" link="fulltext">14525928</pubid>
                  <pubid idtype="doi">10.1101/gr.1299703</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Ascertainment bias in studies of human genome-wide polymorphism</p>
            </title>
            <aug>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Hubisz</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Bustamante</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Williamson</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>1496</fpage>
            <lpage>1502</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1310637</pubid>
                  <pubid idtype="pmpid" link="fulltext">16251459</pubid>
                  <pubid idtype="doi">10.1101/gr.4107905</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics</p>
            </title>
            <aug>
               <au>
                  <snm>Yekutieli</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Stat Plan Inference</source>
            <pubdate>1999</pubdate>
            <volume>82</volume>
            <fpage>171</fpage>
            <lpage>196</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-3758(99)00041-5</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>International HapMap Project</p>
            </title>
            <url>http://www.hapmap.org/</url>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Ensembl</p>
            </title>
            <url>ftp://ftp.ensembl.org/pub/</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p>The CpG island searcher: a new WWW resource</p>
            </title>
            <aug>
               <au>
                  <snm>Takai</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>PA</fnm>
               </au>
            </aug>
            <source>In Silico Biol</source>
            <pubdate>2003</pubdate>
            <volume>3</volume>
            <fpage>235</fpage>
            <lpage>240</lpage>
            <xrefbib>
               <pubid idtype="pmpid">12954087</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Determinants of CpG islands: expression in early embryo and isochore structure</p>
            </title>
            <aug>
               <au>
                  <snm>Ponger</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mouchiroud</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>1854</fpage>
            <lpage>1860</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">311164</pubid>
                  <pubid idtype="pmpid" link="fulltext">11691850</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Comprehensive analysis of CpG islands in human chromosomes 21 and 22</p>
            </title>
            <aug>
               <au>
                  <snm>Takai</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>PA</fnm>
               </au>
            </aug>
