<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-239</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Aspects of coverage in medical DNA sequencing</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Wendl</snm>
               <mi>C</mi>
               <fnm>Michael</fnm>
               <insr iid="I1"/>
               <email>mwendl@wustl.edu</email>
            </au>
            <au id="A2">
               <snm>Wilson</snm>
               <mi>K</mi>
               <fnm>Richard</fnm>
               <insr iid="I1"/>
               <email>rwilson@wustl.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Genome Sequencing Center and Department of Genetics, Washington University, St. Louis MO 63108, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>239</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/239</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18485222</pubid>
               <pubid idtype="doi">10.1186/1471-2105-9-239</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>08</day>
               <month>10</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>16</day>
               <month>5</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>16</day>
               <month>5</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Wendl and Wilson; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>DNA sequencing is now emerging as an important component in biomedical studies of diseases like cancer. Short-read, highly parallel sequencing instruments are expected to be used heavily for such projects, but many design specifications have yet to be conclusively established. Perhaps the most fundamental of these is the redundancy required to detect sequence variations, which bears directly upon genomic coverage and the consequent resolving power for discerning somatic mutations.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We address the medical sequencing coverage problem via an extension of the standard mathematical theory of haploid coverage. The expected diploid multi-fold coverage and its generalization for aneuploidy are derived, and these expressions can be readily evaluated for any project. The resulting theory is used as a scaling law to calibrate performance to that of standard BAC sequencing at 8&#215; to 10&#215; redundancy, i.e. for expected coverages that exceed 99% of the unique sequence. A differential strategy is formalized for tumor/normal studies wherein tumor samples are sequenced more deeply than normal ones. In particular, both tumor alleles should be detected at least twice, while both normal alleles are detected at least once. Our theory predicts these requirements can be met for tumor and normal redundancies of approximately 26&#215; and 21&#215;, respectively. We explain why these values do not differ by a factor of 2, as might intuitively be expected. Future technology developments should prompt even deeper sequencing of tumors, but the 21&#215; value for normal samples is essentially a constant.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Given the assumptions of standard coverage theory, our model gives pragmatic estimates for required redundancy. The differential strategy should be an efficient means of identifying potential somatic mutations for further study.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Applications of DNA sequencing to medically significant problems continue to grow <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. In particular, recent technological trends suggest that the sequencing of entire cohorts of individual patient genomes will soon be economically feasible <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. This contrasts dramatically with the enormous resources that were expended on deciphering just a single composite human reference genome only a few years ago <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Sequence-based characterization promises to play an expanding role in medicine because of its ability to identify potential disease-causing mutations <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. It will be especially important in cancers, for example, for distinguishing between sequence variations in the germline versus somatic mutations that are relevant to tumor initiation or growth <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>.</p>
         <p>In principle, the process-engineering issues in both gene-based and whole-genome medical sequencing are identical to those for <it>de novo </it>genomic sequencing, that is, to "cover" a region of interest with shotgun read data. However, the definitions of what constitutes coverage are rather different. In traditional genomic sequencing, the target is a haploid genome and coverage of a base position <it>x </it>is defined as the event whereby one or more sequence reads span <it>x</it>. Such a process is binomial and, according to elementary probability theory, the expected fractional coverage is 1-exp(-<it>&#961;</it>), where <it>&#961; </it>= <it>NL/G</it>. Here, <it>L </it>and <it>G </it>are the read and haploid genome lengths, respectively, <it>N </it>is the number of reads sequenced, and <it>&#961; </it>is the haploid redundancy. Although this result describes a number of traditional coverage configurations <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, it seems to be known to the sequencing community primarily via its application by Clarke and Carbon <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. This expression is also sometimes attributed to Lander-Waterman Theory (LWT) <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, although LWT actually treats the issue of sequence gaps rather than coverage.</p>
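         <p>As a rough illustration of the haploid result quoted above, the following Python sketch evaluates 1-exp(-<it>&#961;</it>) for given <it>N</it>, <it>L </it>and <it>G</it>, and inverts it to estimate the number of reads needed for a desired expected coverage. The function names and the example numbers are hypothetical and serve only to show the arithmetic.</p>

```python
import math


def haploid_coverage(num_reads, read_len, genome_len):
    """Expected fraction of a haploid target spanned by one or more reads."""
    rho = num_reads * read_len / genome_len  # haploid redundancy, rho = N*L/G
    return 1.0 - math.exp(-rho)


def reads_for_coverage(target, read_len, genome_len):
    """Smallest N whose expected coverage reaches target (strictly between 0 and 1)."""
    rho = -math.log(1.0 - target)  # invert 1 - exp(-rho) = target
    return math.ceil(rho * genome_len / read_len)


# Hypothetical example: 500-base reads on a 100 kb target
n_reads = reads_for_coverage(0.99, 500, 100_000)
print(n_reads, haploid_coverage(n_reads, 500, 100_000))
```

         <p>Because the expression depends on <it>N</it>, <it>L </it>and <it>G </it>only through <it>&#961;</it>, the same calculation applies to any haploid target once the redundancy is fixed.</p>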
         <p>Medical sequencing projects focus on genetic variation and seek to identify both alleles at <it>x </it>for the diploid genome. In particular, diploid sequence is necessary for discerning heterozygous mutations. Consequently, coverage must be defined more generally. Here, we say that <it>x </it>is "covered" when each allele is spanned by at least <it>&#966; </it>reads, where <it>&#966; </it>&#8805; 1. Actual values of <it>&#966; </it>will depend upon study-specific considerations that weigh economic factors against such things as desired confidence levels for detection and confirmation and anticipated data quality. Some results on multiple coverings appear in the mathematical literature <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>, but these do not address the problem beyond the haploid level. Smith and Bernstein <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> conducted early numerical simulations for <it>&#966; </it>= 1 on a 20 kb fragment, but evidently did not extend the approach to genome-size targets. Levy et al. <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> and Wheeler et al. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> also describe models for this problem, which we discuss further below.</p>
         <p>An important issue for future medical sequencing projects can be posed as follows. Given a specific choice of <it>&#966;</it>, estimate the necessary redundancy such that either the probability of covering a given position, e.g. a SNP, has some desired value, or that the expectation for the number of captured positions has such a value. These propositions are actually identical (see Methods). However, additional study-specific issues also arise. For example, for tumor/germline pairs, one has to specify <it>&#961; </it>for both types of samples. As we demonstrate below, the two values should not necessarily be the same.</p>
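         <p>The estimation problem posed above can be sketched numerically. The Python example below is a minimal illustration, not the procedure used in this work: it assumes the asymptotic form given as Eq. 2 in the Results (reads spanning each allele treated as Poisson with rate <it>&#961;</it>/2, and the two alleles treated as independent) and finds the required <it>&#961; </it>by bisection. The function names, the tolerance and the example target of 0.99 are hypothetical choices.</p>

```python
import math


def diploid_coverage(rho, phi):
    """Approximate probability that both alleles are each spanned by phi or more reads."""
    mean = rho / 2.0  # each allele receives half of the haploid redundancy
    # Poisson probability that one allele is spanned by fewer than phi reads
    shortfall = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(phi))
    return (1.0 - shortfall) ** 2  # alleles treated as independent


def required_redundancy(phi, target, tol=1e-9):
    """Smallest rho (found by bisection) at which diploid_coverage reaches target."""
    lo, hi = 0.0, 1.0
    while not diploid_coverage(hi, phi) >= target:  # grow the bracket upward
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diploid_coverage(mid, phi) >= target:
            hi = mid  # mid is sufficient, so tighten from above
        else:
            lo = mid  # mid is insufficient, so tighten from below
    return hi


# Hypothetical example: redundancy needed for a 0.99 coverage probability
for phi in (1, 2):
    print(phi, round(required_redundancy(phi, 0.99), 2))
```

         <p>Bisection is adequate here because the coverage probability increases monotonically with <it>&#961; </it>for fixed <it>&#966;</it>, so any root-bracketing method would serve equally well.</p>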
         <p>Speculation regarding these issues is not new. For example, Strausberg et al. <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> observed that <it>&#961; </it>should exceed 10. In other words, redundancies for medical sequencing projects should surpass those values conventionally associated with haploid whole-genome shotgun projects, BAC projects, etc. This is largely intuitive, given the diploid nature of the problem, but not particularly informative. Pioneering diploid sequencing projects furnish some early anecdotal information. For example, Levy et al. <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> considered <it>&#961; </it>= 20 to be adequate for germline sequencing of a healthy individual based upon simulation, certain heuristic filters, and the model alluded to above. They employed traditional Sanger sequencing <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and reached a redundancy of only about 7.5&#215;, so the degree to which this value generalizes to medical sequencing of cancer genomes using short-read "next-generation" platforms <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> is unclear. Likewise, Wheeler et al. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> report a redundancy of only about 7.4&#215; for another diploid project.</p>
         <p>Here, we address medical sequencing coverage more formally by way of a straightforward mathematical extension to the standard covering process model. We consider this an idealization in the sense that it presumes all entities are independently and identically distributed (IID) and neglects any heuristic inputs. However, we also demonstrate the use of empirical data to calibrate response such that inferences can be drawn for medical sequencing projects. The resulting analysis points to what we believe will be an efficient means of discerning potential somatic mutations and enables estimation of the necessary parameters.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>Given a location <it>x </it>defined in the context of <it>h </it>associated chromosomes, let <it>P</it><sub><it>h</it>, <it>&#966; </it></sub>be the probability that <it>x </it>is covered at least <it>&#966; </it>times. The immediate focus of much of the research community is on diploid sequencing of homologous chromosomes (<it>h </it>= 2) related to cancers, for which we report a mathematical theory of coverage. In anticipation of extending sequencing to aneuploid configurations, some of which are also relevant to cancer, we also furnish the general result for <it>h </it>> 2.</p>
         <sec>
            <st>
               <p>Diploid Sequencing Theory</p>
            </st>
            <p>Given a diploid genome (<it>h </it>= 2), the probability of coverage is</p>
            <p>
               <display-formula id="M1">
                  <m:math name="1471-2105-9-239-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>P</m:mi>
                              <m:mrow>
                                 <m:mn>2</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mi>&#966;</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>j</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mi>&#966;</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>N</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mi>&#966;</m:mi>
                                 </m:mrow>
                              </m:munderover>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>C</m:mi>
                                    <m:mrow>
                                       <m:mi>N</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:msubsup>
                                    <m:mi>&#948;</m:mi>
                                    <m:mn>2</m:mn>
                                    <m:mi>j</m:mi>
                                 </m:msubsup>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msub>
                                          <m:mi>&#948;</m:mi>
                                          <m:mn>2</m:mn>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mi>N</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msup>
                              </m:mrow>
                           </m:mstyle>
                           <m:mrow>
                              <m:mo>[</m:mo>
                              <m:mrow>
                                 <m:mn>1</m:mn>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mstyle displaystyle="true">
                                    <m:munderover>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>k</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>0</m:mn>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mi>&#966;</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                    </m:munderover>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>C</m:mi>
                                          <m:mrow>
                                             <m:mi>N</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mi>j</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>k</m:mi>
                                          </m:mrow>
                                       </m:msub>
                                       <m:msubsup>
                                          <m:mi>&#948;</m:mi>
                                          <m:mn>2</m:mn>
                                          <m:mi>k</m:mi>
                                       </m:msubsup>
                                       <m:msup>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mn>1</m:mn>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mi>&#948;</m:mi>
                                                <m:mn>2</m:mn>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mi>N</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mi>j</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mi>k</m:mi>
                                          </m:mrow>
                                       </m:msup>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                              <m:mo>]</m:mo>
                           </m:mrow>
                           <m:mo>,</m:mo>
                        </m:mrow>
                        <m:annotation encoding="application/x-tex">P_{2,\varphi} = \sum_{j=\varphi}^{N-\varphi} C_{N,j}\,\delta_2^{\,j}\,(1-\delta_2)^{N-j} \left[ 1 - \sum_{k=0}^{\varphi-1} C_{N-j,k}\,\delta_2^{\,k}\,(1-\delta_2)^{N-j-k} \right],</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>&#948;</it><sub>2 </sub>= <it>L</it>/(2<it>G</it>) is the diploid Bernoulli probability and <it>C</it><sub><it>N</it>, <it>k </it></sub>are the binomial coefficients. Eq. 1 also gives the expected fraction of a set of locations that are covered (Methods). This equation relies on the standard IID assumption, but is exact in the sense that it accounts for the fact that the coverings of two corresponding alleles on homologous chromosomes are not strictly independent of one another (Methods). However, parameters in an actual project are such that alleles are <it>almost </it>independent. Moreover, an asymptotic approximation can be applied (Methods), in which case</p>
            <p>
               <display-formula id="M2">
                  <m:math name="1471-2105-9-239-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>P</m:mi>
                              <m:mrow>
                                 <m:mn>2</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mi>&#966;</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>&#8776;</m:mo>
                           <m:msup>
                              <m:mrow>
                                 <m:mrow>
                                    <m:mo>(</m:mo>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mstyle displaystyle="true">
                                          <m:munderover>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>k</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>0</m:mn>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mi>&#966;</m:mi>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                          </m:munderover>
                                          <m:mrow>
                                             <m:mfrac>
                                                <m:mn>1</m:mn>
                                                <m:mrow>
                                                   <m:mi>k</m:mi>
                                                   <m:mo>!</m:mo>
                                                </m:mrow>
                                             </m:mfrac>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mrow>
                                                      <m:mo>(</m:mo>
                                                      <m:mrow>
                                                         <m:mfrac>
                                                            <m:mi>&#961;</m:mi>
                                                            <m:mn>2</m:mn>
                                                         </m:mfrac>
                                                      </m:mrow>
                                                      <m:mo>)</m:mo>
                                                   </m:mrow>
                                                </m:mrow>
                                                <m:mi>k</m:mi>
                                             </m:msup>
                                             <m:msup>
                                                <m:mi>e</m:mi>
                                                <m:mrow>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mi>&#961;</m:mi>
                                                   <m:mo>/</m:mo>
                                                   <m:mn>2</m:mn>
                                                </m:mrow>
                                             </m:msup>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                    <m:mo>)</m:mo>
                                 </m:mrow>
                              </m:mrow>
                              <m:mn>2</m:mn>
                           </m:msup>
                        </m:mrow>
                        <m:annotation encoding="application/x-tex">P_{2,\varphi} \approx \left( 1 - \sum_{k=0}^{\varphi-1} \frac{1}{k!} \left( \frac{\rho}{2} \right)^{k} e^{-\rho/2} \right)^{2}</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>is a very good approximation of Eq. 1. Here, e is Euler's number (&#8776; 2.71828) and <it>&#961; </it>is again the conventional haploid redundancy. Note the basis in a Poisson distribution having a rate <it>&#961;</it>/2. Eq. 2 is straightforward to evaluate for any project because <it>&#966; </it>is typically not very large. This stands in contrast to Eq. 1, which has an enormous number of terms and whose components are prone to numerical overflow and underflow. For convenience, we expand the expressions for <it>&#966; </it>= 1, 2, and 3</p>
            <p>
               <display-formula id="M3">
                  <m:math name="1471-2105-9-239-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>P</m:mi>
                              <m:mrow>
                                 <m:mn>2</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:msub>
                           <m:mo>&#8776;</m:mo>
                           <m:msup>
                              <m:mrow>
                                 <m:mrow>
                                    <m:mo>(</m:mo>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msup>
                                          <m:mi>e</m:mi>
                                          <m:mrow>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mi>&#961;</m:mi>
                                             <m:mo>/</m:mo>
                                             <m:mn>2</m:mn>
                                          </m:mrow>
                                       </m:msup>
                                    </m:mrow>
                                    <m:mo>)</m:mo>
                                 </m:mrow>
                              </m:mrow>
                              <m:mn>2</m:mn>
                           </m:msup>
                        </m:mrow>
                        <m:annotation encoding="application/x-tex">P_{2,1} \approx \left( 1 - e^{-\rho/2} \right)^{2}</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>
               <display-formula id="M4">
                  <m:math name="1471-2105-9-239-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>P</m:mi>
                              <m:mrow>
                                 <m:mn>2</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mn>2</m:mn>
                              </m:mrow>
                           </m:msub>
                           <m:mo>&#8776;</m:mo>
                           <m:msup>
                              <m:mrow>
                                 <m:mrow>
                                    <m:mo>(</m:mo>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msup>
                                          <m:mi>e</m:mi>
                                          <m:mrow>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mi>&#961;</m:mi>
                                             <m:mo>/</m:mo>
                                             <m:mn>2</m:mn>
                                          </m:mrow>
                                       </m:msup>
                                       <m:mrow>
                                          <m:mo>(</m:mo>
                                          <m:mrow>
                                             <m:mn>1</m:mn>
                                             <m:mo>+</m:mo>
                                             <m:mfrac>
                                                <m:mi>&#961;</m:mi>
                                                <m:mn>2</m:mn>
                                             </m:mfrac>
                                          </m:mrow>
                                          <m:mo>)</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                    <m:mo>)</m:mo>
                                 </m:mrow>
                              </m:mrow>
                              <m:mn>2</m:mn>
                           </m:msup>
                        </m:mrow>
                        <m:annotation encoding="application/x-tex">P_{2,2} \approx \left( 1 - e^{-\rho/2} \left( 1 + \frac{\rho}{2} \right) \right)^{2}</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>
               <display-formula id="M5">
                  <m:math name="1471-2105-9-239-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>P</m:mi>
                              <m:mrow>
                                 <m:mn>2</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mn>3</m:mn>
                              </m:mrow>
                           </m:msub>
                           <m:mo>&#8776;</m:mo>
                           <m:msup>
                              <m:mrow>
                                 <m:mrow>
                                    <m:mo>(</m:mo>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msup>
                                          <m:mi>e</m:mi>
                                          <m:mrow>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mi>&#961;</m:mi>
                                             <m:mo>/</m:mo>
                                             <m:mn>2</m:mn>
                                          </m:mrow>
                                       </m:msup>
                                       <m:mrow>
                                          <m:mo>(</m:mo>
                                          <m:mrow>
                                             <m:mn>1</m:mn>
                                             <m:mo>+</m:mo>
                                             <m:mfrac>
                                                <m:mi>&#961;</m:mi>
                                                <m:mn>2</m:mn>
                                             </m:mfrac>
                                             <m:mo>+</m:mo>
                                             <m:mfrac>
                                                <m:mrow>
                                                   <m:msup>
                                                      <m:mi>&#961;</m:mi>
                                                      <m:mn>2</m:mn>
                                                   </m:msup>
                                                </m:mrow>
                                                <m:mn>8</m:mn>
                                             </m:mfrac>
                                          </m:mrow>
                                          <m:mo>)</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                    <m:mo>)</m:mo>
                                 </m:mrow>
                              </m:mrow>
                              <m:mn>2</m:mn>
                           </m:msup>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiuaa1aaSbaaSqaaiabikdaYiabcYcaSiabiodaZaqabaGccqGHijYUdaqadaqaaiabigdaXiabgkHiTiabdwgaLnaaCaaaleqabaGaeyOeI0IaeqyWdiNaei4la8IaeGOmaidaaOWaaeWaaeaacqaIXaqmcqGHRaWkjuaGdaWcaaqaaiabeg8aYbqaaiabikdaYaaakiabgUcaRKqbaoaalaaabaGaeqyWdi3aaWbaaeqabaGaeGOmaidaaaqaaiabiIda4aaaaOGaayjkaiaawMcaaaGaayjkaiaawMcaamaaCaaaleqabaGaeGOmaidaaOGaeiOla4caaa@498C@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
         </sec>
         <sec>
            <st>
               <p>Generalization to Aneuploidy</p>
            </st>
            <p>Under the assumption of independence, Eq. 2 for homologous chromosomes is readily generalized to an arbitrary number of chromosomes, <it>h</it>, specifically</p>
            <p>
               <display-formula id="M6">
                  <m:math name="1471-2105-9-239-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>P</m:mi>
                              <m:mrow>
                                 <m:mi>h</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>&#966;</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>&#8776;</m:mo>
                           <m:msup>
                              <m:mrow>
                                 <m:mrow>
                                    <m:mo>(</m:mo>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mstyle displaystyle="true">
                                          <m:munderover>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>k</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>0</m:mn>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mi>&#966;</m:mi>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                          </m:munderover>
                                          <m:mrow>
                                             <m:mfrac>
                                                <m:mn>1</m:mn>
                                                <m:mrow>
                                                   <m:mi>k</m:mi>
                                                   <m:mo>!</m:mo>
                                                </m:mrow>
                                             </m:mfrac>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mrow>
                                                      <m:mo>(</m:mo>
                                                      <m:mrow>
                                                         <m:mfrac>
                                                            <m:mi>&#961;</m:mi>
                                                            <m:mi>h</m:mi>
                                                         </m:mfrac>
                                                      </m:mrow>
                                                      <m:mo>)</m:mo>
                                                   </m:mrow>
                                                </m:mrow>
                                                <m:mi>k</m:mi>
                                             </m:msup>
                                             <m:msup>
                                                <m:mi>e</m:mi>
                                                <m:mrow>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mi>&#961;</m:mi>
                                                   <m:mo>/</m:mo>
                                                   <m:mi>h</m:mi>
                                                </m:mrow>
                                             </m:msup>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                    <m:mo>)</m:mo>
                                 </m:mrow>
                              </m:mrow>
                              <m:mi>h</m:mi>
                           </m:msup>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiuaa1aaSbaaSqaaiabdIgaOjabcYcaSiabeA8aMbqabaGccqGHijYUdaqadaqaaiabigdaXiabgkHiTmaaqahabaqcfa4aaSaaaeaacqaIXaqmaeaacqWGRbWAcqGGHaqiaaGcdaqadaqcfayaamaalaaabaGaeqyWdihabaGaemiAaGgaaaGccaGLOaGaayzkaaWaaWbaaSqabeaacqWGRbWAaaGccqWGLbqzdaahaaWcbeqaaiabgkHiTiabeg8aYjabc+caViabdIgaObaaaeaacqWGRbWAcqGH9aqpcqaIWaamaeaacqaHgpGzcqGHsislcqaIXaqma0GaeyyeIuoaaOGaayjkaiaawMcaamaaCaaaleqabaGaemiAaGgaaOGaeiOla4caaa@5336@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Note the Poisson basis having a rate <it>&#961;</it>/<it>h</it>, for example <it>&#961;</it>/3 for chromosomal trisomies. Like Eq. 2, this expression is readily evaluated and straightforward to expand for given values of <it>&#966;</it>.</p>
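            <p>As a minimal sketch of how Eq. 6 might be evaluated numerically, the following Python function sums the truncated Poisson series directly. The function name coverage_probability and its argument names (rho, phi, h) are assumptions introduced here for illustration and do not appear in the original analysis.</p>
            <p>
```python
import math


def coverage_probability(rho, phi, h=2):
    """Evaluate Eq. 6: probability that each of h chromosome copies
    is covered by at least phi reads at total redundancy rho."""
    rate = rho / h  # Poisson rate per chromosome copy
    # Probability that a single copy is covered by fewer than phi reads
    tail = math.exp(-rate) * sum(
        rate ** k / math.factorial(k) for k in range(phi)
    )
    # Independence assumption: raise the per-copy probability to the power h
    return (1.0 - tail) ** h


if __name__ == "__main__":
    # Hypothetical example: a trisomy (h = 3) with a two-read minimum
    print(coverage_probability(30.0, 2, h=3))
```
            </p>
            <p>With h = 2 and phi = 2 or 3, the function reduces to Eqs. 4 and 5, respectively, which offers a simple consistency check.</p>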
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>A number of medical sequencing coverage issues are currently being debated. New questions have arisen not only because diploid medical sequencing is itself a fairly recent undertaking, but also because of the expectation that novel sequencing platforms will be heavily employed in such projects. Read lengths on these platforms are substantially shorter than those of traditional Sanger data <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and investigators are eager to determine how this affects coverage. We focus our discussion here primarily on the diploid problem, although some projections for aneuploid configurations are also given.</p>
         <sec>
            <st>
               <p>Coverage Assessment</p>
            </st>
            <p>Fig. <figr fid="F1">1</figr> shows the traditional <it>de novo </it>haploid coverage model <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp> versus diploid medical sequencing coverage theory for minimum read coverings of <it>&#966; </it>&#8712; {1, 2, 3, 4, 5} for both alleles. The diploid curves were generated by Eq. 2 for a 3.3 billion base-pair genome and 31 base-pair read lengths. Errors associated with not using Eq. 1 for these particular parameters are significantly less than 1% (data not shown). As one would intuitively expect, the required redundancies for a given coverage fraction increase with <it>&#966; </it>and are noticeably higher than established values for haploid genome sequencing. Although these cases are within the realm of feasibility for the newest-generation sequencing platforms <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, economic factors would probably still preclude the higher values of <it>&#966; </it>at the present time. Conversely, these depths are much lower than values that have been discussed elsewhere. For example, Warren et al. <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> report <it>&#961; </it>up to 100 and 400 for bacterial and viral genomes, respectively, using 25 bp fragments to simulate data from an Illumina instrument.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Traditional haploid coverage model [15, 16] versus diploid medical sequencing coverage results for minimum number of covering reads <it>&#966; </it>&#8712; {1, 2, 3, 4, 5}</p>
               </caption>
               <text>
                  <p><b>Traditional haploid coverage model [15, 16] versus diploid medical sequencing coverage results for minimum number of covering reads <it>&#966; </it>&#8712; {1, 2, 3, 4, 5}.</b> The figure also shows an additional curve that replots the diploid <it>&#966; </it>= 2 curve, except where abscissa values are scaled by one-half. This aspect is relevant to the discussion of why the redundancies for <it>&#966; </it>= 1 and <it>&#966; </it>= 2 do not differ by a factor of two. Coverage progressions for <it>&#966; </it>&#8712; {1, 2} are also shown for the recent Illumina resequencing of <it>C. elegans </it>by Hillier et al. [28]. These points represent average coverages over all chromosome pairs, while their error bars show the observed minima and maxima. Simulation data for <it>&#966; </it>= 1 on a 20 kb fragment using 250 bp reads [20] are also shown. Points and error bars represent the averages and extrema, respectively, of 250 simulations.</p>
               </text>
               <graphic file="1471-2105-9-239-1"/>
            </fig>
            <p>Two other notable trends are visible in Fig. <figr fid="F1">1</figr>. First, increasingly large redundancies are required just to obtain non-trivial values of <it>P</it><sub>2, <it>&#966;</it></sub>. For example, the curve for <it>&#966; </it>= 1 already exceeds 0.01 at <it>&#961; </it>= 0.25, whereas this mark is not met until <it>&#961; </it>= 5 at <it>&#966; </it>= 5. Indeed, one may not see even the "beginnings" of coverage until comparatively high redundancy has been reached, depending on the selected <it>&#966;</it>. Also, the amount by which each curve is drawn out over the abscissa increases with <it>&#966;</it>, signifying a decelerating coverage rate. This is especially clear for what appear to be the linear segments of each curve; their slopes progressively decrease. Again comparing the extremes, the difference between 0.1 and 0.9 on the ordinate is a little over 5 units of redundancy for <it>&#966; </it>= 1 but almost 11 units for <it>&#966; </it>= 5. This phenomenon bears on a point we make below.</p>
            <p>Both of these trends arise strictly as mathematical consequences and can perhaps best be understood by referring to Eqs. 3 through 5. The exponential (Euler Number) term represents the tendency for the coverage rate to decay. For each successive value of <it>&#966; </it>this term is bolstered by additional factors, which themselves grow progressively faster with <it>&#961;</it>, whereby the overall effect is realized.</p>
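            <p>The two trends can be checked numerically from Eq. 2. The sketch below is illustrative only: the helper names p_diploid and rho_at, and the use of simple bisection to invert the curve, are assumptions made here rather than part of the original analysis.</p>
            <p>
```python
import math


def p_diploid(rho, phi):
    """Diploid coverage probability (Eq. 2): both alleles covered
    by at least phi reads at total redundancy rho."""
    rate = rho / 2.0
    tail = math.exp(-rate) * sum(
        rate ** k / math.factorial(k) for k in range(phi)
    )
    return (1.0 - tail) ** 2


def rho_at(target, phi, lo=0.0, hi=200.0):
    """Redundancy at which the curve reaches `target`, found by bisection.
    Relies on p_diploid increasing monotonically with rho."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if p_diploid(mid, phi) >= target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Onset of coverage: both values quoted in the text exceed 0.01
    print(p_diploid(0.25, 1), p_diploid(5.0, 5))
    # Width of each curve between ordinate values 0.1 and 0.9
    for phi in (1, 5):
        print(phi, rho_at(0.9, phi) - rho_at(0.1, phi))
```
            </p>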
         </sec>
         <sec>
            <st>
               <p>Calibration and the Stopping Problem</p>
            </st>
            <p>One of the primary issues facing the investigator is the so-called stopping problem. That is, at what <it>&#961; </it>should random processing be halted? This question is, of course, context dependent. Yet, it can be answered, at least approximately, by using the analysis given here. For example, suppose the goal is to design a medical sequencing project such that the expected coverage progress corresponds roughly to standard BAC sequencing. This is a calibration-based way of framing the question and exploits the community's collective empirical experience gained from having sequenced hundreds of thousands of such clones. In particular, 6 &#8804; <it>&#961; </it>&#8804; 10 has been found to be a reasonable balance between cost and coverage, although values nearer to 10 are more typically chosen <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. In this capacity, Eqs. 1 and 2 effectively function as scaling laws.</p>
            <p>Scaling can conveniently be demonstrated graphically, for example by picking a point on the haploid curve for a desired redundancy, drawing a horizontal line through this point, and reading the redundancy at the intersection of the chosen diploid curve and the horizontal. The asymptotic nature of the curves depicted in Fig. <figr fid="F1">1</figr> obscures this process, but it can readily be accomplished using a magnified plot. Fig. <figr fid="F2">2</figr> shows the example of extrapolating haploid sequencing coverage at <it>&#961; </it>= 8 to diploid sequencing (<it>&#966; </it>= 1), the result being about 17.5&#215; for the diploid project. Table <tblr tid="T1">1</tblr> furnishes an expanded set of values for <it>&#966; </it>&#8712; {1, 2, 3} calibrated against haploid sequencing for <it>&#961; </it>&#8712; {6, 8, 10}. Again, the increase of redundancy with the minimum number of reads required to attain coverage is quite clear. Notice that each of the three rows in the table corresponds to covering more than 99% of the unique sequence. In other words, the covering probabilities change very little over fairly significant increases in depth. Consequently, BAC depth provides much better resolution than BAC coverage for scaling the diploid problem. This observation is also obvious from Fig. <figr fid="F2">2</figr>.</p>
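            <p>The graphical scaling procedure can also be sketched numerically. The code below assumes that the haploid expectation is approximated by 1 &#8722; exp(&#8722;<it>&#961;</it>), which reproduces the tabulated <it>P</it>-values only to within about one unit in the fifth decimal place, so the tabulated values may derive from a slightly different expression. The function names and the bisection solver are likewise assumptions introduced for illustration.</p>
            <p>
```python
import math


def p_haploid(rho):
    """Assumed haploid expected coverage: 1 - exp(-rho)."""
    return 1.0 - math.exp(-rho)


def p_diploid(rho, phi):
    """Diploid coverage probability (Eq. 2) for a phi-read minimum."""
    rate = rho / 2.0
    tail = math.exp(-rate) * sum(
        rate ** k / math.factorial(k) for k in range(phi)
    )
    return (1.0 - tail) ** 2


def calibrated_rho(haploid_rho, phi, lo=0.0, hi=200.0):
    """Diploid redundancy whose coverage matches the haploid coverage
    at haploid_rho (the horizontal-line construction, done by bisection)."""
    target = p_haploid(haploid_rho)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if p_diploid(mid, phi) >= target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for haploid_rho in (6, 8, 10):
        row = [round(calibrated_rho(haploid_rho, phi), 1) for phi in (1, 2, 3)]
        print(haploid_rho, row)
```
            </p>
            <p>Under these assumptions the computed redundancies should fall within roughly half a unit of the graphically derived values, for example near 17.5 for <it>&#966; </it>= 1 calibrated against haploid <it>&#961; </it>= 8.</p>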
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Calibration of medical sequencing according to traditional haploid expectation</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c cspan="2" ca="center">
                        <p>Traditional [16]</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Corresponding Medical Sequencing <it>&#961;</it></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Redundancy</p>
                     </c>
                     <c ca="left">
                        <p><it>P</it>-value</p>
                     </c>
                     <c ca="left">
                        <p>1-read min.</p>
                     </c>
                     <c ca="left">
                        <p>2-read min.</p>
                     </c>
                     <c ca="left">
                        <p>3-read min.</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>0.99752</p>
                     </c>
                     <c ca="left">
                        <p>13.5</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>22</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>8</p>
                     </c>
                     <c ca="left">
                        <p>0.99967</p>
                     </c>
                     <c ca="left">
                        <p>17.5</p>
                     </c>
                     <c ca="left">
                        <p>22.5</p>
                     </c>
                     <c ca="left">
                        <p>26.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>0.99996</p>
                     </c>
                     <c ca="left">
                        <p>21.5</p>
                     </c>
                     <c ca="left">
                        <p>26.5</p>
                     </c>
                     <c ca="left">
                        <p>31.5</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Haploid and diploid results for expected coverage values of at least 0.9975</p>
               </caption>
               <text>
                  <p><b>Haploid and diploid results for expected coverage values of at least 0.9975.</b> This is a greatly magnified view of the top quarter-percent of the ordinate range in Fig. 1. Vertical lines demarcate the typical BAC calibration neighborhood of 6 &#8804; <it>&#961; </it>&#8804; 10. The scaling process is demonstrated graphically for diploid sequencing (<it>&#966; </it>= 1) based on haploid sequencing at <it>&#961; </it>= 8.</p>
               </text>
               <graphic file="1471-2105-9-239-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Comparison to Haploid-Based Distribution Models</p>
            </st>
            <p>Diploid coverage, as discussed above, is a primary consideration for medical sequencing. Yet, for comparison, it is also useful to examine such projects in their haploid context. The stopping problem has been extensively studied from a number of analytical perspectives for traditional <it>de novo </it>genomic sequencing projects, for example using the probability of complete coverage, <it>P</it><sub><it>C </it></sub><abbrgrp><abbr bid="B23">23</abbr></abbrgrp> and the intersection probability, <it>P</it><sub>&#8898; </sub><abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. While the meaning of the former is probably clear, the latter characterizes how effective additional redundancy will be for improving coverage in light of increasingly important stochastic effects. Of these two metrics, <it>P</it><sub><it>C </it></sub>is the more conservative and could be thought of as setting an upper bound in the context of traditional haploid sequencing. Before proceeding, let us digress briefly to further explain <it>P</it><sub>&#8898;</sub>.</p>
            <p>Consider two hypothetical sequencing projects, <it>A </it>and <it>A</it>', that are identical in every respect, except that <it>A' </it>is always ahead by one whole unit of redundancy (Fig. <figr fid="F3">3</figr>). Now, examine these projects at two particular instances, specifically for project <it>A </it>at both <it>&#961;</it><sub>1 </sub>and <it>&#961;</it><sub>2 </sub>(with <it>A' </it>at <it>&#961;</it><sub>1 </sub>+ 1 and <it>&#961;</it><sub>2 </sub>+ 1, respectively), where <it>&#961;</it><sub>1 </sub>&lt;<it>&#961;</it><sub>2</sub>. At <it>&#961;</it><sub>1</sub>, there may be little overlap in the densities between the two projects. Accordingly, the probability is extremely high, perhaps close to 1, that <it>A' </it>will be more highly covered than <it>A</it>. The system behaves as if there is a deterministic increase in coverage for <it>A </it>&#8594; <it>A' </it>for a unit increase of redundancy. At higher redundancies, say <it>&#961;</it><sub>2</sub>, mathematical analysis indicates that the intersection of the <it>A </it>and <it>A' </it>densities will grow <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Consequently, differences in actual coverage between <it>A </it>and <it>A' </it>become progressively more a function of chance than of differences in the redundancies themselves. The tail value of the intersection, <it>P</it><sub>&#8898;</sub>, can be taken as an indicator of the diminishing returns of the process.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Diagrammatic synopsis of the intersection probability</p>
               </caption>
               <text>
                  <p><b>Diagrammatic synopsis of the intersection probability.</b> Paired coverage distributions, plotted at differences of one unit of redundancy, begin to coalesce as a project evolves. The intersection probability is the area of the overlap (shaded).</p>
               </text>
               <graphic file="1471-2105-9-239-3"/>
            </fig>
            <p>We can once again take a calibration approach to this problem. That is, we calculate <it>P</it><sub><it>C </it></sub>and <it>P</it><sub>&#8898; </sub>for a typical BAC sequencing project (150 kb clone length and 600 bp read length) at <it>&#961; </it>&#8712; {6, 8, 10}. We then match these values to their counterparts in the distributions for medical sequencing, which then provides the corresponding "scaled" redundancy. Table <tblr tid="T2">2</tblr> shows these results. The values complement those in Table <tblr tid="T1">1</tblr> in the sense that they suggest redundancies far above the conventional full shotgun standard of <it>&#961; </it>= 10.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Calibration of medical sequencing according to haploid distribution models</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c cspan="3" ca="center">
                        <p>Traditional BAC Sequencing Project</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Medical Sequencing <it>&#961; </it>based on</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Redundancy</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>P</it>
                           <sub>
                              <it>C</it>
                           </sub>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>P</it>
                           <sub>&#8898;</sub>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Complete Covg.</p>
                     </c>
                     <c ca="left">
                        <p>Intersection</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>0.02843</p>
                     </c>
                     <c ca="left">
                        <p>0.74122</p>
                     </c>
                     <c ca="left">
                        <p>20.2</p>
                     </c>
                     <c ca="left">
                        <p>18.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>8</p>
                     </c>
                     <c ca="left">
                        <p>0.51842</p>
                     </c>
                     <c ca="left">
                        <p>0.95054</p>
                     </c>
                     <c ca="left">
                        <p>22</p>
                     </c>
                     <c ca="left">
                        <p>20.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>0.89840</p>
                     </c>
                     <c ca="left">
                        <p>0.99301</p>
                     </c>
                     <c ca="left">
                        <p>23.9</p>
                     </c>
                     <c ca="left">
                        <p>23</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Comparison to Empirical, Semi-Empirical, and Simulation Results</p>
            </st>
            <p>Several labs are now involved in diploid sequencing projects, which should furnish useful examples of coverage progressions that can be monitored empirically. Although the few stopping redundancies reported in the literature appear to conflict with one another, these can be more properly interpreted according to the minimum number of times each allele is observed, <it>&#966;</it>. For example, we mentioned above that Levy et al. <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> considered <it>&#961; </it>= 20 for Sanger-based <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> germline sequencing of a healthy individual. This figure approaches the 10&#215; standard for haploid sequencing <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> at the level of <it>&#966; </it>= 1 (Table <tblr tid="T1">1</tblr>). Interestingly, the idealized version of their coverage calculation proves to be a special case of our model precisely for <it>&#966; </it>= 1 (see Appendix). Conversely, Mardis <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> quotes a redundancy up to 30&#215;, which corresponds to values of <it>&#966; </it>between 2 and 3 when calibrated to haploid 10&#215;.</p>
            <p>Richard Durbin and Aylwyn Scally have also analyzed the diploid medical sequencing coverage problem using a different approach from what is described here (Durbin and Scally, personal communication). Specifically, they employed an "extra-variation" Poisson distribution <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp> having a free parameter to control variance. Values for this parameter can be chosen <it>a posteriori </it>to tune the theoretical fit to empirical data. In particular, such tuning allows one to implicitly consider, at least approximately, factors such as bias and sequencing errors. (In our method, calibration incorporates the empirics of BAC sequencing, essentially serving the same purpose.) Using their semi-empirical approach, Durbin and Scally concluded that redundancies closer to 30&#215; will be required, which again agrees well with results shown in Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr>.</p>
            <p>A number of labs, including our own, are now adopting "next generation" short-read sequencing technology <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> and have started to generate human medical sequencing data related to various cancers. However, there is still a dearth of published results from which actual coverage progressions can be derived. For the purposes of comparison, we refer instead to the recently completed pilot resequencing project for <it>C. elegans</it>. Hillier et al. <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> resequenced strain N2 Bristol using the Illumina Genome Analyzer in order to characterize the accuracy and utility of short-read, massively parallel data. We have projected their <it>C. elegans </it>coverage results for <it>&#966; </it>&#8712; {1, 2} onto Fig. <figr fid="F1">1</figr>. Agreement is very good up to about 60% coverage, after which the rate of empirical coverage falls below expectation. This behavior seems to typify theoretical-empirical differences. For example, Wendl and Barbazuk <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> noted precisely this trend for sequencing filtered genomes.</p>
            <p>The physical explanation of this phenomenon is straightforward. Specifically, biases are not manifested early in a project because there is not enough information to distinguish unbiased coverage configurations from biased ones. (Think of an extreme case, for example placing a single read of high GC content onto a genome of high AT content. Despite the obvious non-IID nature of this scenario, the predicted coverage will still be <it>identical </it>to the actual coverage.) Given a model based on the IID assumption, as ours is, empirical and theoretical results should start to diverge as sufficient information gathers to expose latent biases. In this case, Hillier et al. <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> note a definite AT bias using the Illumina platform, i.e. remarkably lower coverage in regions of high AT content, which we presume accounts for much of the difference shown in Fig. <figr fid="F1">1</figr>. The proclivities of other methods and platforms are evidently different <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Consequently, Eq. 2 should also be useful as a yardstick for comparison among these approaches for specific applications.</p>
            <p>Finally, Fig. <figr fid="F1">1</figr> also shows simulation data reported by Smith and Bernstein <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> for <it>&#966; </it>= 1 on a 20 kb circularized fragment using 250 bp reads. Agreement is once again good up to about 60% coverage, after which the sequencing process seems to grow more efficient for the fragment. This observation is not surprising, given two important aspects of this study. First, the circularized configuration is not subject to the so-called "edge effect", which can dramatically affect coverage rates <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B31">31</abbr></abbrgrp>. Second, distribution theories show that configurations having larger <it>L</it>/<it>G </it>ratios do indeed cover more readily than those having smaller values <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. We presume these two factors account for most of the difference, especially given that <it>L</it>/<it>G </it>= 0.0125 for the simulation is more than a million times larger than values associated with short-read medical sequencing projects. For example, 31 bp reads on a 3.3 billion bp genome yield <it>L</it>/<it>G </it>&#8776; 1 &#215; 10<sup>-8</sup>.</p>
         </sec>
         <sec>
            <st>
               <p>A Differential Sequencing Strategy</p>
            </st>
            <p>We expect that many future studies will be based on sequencing DNA derived from matched tumor/normal samples (for example, the latter being obtained from uninvolved skin or blood) from the same patient <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Here, the whole genome of each sample in a pair is sequenced and mutations are found by comparison to the human reference. Let us call the sets of mutations for a tumor and a normal sample <it>S</it><sub><it>T </it></sub>and <it>S</it><sub><it>N</it></sub>, respectively, where we generally expect <it>S</it><sub><it>N </it></sub>&#8838; <it>S</it><sub><it>T</it></sub>. Most of the germline variation in <it>S</it><sub><it>N </it></sub>will be polymorphisms not related to pathogenesis <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, whereas <it>S</it><sub><it>T </it></sub>will contain a potentially more relevant collection of somatic mutations. In principle, germline sequence variations can be removed from further consideration by taking the difference <it>S</it><sub><it>T </it></sub>- <it>S</it><sub><it>N </it></sub><abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Such filtering will appreciably focus subsequent work, since the overwhelming majority of sequence variants should be polymorphisms found in normal tissue <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. How does one efficiently accomplish this from a process-engineering standpoint?</p>
            <p>We propose a refinement of simple subtraction <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> in the form of a straightforward differential sequencing strategy. In principle, false-negative errors are controlled by sequencing at least to diploid coverage at the level of <it>&#966; </it>= 1. However, tumor samples should actually be sequenced as heavily as economically possible in order to minimize false-positive hits for both germline and somatic mutations. These types of mistakes arise, for example, by misinterpreting a random sequencing error as a true mutation. Given current state-of-the-art capabilities, we will assume this condition translates to diploid coverage at the level of <it>&#966; </it>= 2, but emphasize that future instruments will undoubtedly permit higher <it>&#966;</it>.</p>
            <p>Conversely, a germline mutation in a normal sample only has to be detected once in order to be eliminated from <it>S</it><sub><it>T</it></sub>. We are also not as concerned about false-positives here because their appearance in <it>S</it><sub><it>N </it></sub>does not affect the subtraction <it>S</it><sub><it>T </it></sub>- <it>S</it><sub><it>N</it></sub>. It is possible that an error could lead to a spurious entry in <it>S</it><sub><it>N </it></sub>that precisely matches a true somatic mutation in <it>S</it><sub><it>T </it></sub>by pure chance. The somatic mutation would then be erroneously eliminated from further investigation. However, such events seem unlikely, given the low anticipated number of bona fide somatic mutations. These observations collectively imply that normal samples may only need to be sequenced to diploid coverage at the level of <it>&#966; </it>= 1.</p>
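            <p>As an illustration, the subtraction <it>S</it><sub><it>T </it></sub>- <it>S</it><sub><it>N </it></sub>and its insensitivity to most false positives in <it>S</it><sub><it>N </it></sub>can be sketched with ordinary set operations. The snippet below is a minimal sketch rather than a production variant-filtering step: the tuple key (chromosome, position, reference base, alternate base) and every call listed are hypothetical placeholders, not data from any sample.</p>

```python
# Hypothetical variant keys: (chromosome, position, reference base, alternate base).
# All values are invented placeholders for illustration only.
normal_calls = {
    ("chr1", 1000, "A", "G"),   # germline polymorphism
    ("chr2", 5000, "C", "T"),   # germline polymorphism
    ("chr3", 7777, "T", "C"),   # spurious call (sequencing error) in the normal sample
}
tumor_calls = {
    ("chr1", 1000, "A", "G"),   # germline, also seen in the normal sample
    ("chr2", 5000, "C", "T"),   # germline, also seen in the normal sample
    ("chr7", 42000, "G", "A"),  # tumor-only candidate
}

# S_T - S_N: discard every tumor call that also appears in the normal sample.
somatic_candidates = tumor_calls - normal_calls
print(sorted(somatic_candidates))
```

            <p>The spurious normal-only entry simply drops out of the difference; it would remove a true somatic call only if it happened to match one exactly.</p>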
            <p>The 10&#215; standard for BAC sequencing is well-established <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and provides a reasonably conservative basis to translate the above design into actual redundancies for medical sequencing (Table <tblr tid="T1">1</tblr>). We suggest then that sequencing of tumor samples should not be pursued to less than about 26.5&#215; redundancy, given the 2-read minimum coverage condition. Furthermore, paired normal samples need only be sequenced to about 21.5&#215; redundancy for the <it>&#966; </it>= 1 coverage condition.</p>
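            <p>The calibration can be sketched numerically. The snippet below rests on two assumptions that should be read as such: first, that the coverage expectation takes the Poisson-limit form implied by the Preliminaries in Methods (each allele is spanned by a Poisson number of reads with mean <it>&#961;</it>/2, and the two alleles are treated as independent); second, that the 10&#215; BAC standard translates into a haploid, single-read coverage target of 1 - e<sup>-10</sup>. The function names are illustrative only.</p>

```python
import math


def diploid_coverage(rho, phi):
    """Expected fraction of allele pairs with both alleles spanned by >= phi reads.

    Assumed form: Poisson reads per allele with mean rho / 2, alleles independent.
    """
    mean = rho / 2.0
    miss = sum(math.exp(-mean) * mean ** k / math.factorial(k) for k in range(phi))
    return (1.0 - miss) ** 2


def required_redundancy(target, phi, upper=200.0, tol=1e-6):
    """Bisection for the smallest rho whose expected coverage reaches target."""
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diploid_coverage(mid, phi) >= target:
            hi = mid
        else:
            lo = mid
    return hi


# Assumed calibration target: haploid coverage at 10x redundancy, 1 - exp(-10).
target = 1.0 - math.exp(-10.0)
for phi in (1, 2):
    print(phi, round(required_redundancy(target, phi), 1))
```

            <p>Under these assumptions the sketch returns redundancies near 21&#215; and 27&#215;, close to the figures quoted above; any residual difference would stem from the assumed calibration target.</p>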
         </sec>
         <sec>
            <st>
               <p>Expository Comment on Recommended Redundancies</p>
            </st>
            <p>The observation that the required redundancy for the <it>&#966; </it>= 1 coverage level is not simply half that of the <it>&#966; </it>= 2 level may initially seem counter-intuitive. We remarked above that curves for increasing values of <it>&#966; </it>tend to have progressively smaller slopes. Consequently, there is not, in general, an integer-valued relationship between corresponding points on any two particular curves. In other words, curves are not simply shifted along <it>&#961;</it>. Returning to Fig. <figr fid="F1">1</figr>, we show an example that replots the <it>&#966; </it>= 2 curve, except that the abscissa value of each of its points has been divided by two. It is clear that the result does not coincide with the <it>&#966; </it>= 1 curve, as intuition may have suggested. The curves intersect at only a single point, here at an expected coverage slightly more than 0.5, which is well below the > 0.99 calibration points we chose above. In other words, redundancy for <it>&#966; </it>= 1 would only have been half that for <it>&#966; </it>= 2 had we chosen to cover roughly 50% of SNPs instead of more than 99%. This trend holds generally. That is, we do not expect an integer relationship between the required redundancies for two unequal, but otherwise arbitrary values of <it>&#966;</it>.</p>
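            <p>The halved-abscissa comparison can be reproduced with a short numerical sketch. It assumes the Poisson-limit form of the coverage expectation implied by the Preliminaries in Methods (mean <it>&#961;</it>/2 reads per allele, alleles independent); the function names and the bisection bracket are choices made for this illustration.</p>

```python
import math


def diploid_coverage(rho, phi):
    """Assumed form: Poisson reads per allele with mean rho / 2, alleles independent."""
    mean = rho / 2.0
    miss = sum(math.exp(-mean) * mean ** k / math.factorial(k) for k in range(phi))
    return (1.0 - miss) ** 2


def gap(rho):
    """phi = 1 curve minus the phi = 2 curve replotted with halved abscissa."""
    return diploid_coverage(rho, 1) - diploid_coverage(2.0 * rho, 2)


# Bisection for the crossing; the bracket [0.5, 10] is an assumption of this
# sketch, chosen because gap() is positive at 0.5 and negative at 10.
lo, hi = 0.5, 10.0
while hi - lo > 1e-9:
    mid = 0.5 * (lo + hi)
    if gap(mid) > 0.0:
        lo = mid
    else:
        hi = mid

rho_cross = 0.5 * (lo + hi)
coverage_at_cross = diploid_coverage(rho_cross, 1)
print(rho_cross, coverage_at_cross)
```

            <p>Under these assumptions the crossing falls at an expected coverage just above 0.5, consistent with the description above.</p>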
         </sec>
         <sec>
            <st>
               <p>Coverage Projections for Aneuploid Configurations</p>
            </st>
            <p>Aneuploidy can be manifested in a number of ways: as an autosomal <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> or sex chromosome <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> aberration, and in conjunction with cancer <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. We anticipate the eventual application of DNA sequencing to aneuploid chromosome configurations and offer some early projections based upon Eq. 6. Fig. <figr fid="F4">4</figr> shows expected coverage for trisomy and tetrasomy for <it>&#966; </it>= 2 and <it>&#966; </it>= 3. Required depths are clearly much higher than for diploid sequencing. For example, we find redundancies of <it>&#961; </it>= 42 and <it>&#961; </it>= 57 for trisomy and tetrasomy, respectively, when scaling to 10&#215; BAC sequencing at the level of <it>&#966; </it>= 2. Recall that <it>&#966; </it>= 2 is presumed to be feasible for diploid whole-genome sequencing using current hardware. These redundancies are clearly out of reach at the moment for a whole-genome project, but may be feasible for chromosome-specific projects. In other words, the appreciably higher cost of sequencing aneuploid chromosomes may justify the effort of separating them into their own self-contained projects.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Expected coverage for aneuploid chromosome configurations for minimum number of covering reads <it>&#966; </it>&#8712; {2, 3}</p>
               </caption>
               <text>
                  <p>
                     <b>Expected coverage for aneuploid chromosome configurations for minimum number of covering reads <it>&#966; </it>&#8712; {2, 3}.</b>
                  </p>
               </text>
               <graphic file="1471-2105-9-239-4"/>
            </fig>
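            <p>The aneuploid projections can be approximated with a small sketch. It assumes, as a labeled guess at the structure of Eq. 6, that each of the homologous copies is spanned by a Poisson number of reads with mean <it>&#961;</it> divided by the copy number, that the copies are independent, and that the 10&#215; BAC calibration corresponds to a target of 1 - e<sup>-10</sup>. The parameter name <tt>ploidy</tt> and both function names are hypothetical.</p>

```python
import math


def aneuploid_coverage(rho, phi, ploidy):
    """Expected coverage when every one of `ploidy` copies needs >= phi reads.

    Assumed form: Poisson mean rho / ploidy per copy, copies independent.
    """
    mean = rho / ploidy
    miss = sum(math.exp(-mean) * mean ** k / math.factorial(k) for k in range(phi))
    return (1.0 - miss) ** ploidy


def required_redundancy(target, phi, ploidy, upper=400.0, tol=1e-6):
    """Bisection for the smallest rho whose expected coverage reaches target."""
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if aneuploid_coverage(mid, phi, ploidy) >= target:
            hi = mid
        else:
            lo = mid
    return hi


# Assumed calibration target: haploid coverage at 10x redundancy, 1 - exp(-10).
target = 1.0 - math.exp(-10.0)
for ploidy in (2, 3, 4):
    print(ploidy, round(required_redundancy(target, 2, ploidy), 1))
```

            <p>With these assumptions the trisomy and tetrasomy values land within about one unit of the redundancies quoted above for <it>&#966; </it>= 2.</p>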
         </sec>
         <sec>
            <st>
               <p>Modeling Limitations</p>
            </st>
            <p>As with the classical theories of sequencing <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>, the main assumption here is that reads are independently and identically distributed (IID). In other words, this analysis does not formally consider biological or instrument-specific biases, software biases and sequencing errors, for example in base-calling, assembly problems, or any other heuristic inputs. The idiosyncrasies of each of these factors are difficult to characterize analytically, although the calibration step does allow some implicit accounting, as noted above. Appreciable differences in levels and types of bias have been noted, for instance in Sanger-style sequencing versus pyrosequencing <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, so any results should be interpreted with these qualifications in mind.</p>
            <p>In general, the assumption of allele independence should be valid for most medical sequencing projects since <it>&#966; </it>will be small and <it>L</it>/<it>G </it>&#8594; 0 and <it>N </it>&#8811; 1 (see Methods). For example, maximum error for <it>&#966; </it>&#8712; {1, 2} is on the order of 10<sup>-5 </sup>percent for diploid sequencing using 650 bp read lengths. The theory further assumes that sequence reads have no preference for either chromosome of a homologous pair and neglects any tendency for reads to align to multiple positions. The latter has been found to occur with some frequency if reads are short enough <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Read-pairing certainly curtails this phenomenon, but the pairing process itself has negligible effect on coverage unless the target is very small <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Such is not the case in whole-genome medical sequencing, so the net effect of pairing is simply that the amount of uniquely-alignable sequence one gets to count toward <it>&#961; </it>increases commensurately. In plain terms, fewer data will be discarded.</p>
            <p>Finally, our analysis does not account for what might be called the "uneven coverage" problem of alleles. Mutation detection programs may decline to call a SNP if one allele is covered much more heavily than the other <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Because this phenomenon is both software-specific and sequence-specific, it is beyond our scope. Departure from any of these idealizations will tend to reduce coverage, implying that our analysis is best viewed in the context of upper bounds of performance. In other words, required redundancies for specific projects may still exceed what we have advocated here, as the <it>C. elegans </it>data in Fig. <figr fid="F1">1</figr> illustrate.</p>
            <p>A subtle mathematical point is also worth mentioning. Eqs. 1 through 5 represent the probability of covering a specific allele pair, or alternatively, the expected fraction of pairs covered. These expressions do not provide the underlying distribution of the number of covered pairs, which is a more formidable mathematical problem. In other words, this is a model only of coverage expectation, exactly analogous to what classical theories <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp> are for traditional <it>de novo </it>haploid sequencing. Consequently, the results themselves are not strong functions of <it>L</it>/<it>G</it>. In fact, for most applications the results will be completely independent of this ratio and will instead follow a set of "universal" curves, the first 5 of which are shown in Figs. <figr fid="F1">1</figr> and <figr fid="F2">2</figr>. This point is underscored by Eq. 2, which is a function strictly of <it>&#961;</it>. (The observation holds more generally for aneuploidy as described by Eq. 6, as well.) This phenomenon contrasts with distribution-based models, such as <it>P</it><sub><it>C </it></sub>and <it>P</it><sub>&#8898; </sub>discussed above, which are indeed sensitive to <it>L</it>/<it>G</it>. The basis of this effect is discussed in ref. <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>.</p>
            <p>A corollary to this observation is that adjusting the fundamental parameters within their biologically relevant limits will have no effect on the results we have discussed. For example, the haploid genome size <it>G </it>could be adjusted to reflect only that part of the sequence to which read data can be uniquely aligned <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Yet, the underlying assumptions leading to Eqs. 2 and 6 will still be satisfied in this circumstance, namely <it>L</it>/<it>G </it>&#8594; 0 and <it>N </it>&#8811; 1 (Methods). The same holds for varying <it>L </it>in order to represent different kinds of sequencing platforms, e.g. pyrosequencing or Sanger instruments. In summary, the contributions of the three independent variables <it>L</it>, <it>G</it>, and <it>N </it>collapse into the single dimensionless variable <it>&#961;</it>, which governs the process exclusively. Formal theory <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> predicts such systematic reductions of variables whenever a unified dimensionless parameter lurks in a problem.</p>
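            <p>The collapse onto <it>&#961; </it>can be checked numerically. The sketch below evaluates the binomial coverage probability directly for two read lengths mentioned in the text (31 bp and 650 bp) on a 3.3 billion bp genome, holding <it>&#961; </it>= <it>LN</it>/<it>G </it>fixed. Treating the two alleles as independent and choosing <it>&#961; </it>= 26.5 with <it>&#966; </it>= 2 are assumptions of this illustration, and the function names are hypothetical.</p>

```python
import math


def binomial_tail(phi, n, p):
    """P(at least phi successes in n trials), intended for small phi and large n."""
    miss = 0.0
    for k in range(phi):
        # log of the binomial coefficient C(n, k), built term by term for small k
        log_comb = sum(math.log((n - i) / (i + 1)) for i in range(k))
        miss += math.exp(log_comb + k * math.log(p) + (n - k) * math.log1p(-p))
    return 1.0 - miss


def diploid_coverage(L, G, N, phi):
    """Both alleles spanned by >= phi reads, treating the alleles as independent."""
    return binomial_tail(phi, N, L / (2 * G)) ** 2


rho = 26.5             # illustrative redundancy
G = 3_300_000_000      # haploid genome length from the L/G example in the text
coverage_by_read_length = {}
for L in (31, 650):    # read lengths mentioned in the text
    N = round(rho * G / L)   # number of reads that keeps rho fixed
    coverage_by_read_length[L] = diploid_coverage(L, G, N, phi=2)
    print(L, N, coverage_by_read_length[L])
```

            <p>Although <it>N </it>differs by more than an order of magnitude between the two cases, the computed coverages should agree to many decimal places, as expected when <it>&#961; </it>alone governs the result.</p>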
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The differential sequencing strategy should be useful for efficiently identifying lists of somatic mutations for validation and further study. Our analytical model of coverage, coupled with a calibration approach for selecting parameters, allows pragmatic estimates to be made for such projects. However, because the theory does not strictly consider various biasing factors, actual projects would benefit from periodically aligning (assembling) shotgun data to empirically track overall coverage, as well as local coverage in coding regions, UTRs, promoters, and conserved regions. SNP arrays could also be run for each sample, with attempts made to find and correlate data to sequence calls for further coverage tracking. Plotting these various data on a single figure, as we did for the <it>C. elegans </it>data in Fig. <figr fid="F1">1</figr>, should be informative. Finally, the basic model could be further extended in the future as more data accrue from different methods, projects, software processing pipelines, etc. For example, "extra-variation" methods <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp> could be used for <it>a posteriori </it>data fitting, the results of which should help to better quantify non-IID factors.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>Proofs of Eqs. 1, 2, and 6 are reported here. Eqs. 3 through 5 follow trivially from Eq. 2. We also describe the analysis of <it>C. elegans </it>resequencing data from Hillier et al. <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>.</p>
         <sec>
            <st>
               <p>Preliminaries</p>
            </st>
            <p>Let <it>B</it><sub><it>i</it>, <it>j </it></sub>be the event where an allele at position <it>x </it>on chromosome <it>i </it>is "covered", i.e. spanned by at least <it>&#966; </it>out of any collection of <it>j </it>reads, where <it>j </it>&#8805; <it>&#966;</it>. Given <it>N </it>total reads, our definition of diploid medical sequencing coverage for position <it>x </it>is then <it>B</it><sub>1, <it>N </it></sub>&#8898; <it>B</it><sub>2, <it>N </it></sub>and its probability is <it>P</it><sub>2, <it>&#966;</it></sub>(<it>B</it><sub>1, <it>N </it></sub>&#8898; <it>B</it><sub>2, <it>N</it></sub>). If <it>&#946;</it><sub><it>i</it>, <it>j</it>, <it>k </it></sub>is the event whereby the allele on chromosome <it>i </it>is spanned by exactly <it>k </it>of <it>j </it>reads, then <it>B</it><sub><it>i</it>, <it>j </it></sub>&#8801; <it>&#946;</it><sub><it>i</it>, <it>j</it>, <it>&#966; </it></sub>&#8899; <it>&#946;</it><sub><it>i</it>, <it>j</it>, <it>&#966;</it>+1 </sub>&#8899; &#8943; &#8899; <it>&#946;</it><sub><it>i</it>, <it>j</it>, <it>j</it></sub>. Considering two homologous chromosomes, <it>i </it>&#8712; {1, 2}, the probability that a single given read spans <it>x </it>on a specific chromosome is <it>&#948;</it><sub>2 </sub>= <it>L</it>/(2<it>G</it>), where <it>L </it>and <it>G </it>are read length and haploid genome length, respectively. 
Since the process is binomial (covering or not covering), we immediately have <inline-formula><m:math name="1471-2105-9-239-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>P</m:mi><m:mo stretchy="false">(</m:mo><m:msub><m:mi>&#946;</m:mi><m:mrow><m:mi>i</m:mi><m:mo>,</m:mo><m:mi>j</m:mi><m:mo>,</m:mo><m:mi>k</m:mi></m:mrow></m:msub><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:msub><m:mi>C</m:mi><m:mrow><m:mi>j</m:mi><m:mo>,</m:mo><m:mi>k</m:mi></m:mrow></m:msub><m:msubsup><m:mi>&#948;</m:mi><m:mn>2</m:mn><m:mi>k</m:mi></m:msubsup><m:msup><m:mrow><m:mo stretchy="false">(</m:mo><m:mn>1</m:mn><m:mo>&#8722;</m:mo><m:msub><m:mi>&#948;</m:mi><m:mn>2</m:mn></m:msub><m:mo stretchy="false">)</m:mo></m:mrow><m:mrow><m:mi>j</m:mi><m:mo>&#8722;</m:mo><m:mi>k</m:mi></m:mrow></m:msup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiuaaLaeiikaGIaeqOSdi2aaSbaaSqaaiabdMgaPjabcYcaSiabdQgaQjabcYcaSiabdUgaRbqabaGccqGGPaqkcqGH9aqpcqWGdbWqdaWgaaWcbaGaemOAaOMaeiilaWIaem4AaSgabeaakiabes7aKnaaDaaaleaacqaIYaGmaeaacqWGRbWAaaGccqGGOaakcqaIXaqmcqGHsislcqaH0oazdaWgaaWcbaGaeGOmaidabeaakiabcMcaPmaaCaaaleqabaGaemOAaOMaeyOeI0Iaem4AaSgaaaaa@4AA4@</m:annotation></m:semantics></m:math></inline-formula>, where <it>C</it><sub><it>j</it>, <it>k </it></sub>are the binomial coefficients.</p>
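            <p>For concreteness, the binomial expression above can be evaluated directly and compared with its Poisson limit of mean <it>N&#948;</it><sub>2</sub>. The values of <it>L</it>, <it>G</it>, and <it>N </it>below are hypothetical and chosen only to keep the numbers small; the function names are likewise illustrative.</p>

```python
import math


def binomial_pmf(k, j, delta):
    """P(beta_{i,j,k}): exactly k of j reads span the allele."""
    return math.comb(j, k) * delta ** k * (1.0 - delta) ** (j - k)


def poisson_pmf(k, mean):
    """Poisson probability of exactly k events, the large-N, small-delta limit."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Hypothetical project parameters, not taken from the paper.
L, G, N = 500, 1_000_000, 20_000
delta2 = L / (2 * G)   # chance that one read spans x on a given chromosome

for k in range(4):
    print(k, binomial_pmf(k, N, delta2), poisson_pmf(k, N * delta2))
```

            <p>Even for this modest target the two columns should agree closely, which is the behavior relied upon when <it>L</it>/<it>G </it>&#8594; 0 and <it>N </it>&#8811; 1.</p>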
         </sec>
         <sec>
            <st>
               <p>Proof of Eq. 1</p>
            </st>
            <p>The coverings of two homologous alleles are not independent of one another. For instance, if one allele is already covered by <it>j </it>reads, there are only <it>N </it>- <it>j </it>remaining reads that have a chance to cover the other allele. Consequently,</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-239-i8" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>P</m:mi>
                              <m:mrow>
                                 <m:mn>2</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mi>&#966;</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>B</m:mi>
                              <m:mrow>
                                 <m:mn>1</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mi>N</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>&#8745;</m:mo>
                           <m:msub>
                              <m:mi>B</m:mi>
                              <m:mrow>
                                 <m:mn>2</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mi>N</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mi>P</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>B</m:mi>
                              <m:mrow>
                                 <m:mn>1</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mi>N</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>&#8901;</m:mo>
                           <m:mi>P</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>B</m:mi>
                              <m:mrow>
                                 <m:mn>2</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mi>N</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>|</m:mo>
                           <m:msub>
                              <m:mi>B</m:mi>
                              <m:mrow>
                                 <m:mn>1</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mi>N</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>j</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mi>&#966;</m:mi>
                                 </m:mrow>
                                 <m:mi>N</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:mi>P</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>&#946;</m:mi>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mo>,</m:mo>
                                       <m:mi>N</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>&#8901;</m:mo>
                                 <m:mi>P</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>B</m:mi>
                                    <m:mrow>
                                       <m:mn>2</m:mn>
                                       <m:mo>,</m:mo>
                                       <m:mi>N</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo>|</m:mo>
                                 <m:msub>
                                    <m:mi>&#946;</m:mi>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mo>,</m:mo>
                                       <m:mi>N</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>,</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiuaa1aaSbaaSqaaiabikdaYiabcYcaSiabeA8aMbqabaGccqGGOaakcqWGcbGqdaWgaaWcbaGaeGymaeJaeiilaWIaemOta4eabeaakiablMIijjabdkeacnaaBaaaleaacqaIYaGmcqGGSaalcqWGobGtaeqaaOGaeiykaKIaeyypa0JaemiuaaLaeiikaGIaemOqai0aaSbaaSqaaiabigdaXiabcYcaSiabd6eaobqabaGccqGGPaqkcqGHflY1cqWGqbaucqGGOaakcqWGcbGqdaWgaaWcbaGaeGOmaiJaeiilaWIaemOta4eabeaakiabcYha8jabdkeacnaaBaaaleaacqaIXaqmcqGGSaalcqWGobGtaeqaaOGaeiykaKIaeyypa0ZaaabCaeaacqWGqbaucqGGOaakcqaHYoGydaWgaaWcbaGaeGymaeJaeiilaWIaemOta4KaeiilaWIaemOAaOgabeaakiabcMcaPiabgwSixlabdcfaqjabcIcaOiabdkeacnaaBaaaleaacqaIYaGmcqGGSaalcqWGobGtcqGHsislcqWGQbGAaeqaaOGaeiiFaWNaeqOSdi2aaSbaaSqaaiabigdaXiabcYcaSiabd6eaojabcYcaSiabdQgaQbqabaGccqGGPaqkcqGGSaalaSqaaiabdQgaQjabg2da9iabeA8aMbqaaiabd6eaobqdcqGHris5aaaa@7B05@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-239-i9" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>P</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>B</m:mi>
                              <m:mrow>
                                 <m:mn>2</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mi>N</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>|</m:mo>
                           <m:msub>
                              <m:mi>&#946;</m:mi>
                              <m:mrow>
                                 <m:mn>1</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mi>N</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mi>&#966;</m:mi>
                                 </m:mrow>
                                 <m:mi>N</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:mi>P</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>&#946;</m:mi>
                                    <m:mrow>
                                       <m:mn>2</m:mn>
                                       <m:mo>,</m:mo>
                                       <m:mi>N</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>j</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>k</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo>&#8722;</m:mo>
                              </m:mrow>
                           </m:mstyle>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>0</m:mn>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>&#966;</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:munderover>
                              <m:mrow>
                                 <m:mi>P</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>&#946;</m:mi>
                                    <m:mrow>
                                       <m:mn>2</m:mn>
                                       <m:mo>,</m:mo>
                                       <m:mi>N</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>j</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>k</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiuaaLaeiikaGIaemOqai0aaSbaaSqaaiabikdaYiabcYcaSiabd6eaojabgkHiTiabdQgaQbqabaGccqGG8baFcqaHYoGydaWgaaWcbaGaeGymaeJaeiilaWIaemOta4KaeiilaWIaemOAaOgabeaakiabcMcaPiabg2da9maaqahabaGaemiuaaLaeiikaGIaeqOSdi2aaSbaaSqaaiabikdaYiabcYcaSiabd6eaojabgkHiTiabdQgaQjabcYcaSiabdUgaRbqabaGccqGGPaqkcqGH9aqpcqaIXaqmcqGHsislaSqaaiabdUgaRjabg2da9iabeA8aMbqaaiabd6eaobqdcqGHris5aOWaaabCaeaacqWGqbaucqGGOaakcqaHYoGydaWgaaWcbaGaeGOmaiJaeiilaWIaemOta4KaeyOeI0IaemOAaOMaeiilaWIaem4AaSgabeaakiabcMcaPaWcbaGaem4AaSMaeyypa0JaeGimaadabaGaeqOXdyMaeyOeI0IaeGymaedaniabggHiLdGccqGGUaGlaaa@6C00@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Eq. 1 follows from the observation that <it>P</it>(<it>&#946;</it><sub>2, <it>N</it>-<it>j</it>, <it>k</it></sub>) = 0 for <it>k </it>> <it>N </it>- <it>j</it>.</p>
         </sec>
         <sec>
            <st>
               <p>Proof of Eq. 2</p>
            </st>
            <p>If we neglect any dependence between alleles, then <it>P</it><sub>2, <it>&#966;</it></sub>(<it>B</it><sub>1, <it>N </it></sub>&#8898; <it>B</it><sub>2, <it>N</it></sub>) = <it>P </it>(<it>B</it><sub>1, <it>N</it></sub>)&#183;<it>P </it>(<it>B</it><sub>2, <it>N</it></sub>). Because both alleles are equally likely to be sampled, the two factors are equal and this product is <it>P</it><sup>2</sup>(<it>B</it><sub>1, <it>N</it></sub>), from which</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-239-i10" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>P</m:mi>
                              <m:mrow>
                                 <m:mn>2</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mi>&#966;</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:msup>
                              <m:mrow>
                                 <m:mrow>
                                    <m:mo>(</m:mo>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mstyle displaystyle="true">
                                          <m:munderover>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>k</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>0</m:mn>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mi>&#966;</m:mi>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                          </m:munderover>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>C</m:mi>
                                                <m:mrow>
                                                   <m:mi>N</m:mi>
                                                   <m:mo>,</m:mo>
                                                   <m:mi>k</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                             <m:msubsup>
                                                <m:mi>&#948;</m:mi>
                                                <m:mn>2</m:mn>
                                                <m:mi>k</m:mi>
                                             </m:msubsup>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mn>1</m:mn>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:msub>
                                                      <m:mi>&#948;</m:mi>
                                                      <m:mn>2</m:mn>
                                                   </m:msub>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:mi>N</m:mi>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mi>k</m:mi>
                                                </m:mrow>
                                             </m:msup>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                    <m:mo>)</m:mo>
                                 </m:mrow>
                              </m:mrow>
                              <m:mn>2</m:mn>
                           </m:msup>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiuaa1aaSbaaSqaaiabikdaYiabcYcaSiabeA8aMbqabaGccqGH9aqpdaqadaqaaiabigdaXiabgkHiTmaaqahabaGaem4qam0aaSbaaSqaaiabd6eaojabcYcaSiabdUgaRbqabaGccqaH0oazdaqhaaWcbaGaeGOmaidabaGaem4AaSgaaOGaeiikaGIaeGymaeJaeyOeI0IaeqiTdq2aaSbaaSqaaiabikdaYaqabaGccqGGPaqkdaahaaWcbeqaaiabd6eaojabgkHiTiabdUgaRbaaaeaacqWGRbWAcqGH9aqpcqaIWaamaeaacqaHgpGzcqGHsislcqaIXaqma0GaeyyeIuoaaOGaayjkaiaawMcaamaaCaaaleqabaGaeGOmaidaaOGaeiOla4caaa@5380@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Parameters for medical sequencing projects are such that <it>L</it>/<it>G </it>&#8594; 0 and <it>N </it>&#8811; 1. These relations hold for both "short read" platforms and instruments that provide "full-length" Sanger read data. Moreover, <it>&#966; </it>&#8810; <it>N</it>, which implies <it>k </it>&#8810; <it>N</it>. Consequently, the binomial coefficients <it>C</it><sub><it>N</it>, <it>k </it></sub>are well-approximated by <it>N</it><sup><it>k</it></sup>/<it>k</it>!. These conditions also imply (1 - <it>&#948;</it><sub>2</sub>)<sup><it>N </it>- <it>k</it></sup> ~ (1 - <it>&#948;</it><sub>2</sub>)<sup><it>N </it></sup>~ exp(-<it>N&#948;</it><sub>2</sub>), i.e. an asymptotic approximation can be used for the power term. Eq. 2 follows directly.</p>
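            <p>The quality of these approximations can be checked numerically. The following minimal Python sketch compares the binomial expression above with the limiting form obtained by substituting <it>N</it><sup><it>k</it></sup>/<it>k</it>! for the binomial coefficient and exp(-<it>N&#948;</it><sub>2</sub>) for the power term. The function names and the project parameters are illustrative assumptions rather than values from any dataset discussed here, and <it>&#948;</it><sub>2 </sub>= <it>L</it>/(2<it>G</it>) is taken as the per-read Bernoulli probability.</p>

```python
from math import comb, exp, factorial


def exact_pair_coverage(n_reads, delta, phi):
    """Binomial form: (1 - sum_{k=0}^{phi-1} C(N,k) d^k (1-d)^(N-k))^2."""
    tail = sum(
        comb(n_reads, k) * delta**k * (1.0 - delta) ** (n_reads - k)
        for k in range(phi)
    )
    return (1.0 - tail) ** 2


def approx_pair_coverage(n_reads, delta, phi):
    """Limiting form using C(N,k) ~ N^k/k! and (1-d)^(N-k) ~ exp(-N d)."""
    mean = n_reads * delta
    tail = exp(-mean) * sum(mean**k / factorial(k) for k in range(phi))
    return (1.0 - tail) ** 2


# Hypothetical project parameters, chosen only for illustration.
read_length = 500               # L
genome_length = 3_000_000_000   # G, haploid length
n_reads = 60_000_000            # N
delta_2 = read_length / (2 * genome_length)

for phi in (1, 2, 3):
    exact = exact_pair_coverage(n_reads, delta_2, phi)
    approx = approx_pair_coverage(n_reads, delta_2, phi)
    print(phi, exact, approx, abs(exact - approx))
```

            <p>With <it>L</it>/<it>G </it>this small, the two forms should differ only far beyond the precision relevant to project planning, which is the practical content of the approximation argument.</p>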
         </sec>
         <sec>
            <st>
               <p>Proof of Eq. 6</p>
            </st>
            <p>The case of aneuploidy under the assumption of allele independence is a straightforward extension of the proof for Eq. 2. For <it>h </it>homologous chromosomes, <it>P</it><sub><it>h</it>, <it>&#966; </it></sub>(<it>B</it><sub>1, <it>N </it></sub>&#8898; <it>B</it><sub>2, <it>N </it></sub>&#8898; &#8943; &#8898; <it>B</it><sub><it>h</it>, <it>N</it></sub>) = <it>P</it>(<it>B</it><sub>1, <it>N</it></sub>)<it>P </it>(<it>B</it><sub>2, <it>N</it></sub>) &#8943; <it>P </it>(<it>B</it><sub><it>h</it>, <it>N</it></sub>) = <it>P</it><sup><it>h</it></sup>(<it>B</it><sub>1, <it>N</it></sub>). Given that all <it>h </it>chromosomes are equally likely to be sampled, <it>&#948;</it><sub><it>h </it></sub>= <it>L</it>/(<it>hG</it>), which is the appropriate Bernoulli probability for <it>P</it>(<it>B</it><sub>1, <it>N</it></sub>). Eq. 6 follows from the same approximation arguments made for Eq. 2.</p>
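            <p>As an illustration of how the chromosome count enters, the sketch below evaluates the limiting form implied by this argument, i.e. the single-chromosome probability raised to the power <it>h </it>with <it>&#948;</it><sub><it>h </it></sub>= <it>L</it>/(<it>hG</it>). It is a hedged example only: the function name and the numerical inputs are assumptions made for the demonstration, and allele independence is assumed throughout.</p>

```python
from math import exp, factorial


def aneuploid_coverage(n_reads, read_length, genome_length, h, phi):
    """Approximate probability that each of h homologous copies of a
    position is spanned by at least phi reads, assuming independence."""
    delta_h = read_length / (h * genome_length)  # per-read Bernoulli probability
    mean = n_reads * delta_h                     # expected reads per copy
    tail = exp(-mean) * sum(mean**k / factorial(k) for k in range(phi))
    return (1.0 - tail) ** h


# Hypothetical inputs: the same read set applied to 2, 3 and 4 copies.
for h in (2, 3, 4):
    print(h, aneuploid_coverage(60_000_000, 500, 3_000_000_000, h, phi=1))
```

            <p>For a fixed number of reads, the value falls as <it>h </it>grows, because each copy receives a smaller share of the data and more copies must be covered simultaneously.</p>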
         </sec>
         <sec>
            <st>
               <p>Eqs. 1, 2, and 6 in the Context of Expectation</p>
            </st>
            <p>We can take the coverage status of a specific allele pair as a Bernoulli trial, whereby elementary probability theory shows that the expected number of pairs covered is their total number multiplied by <it>P</it><sub>2, <it>&#966;</it></sub>(<it>B</it><sub>1, <it>N </it></sub>&#8898; <it>B</it><sub>2, <it>N</it></sub>). Consequently, <it>P</it><sub>2, <it>&#966;</it></sub>(<it>B</it><sub>1, <it>N </it></sub>&#8898; <it>B</it><sub>2, <it>N</it></sub>) in Eq. 1 and its approximation in Eq. 2 also represent the expected <it>fraction </it>of covered pairs. The same argument holds for Eq. 6.</p>
         </sec>
         <sec>
            <st>
               <p>Analysis of <it>C. elegans </it>resequencing data</p>
            </st>
            <p>Hillier et al. <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> used the Illumina Genome Analyzer to resequence the <it>C. elegans </it>N2 Bristol genome. Release ws188 of the genomic sequence <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> was downloaded from <url>http://www.wormbase.org</url>. Randomly chosen subsets of the resequencing data were then aligned against this reference at regular intervals for each chromosome using the maq aligner (<url>http://maq.sourceforge.net</url>). Data that could not be uniquely placed on the reference were discarded. Coverage was calculated for each alignment as the number of corresponding base positions spanned by at least one read on homologous chromosomes (<it>&#966; </it>= 1) and by at least two reads on homologous chromosomes (<it>&#966; </it>= 2).</p>
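            <p>The counting step can be sketched as follows. This is not the pipeline used for the analysis above; it is a minimal illustration that assumes uniquely placed reads have already been reduced to 0-based start positions, one list per homologous copy, and that all reads have the same length. All names are hypothetical.</p>

```python
def depth_profile(read_starts, read_length, region_length):
    """Per-base read depth over a region, built from a difference array.
    Assumes every start position lies inside the region."""
    diff = [0] * (region_length + 1)
    for start in read_starts:
        end = min(start + read_length, region_length)
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for change in diff[:region_length]:
        running += change
        depth.append(running)
    return depth


def pair_coverage(starts_a, starts_b, read_length, region_length, phi):
    """Number of positions spanned by at least phi reads on both copies."""
    depth_a = depth_profile(starts_a, read_length, region_length)
    depth_b = depth_profile(starts_b, read_length, region_length)
    return sum(1 for a, b in zip(depth_a, depth_b) if a >= phi and b >= phi)


# Toy example: a 6-base region with reads of length 3.
print(pair_coverage([0, 2], [1], read_length=3, region_length=6, phi=1))
print(pair_coverage([0, 2], [1], read_length=3, region_length=6, phi=2))
```

            <p>Dividing the count by the region length gives the covered fraction that is compared with the theoretical probability.</p>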
         </sec>
      </sec>
      <sec>
         <st>
            <p>Appendix: Idealized Theory of Levy et al.</p>
         </st>
         <p>Levy et al. <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> sketch a rudimentary diploid theory, though they do not furnish any corresponding mathematical description. Here, we reconstruct an idealized version of their model, i.e. the form which assumes all entities are IID and which omits any heuristic inputs. A careful reading of "Modeling False-Negative Rate of Heterozygous Variants" in ref. <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> reveals the following salient features. Chromosomes are equally likely to be sampled and loci are taken as independent of one another. (Our theory relies on these same two assumptions.) Levy et al. also assume the number of reads <it>&#957; </it>spanning a position of interest <it>x </it>is Poisson-distributed with a rate <it>&#961; </it>and that the probability of observing both alleles is a binomial function of <it>&#957;</it>. Incidentally, Richard Durbin and Aylwyn Scally discuss a similar model in their analysis (Durbin and Scally, personal communication), as do Wheeler et al. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
         <p>Let <b>B </b>denote the event that both alleles at <it>x </it>are observed (covered), and let <b>N </b>be the random variable giving the number of reads that span <it>x</it>, so that <b>N </b>= <it>&#957; </it>is the event that exactly <it>&#957; </it>reads span <it>x</it>. We immediately have</p>
         <p>
            <display-formula id="M7">
               <m:math name="1471-2105-9-239-i11" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>P</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>N</m:mi>
                        <m:mo>=</m:mo>
                        <m:mi>&#957;</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:msup>
                                 <m:mi>e</m:mi>
                                 <m:mrow>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mi>&#961;</m:mi>
                                 </m:mrow>
                              </m:msup>
                              <m:msup>
                                 <m:mi>&#961;</m:mi>
                                 <m:mi>&#957;</m:mi>
                              </m:msup>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>&#957;</m:mi>
                              <m:mo>!</m:mo>
                           </m:mrow>
                        </m:mfrac>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiuaaLaeiikaGccbeGae8Nta4Kaeyypa0JaeqyVd4MaeiykaKIaeyypa0tcfa4aaSaaaeaacqWGLbqzdaahaaqabeaacqGHsislcqaHbpGCaaGaeqyWdi3aaWbaaeqabaGaeqyVd4gaaaqaaiabe27aUjabcgcaHaaaaaa@3EC9@</m:annotation>
                  </m:semantics>
               </m:math>
            </display-formula>
         </p>
         <p>from the Poisson assumption. Given <it>&#957; </it>reads spanning <it>x</it>, the probability of observing both alleles is simply the complement of the probability of any configuration in which one of the alleles is <it>not </it>represented among the <it>&#957; </it>reads. If we label the alleles <b>I </b>and <b>II</b>, then without loss of generality, the binomial model for the number of observations, <it>j</it>, of allele <b>I </b>is</p>
         <p>
            <display-formula id="M8">
               <m:math name="1471-2105-9-239-i12" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:msub>
                           <m:mi>P</m:mi>
                           <m:mi>I</m:mi>
                        </m:msub>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>j</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:msub>
                           <m:mi>C</m:mi>
                           <m:mrow>
                              <m:mi>&#957;</m:mi>
                              <m:mo>,</m:mo>
                              <m:mi>j</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:msup>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo>(</m:mo>
                                 <m:mrow>
                                    <m:mfrac>
                                       <m:mn>1</m:mn>
                                       <m:mn>2</m:mn>
                                    </m:mfrac>
                                 </m:mrow>
                                 <m:mo>)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mi>j</m:mi>
                        </m:msup>
                        <m:msup>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo>(</m:mo>
                                 <m:mrow>
                                    <m:mn>1</m:mn>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mfrac>
                                       <m:mn>1</m:mn>
                                       <m:mn>2</m:mn>
                                    </m:mfrac>
                                 </m:mrow>
                                 <m:mo>)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>&#957;</m:mi>
                              <m:mo>&#8722;</m:mo>
                              <m:mi>j</m:mi>
                           </m:mrow>
                        </m:msup>
                        <m:mo>=</m:mo>
                        <m:msub>
                           <m:mi>C</m:mi>
                           <m:mrow>
                              <m:mi>&#957;</m:mi>
                              <m:mo>,</m:mo>
                              <m:mi>j</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:msup>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo>(</m:mo>
                                 <m:mrow>
                                    <m:mfrac>
                                       <m:mn>1</m:mn>
                                       <m:mn>2</m:mn>
                                    </m:mfrac>
                                 </m:mrow>
                                 <m:mo>)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mi>&#957;</m:mi>
                        </m:msup>
                        <m:mo>.</m:mo>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiuaa1aaSbaaSqaaGqabiab=LeajbqabaGccqGGOaakcqWGQbGAcqGGPaqkcqGH9aqpcqWGdbWqdaWgaaWcbaGaeqyVd4MaeiilaWIaemOAaOgabeaakmaabmaajuaGbaWaaSaaaeaacqaIXaqmaeaacqaIYaGmaaaakiaawIcacaGLPaaadaahaaWcbeqaaiabdQgaQbaakmaabmaabaGaeGymaeJaeyOeI0scfa4aaSaaaeaacqaIXaqmaeaacqaIYaGmaaaakiaawIcacaGLPaaadaahaaWcbeqaaiabe27aUjabgkHiTiabdQgaQbaakiabg2da9iabdoeadnaaBaaaleaacqaH9oGBcqGGSaalcqWGQbGAaeqaaOWaaeWaaKqbagaadaWcaaqaaiabigdaXaqaaiabikdaYaaaaOGaayjkaiaawMcaamaaCaaaleqabaGaeqyVd4gaaOGaeiOla4caaa@54E8@</m:annotation>
                  </m:semantics>
               </m:math>
            </display-formula>
         </p>
         <p>There are two configurations in which only one of the alleles is observed: <it>j </it>= 0 (all reads hit allele <b>II</b>) and <it>j </it>= <it>&#957; </it>(all reads hit allele <b>I</b>). Consequently, the probability of observing both alleles in <it>&#957; </it>reads is <it>P</it>(<b>B</b>|<b>N </b>= <it>&#957;</it>) = 1 - <it>P</it><sub><b>I</b></sub>(0) - <it>P</it><sub><b>I</b></sub>(<it>&#957;</it>). Eq. 8 gives <it>P</it><sub><b>I</b></sub>(0) = <it>P</it><sub><b>I</b></sub>(<it>&#957;</it>) = (1/2)<sup><it>&#957;</it></sup>, so the two excluded terms sum to 2(1/2)<sup><it>&#957;</it></sup> = (1/2)<sup><it>&#957; </it>- 1</sup> and</p>
         <p>
            <display-formula id="M9">
               <m:math name="1471-2105-9-239-i13" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>P</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>B</m:mi>
                        <m:mo>|</m:mo>
                        <m:mi>N</m:mi>
                        <m:mo>=</m:mo>
                        <m:mi>&#957;</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mn>1</m:mn>
                        <m:mo>&#8722;</m:mo>
                        <m:msup>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo>(</m:mo>
                                 <m:mrow>
                                    <m:mfrac>
                                       <m:mn>1</m:mn>
                                       <m:mn>2</m:mn>
                                    </m:mfrac>
                                 </m:mrow>
                                 <m:mo>)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>&#957;</m:mi>
                              <m:mo>&#8722;</m:mo>
                              <m:mn>1</m:mn>
                           </m:mrow>
                        </m:msup>
                        <m:mo>,</m:mo>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiuaaLaeiikaGYexLMBbXgBcf2CPn2qVrwzqf2zLnharyGvLjhzH5wyaGabbiaa=jeacqGG8baFcaWFobGaeyypa0JaeqyVd4MaeiykaKIaeyypa0JaeGymaeJaeyOeI0YaaeWaaKqbagaadaWcaaqaaiabigdaXaqaaiabikdaYaaaaOGaayjkaiaawMcaamaaCaaaleqabaGaeqyVd4MaeyOeI0IaeGymaedaaOGaeiilaWcaaa@4AFC@</m:annotation>
                  </m:semantics>
               </m:math>
            </display-formula>
         </p>
         <p>which is defined for <it>&#957; </it>&#8805; 1. Note that the probability exceeds zero only for <it>&#957; </it>&#8805; 2, as we would expect. That is, at least 2 reads must span <it>x </it>before it is possible to observe both alleles. The Theorem of Total Probability now furnishes the desired result, <it>P</it>(<b>B</b>), from Eqs. 7 and 9, as follows.</p>
         <p>
            <display-formula id="M10">
               <m:math name="1471-2105-9-239-i14" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mtable>
                        <m:mtr>
                           <m:mtd>
                              <m:mi>P</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>B</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>=</m:mo>
                              <m:mstyle displaystyle="true">
                                 <m:munder>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mi>&#957;</m:mi>
                                 </m:munder>
                                 <m:mrow>
                                    <m:mi>P</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>B</m:mi>
                                    <m:mo>|</m:mo>
                                    <m:mi>N</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mi>&#957;</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mi>P</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>N</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mi>&#957;</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mtd>
                        </m:mtr>
                        <m:mtr>
                           <m:mtd>
                              <m:mo>=</m:mo>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>&#957;</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>2</m:mn>
                                    </m:mrow>
                                    <m:mi>&#8734;</m:mi>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mo>[</m:mo>
                                       <m:mrow>
                                          <m:mn>1</m:mn>
                                          <m:mo>&#8722;</m:mo>
                                          <m:msup>
                                             <m:mrow>
                                                <m:mrow>
                                                   <m:mo>(</m:mo>
                                                   <m:mrow>
                                                      <m:mfrac>
                                                         <m:mn>1</m:mn>
                                                         <m:mn>2</m:mn>
                                                      </m:mfrac>
                                                   </m:mrow>
                                                   <m:mo>)</m:mo>
                                                </m:mrow>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mi>&#957;</m:mi>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                          </m:msup>
                                       </m:mrow>
                                       <m:mo>]</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                              </m:mstyle>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:msup>
                                       <m:mi>e</m:mi>
                                       <m:mrow>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mi>&#961;</m:mi>
                                       </m:mrow>
                                    </m:msup>
                                    <m:msup>
                                       <m:mi>&#961;</m:mi>
                                       <m:mi>&#957;</m:mi>
                                    </m:msup>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>&#957;</m:mi>
                                    <m:mo>!</m:mo>
                                 </m:mrow>
                              </m:mfrac>
                              <m:mo>.</m:mo>
                           </m:mtd>
                        </m:mtr>
                     </m:mtable>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGceaabbeaacqWGqbaucqGGOaakieqacqWFcbGqcqGGPaqkcqGH9aqpdaaeqbqaaiabdcfaqjabcIcaOiab=jeacjabcYha8jab=5eaojabg2da9iabe27aUjabcMcaPiabdcfaqjabcIcaOiab=5eaojabg2da9iabe27aUjabcMcaPaWcbaGaeqyVd4gabeqdcqGHris5aaGcbaGaeyypa0ZaaabCaeaadaWadaqaaiabigdaXiabgkHiTmaabmaajuaGbaWaaSaaaeaacqaIXaqmaeaacqaIYaGmaaaakiaawIcacaGLPaaadaahaaWcbeqaaiabe27aUjabgkHiTiabigdaXaaaaOGaay5waiaaw2faaaWcbaGaeqyVd4Maeyypa0JaeGOmaidabaGaeyOhIukaniabggHiLdqcfa4aaSaaaeaacqWGLbqzdaahaaqabeaacqGHsislcqaHbpGCaaGaeqyWdi3aaWbaaeqabaGaeqyVd4gaaaqaaiabe27aUjabcgcaHaaakiabc6caUaaaaa@64E4@</m:annotation>
                  </m:semantics>
               </m:math>
            </display-formula>
         </p>
         <p>This expression represents the ideal probability of covering a diploid location as a function of the haploid sequence redundancy of the project.</p>
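         <p>Eq. 10 is straightforward to evaluate numerically. The sketch below sums the series term by term and compares the result with (1 - exp(-<it>&#961;</it>/2))<sup>2</sup>, which is what the approximations in the proof of Eq. 2 give for <it>&#966; </it>= 1 when <it>N&#948;</it><sub>2 </sub>= <it>&#961;</it>/2. The function names, the truncation at 200 terms, and the sample values of <it>&#961; </it>are assumptions made for illustration.</p>

```python
from math import exp


def levy_coverage(rho, max_terms=200):
    """Sum Eq. 10: sum over nu >= 2 of [1 - (1/2)^(nu-1)] e^(-rho) rho^nu / nu!."""
    total = 0.0
    poisson = exp(-rho)              # P(N = 0)
    for nu in range(1, max_terms + 1):
        poisson *= rho / nu          # P(N = nu) from P(N = nu - 1)
        if nu >= 2:
            total += (1.0 - 0.5 ** (nu - 1)) * poisson
    return total


def closed_form(rho):
    """Limiting diploid form at phi = 1 with N * delta_2 = rho / 2."""
    return (1.0 - exp(-rho / 2.0)) ** 2


# Hypothetical haploid redundancies.
for rho in (0.5, 2.0, 8.0, 20.0):
    print(rho, levy_coverage(rho), closed_form(rho))
```

         <p>The two columns should agree to within floating-point error for any redundancy at which the truncated terms are negligible.</p>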
         <p>Eq. 10 is actually just a special case of our model in Eq. 2 for <it>&#966; </it>= 1, as the following exercise demonstrates.</p>
         <p>
            <display-formula id="M11">
               <m:math name="1471-2105-9-239-i15" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>P</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>B</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>&#957;</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>0</m:mn>
                              </m:mrow>
                              <m:mi>&#8734;</m:mi>
                           </m:munderover>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo>[</m:mo>
                                 <m:mrow>
                                    <m:mn>1</m:mn>
                                    <m:mo>&#8722;</m:mo>
                                    <m:msup>
                                       <m:mrow>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:mfrac>
                                                   <m:mn>1</m:mn>
                                                   <m:mn>2</m:mn>
                                                </m:mfrac>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mi>&#957;</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                    </m:msup>
                                 </m:mrow>
                                 <m:mo>]</m:mo>
                              </m:mrow>
                           </m:mrow>
                        </m:mstyle>
                        <m:mfrac>
                           <m:mrow>
                              <m:msup>
                                 <m:mi>e</m:mi>
                                 <m:mrow>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mi>&#961;</m:mi>
                                 </m:mrow>
                              </m:msup>
                              <m:msup>
                                 <m:mi>&#961;</m:mi>
                                 <m:mi>&#957;</m:mi>
                              </m:msup>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>&#957;</m:mi>
                              <m:mo>!</m:mo>
                           </m:mrow>
                        </m:mfrac>
                        <m:mo>&#8722;</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>&#957;</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>0</m:mn>
                              </m:mrow>
                              <m:mn>1</m:mn>
                           </m:munderover>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo>[</m:mo>
                                 <m:mrow>
                                    <m:mn>1</m:mn>
                                    <m:mo>&#8722;</m:mo>
                                    <m:msup>
                                       <m:mrow>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:mfrac>
                                                   <m:mn>1</m:mn>
                                                   <m:mn>2</m:mn>
                                                </m:mfrac>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mi>&#957;</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                    </m:msup>
                                 </m:mrow>
                                 <m:mo>]</m:mo>
                              </m:mrow>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:msup>
                                       <m:mi>e</m:mi>
                                       <m:mrow>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mi>&#961;</m:mi>
                                       </m:mrow>
                                    </m:msup>
                                    <m:msup>
                                       <m:mi>&#961;</m:mi>
                                       <m:mi>&#957;</m:mi>
                                    </m:msup>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>&#957;</m:mi>
                                    <m:mo>!</m:mo>
                                 </m:mrow>
                              </m:mfrac>
                           </m:mrow>
                        </m:mstyle>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH