<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2105-13-279</ui>
	<ji>1471-2105</ji>
	<fm>
		<dochead>Methodology article</dochead>
		<bibl>
			<title>
				<p>ChopSticks: High-resolution analysis of homozygous deletions by exploiting concordant read pairs</p>
			</title>
			<aug>
				<au id="A1" ca="yes"><snm>Yasuda</snm><fnm>Tomohiro</fnm><insr iid="I1"/><email>tyasuda@hgc.jp</email></au>
				<au id="A2"><snm>Suzuki</snm><fnm>Shin</fnm><insr iid="I1"/><email>shinout@hgc.jp</email></au>
				<au id="A3"><snm>Nagasaki</snm><fnm>Masao</fnm><insr iid="I2"/><email>nagasaki@megabank.tohoku.ac.jp</email></au>
				<au id="A4"><snm>Miyano</snm><fnm>Satoru</fnm><insr iid="I1"/><email>miyano@hgc.jp</email></au>
			</aug>
			<insg>
				<ins id="I1"><p>Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan</p></ins>
				<ins id="I2"><p>Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai 980-8573, Japan</p></ins>
			</insg>
			<source>BMC Bioinformatics</source>
			<section><title><p>Sequence analysis (methods)</p></title></section><issn>1471-2105</issn>
			<pubdate>2012</pubdate>
			<volume>13</volume>
			<issue>1</issue>
			<fpage>279</fpage>
			<url>http://www.biomedcentral.com/1471-2105/13/279</url>
			<xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-13-279</pubid><pubid idtype="pmpid">23110596</pubid></pubidlist></xrefbib>
		</bibl>
		<history><rec><date><day>7</day><month>4</month><year>2012</year></date></rec><acc><date><day>5</day><month>9</month><year>2012</year></date></acc><pub><date><day>30</day><month>10</month><year>2012</year></date></pub></history>
		<cpyrt><year>2012</year><collab>Yasuda et al.; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st><p>Structural variations (SVs) in genomes are commonly observed even in healthy individuals and play key roles in biological functions. To understand their functional impact or to infer molecular mechanisms of SVs, they have to be characterized with the maximum resolution. However, high-resolution analysis is a difficult task because it requires investigation of the complex structures involved in an enormous number of alignments of next-generation sequencing (NGS) reads and genome sequences that contain errors.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st><p>We propose a new method called <it>ChopSticks</it> that improves the resolution of SV detection for homozygous deletions even when the depth of coverage is low. Conventional methods based on read pairs use only <it>discordant</it> pairs to localize the positions of deletions, where a discordant pair is a read pair whose alignment has an aberrant strand or distance. In contrast, our method exploits concordant reads as well. We theoretically proved that when the depth of coverage approaches zero or infinity, the expected resolution of our method is asymptotically equal to that of methods based only on discordant pairs under double coverage. To confirm the effectiveness of ChopSticks, we conducted computational experiments on both simulated NGS reads and real NGS sequences. ChopSticks significantly improved the resolution of deletion calls made by other methods, which demonstrates its usefulness.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusions</p>
					</st><p>ChopSticks can generate high-resolution deletion calls of homozygous deletions using information independent of other methods, and it is therefore useful to examine the functional impact of SVs or to infer SV generation mechanisms.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st><p>Today, next-generation sequencing (NGS) technologies are essential tools in genome analysis, because they enable us to simultaneously obtain sequences of up to hundreds of billions of base pairs <abbrgrp>
					<abbr bid="B1">1</abbr>
				</abbrgrp>. These technologies enable the characterization of not only small variations such as single-nucleotide polymorphisms (SNPs) but also large-scale mutations such as insertions, deletions, tandem duplications, and inversions. Mutations of these types are collectively called structural variations (SVs) and are frequently observed even in healthy individuals <abbrgrp>
					<abbr bid="B2">2</abbr>
					<abbr bid="B3">3</abbr>
					<abbr bid="B4">4</abbr>
				</abbrgrp>. Because SVs affect a much larger portion of genomes than small variations, including SNPs, they have a great impact on biological functions.</p><p>Current NGS methods can sequence paired reads, which are pairs of reads several hundred bases away from each other. This ability is useful for analyzing SVs because paired reads can be aligned with the reference genome more accurately than single reads, and because we can analyze structures of genomes larger than the size of each read. However, SV detection is still a difficult task, because it requires analysis of the complex structures involved in an enormous number of alignments of paired reads with the reference genome, and because read sequences and alignments include unavoidable errors. For example, a false discovery rate (FDR) of up to 10% had to be tolerated even when determining just the existence of each SV in the 1000 Genomes Project <abbrgrp>
					<abbr bid="B2">2</abbr>
				</abbrgrp>. It is obviously more difficult to accurately detect the exact positions of SVs. Nevertheless, high-resolution SV calls are necessary to elucidate the functional impact of SVs and molecular mechanisms that generate SVs. Moreover, to conduct a large-scale analysis, SV detection methods for data with a low depth of coverage (hereafter simply referred to as <it>coverage</it>) are desirable, because whole genome sequencing is not easy even with NGS technologies.</p><p>Current methods for SV detection search for <it>signatures</it> that indicate SVs hidden in read sequences and their alignments with the genome sequences. The following are basic signatures used for SV detection <abbrgrp>
					<abbr bid="B2">2</abbr>
					<abbr bid="B3">3</abbr>
					<abbr bid="B4">4</abbr>
				</abbrgrp>. </p><p indent="1">&#8226; Read pair (RP) <abbrgrp>
					<abbr bid="B5">5</abbr>
					<abbr bid="B6">6</abbr>
					<abbr bid="B7">7</abbr>
				</abbrgrp>: If pairs of reads have aberrant strands or distances, they are likely to be caused by SVs. Such pairs are called <it>discordant</it> pairs, and normally mapped ones are called <it>concordant</it> pairs. If strands of a discordant pair are as expected, a larger distance than expected indicates a deletion, whereas a smaller distance indicates an insertion. There are several categories of methods that detect discordant pairs by using mapping distances. </p><p indent="2">&#8226; Threshold-based: A pair with a mapped distance larger or smaller than a predefined threshold is defined as a discordant pair. The threshold is <it>&#956;</it>&#177;3<it>&#963;</it> or <it>&#956;</it>&#177;4<it>&#963;</it> for BreakDancer <abbrgrp>
					<abbr bid="B5">5</abbr>
				</abbrgrp> and VariationHunter <abbrgrp>
					<abbr bid="B6">6</abbr>
				</abbrgrp>, where <it>&#956;</it> and <it>&#963;</it> are the mean and standard deviation of mapping distances, or the median fragment size &#177; 10 median absolute deviations for HYDRA <abbrgrp>
					<abbr bid="B7">7</abbr>
				</abbrgrp>.</p><p indent="2">&#8226; Distribution-based: Although the mapped distance of a single pair might vary by tens or hundreds of bases even without SVs, larger (smaller) mapping distances of many pairs in the same region indicate deletions (insertions). Such reads can be detected by statistical tests on the distribution of mapped distances <abbrgrp>
					<abbr bid="B5">5</abbr>
					<abbr bid="B8">8</abbr>
				</abbrgrp>. Pairs detected in this way might have mapping distances more similar to the expected distance than those of other methods. Nonetheless, we still call them <it>discordant</it> pairs in this paper so that a single term refers to all pairs that support SVs.</p><p indent="2">&#8226; Graph-based: Recently Marschall et al. <abbrgrp>
					<abbr bid="B9">9</abbr>
				</abbrgrp> proposed a new method, CLEVER, based on graph theory. CLEVER constructs a graph where a node represents an alignment of a read pair and the genome, while an edge means that connected alignments potentially support the same allele. In this graph, a clique corresponds to a set of pairs supporting the same allele. CLEVER detects SVs by finding maximal cliques (max-cliques). CLEVER can find multiple overlapping max-cliques, each of which supports a different allele. Therefore CLEVER can distinguish more than one SV located at the same locus, for example, two deletions of different sizes in a diploid genome.</p><p indent="1">&#8226; Read depth (RD) <abbrgrp>
					<abbr bid="B10">10</abbr>
					<abbr bid="B11">11</abbr>
				</abbrgrp>: If coverage changes at some position in the genome, this indicates a copy number variation.</p><p indent="1">&#8226; Split read (SR) <abbrgrp>
					<abbr bid="B12">12</abbr>
				</abbrgrp>: If an alignment of a read and the genome includes only a part of the read, this indicates a position of a breakpoint. Here, a <it>breakpoint</it> is the boundary between a region affected by some SV and its unaffected flanking region.</p><p indent="1">&#8226; Sequence assembly (AS) <abbrgrp>
					<abbr bid="B7">7</abbr>
					<abbr bid="B13">13</abbr>
				</abbrgrp>: If the coverage is sufficient, assembling NGS reads around an SV reveals the exact sequence around the SV and the positions of breakpoints.</p><p>The most popular signature used to detect SVs is threshold-based RP. Methods based on this signature can detect SVs from a small number of discordant read pairs; therefore threshold-based RP methods can be applied to low-coverage data. However, threshold-based RP methods localize SVs only to regions surrounded by discordant read pairs, thus causing some ambiguity. For RD methods, the resolution problem is more severe. Because RD methods involve calculation of coverage in windows of a fixed size, their resolution cannot be finer than the window size. Methods based on the SR signature can determine positions of breakpoints at base-pair-level (bp-level) resolution if there are reads covering the breakpoints. However, such reads might not exist, in particular when coverage is low, because of unevenness of coverage or repeat elements to which reads cannot be aligned uniquely. Moreover, because such a split alignment is shorter than a read itself, careful analysis is required to avoid spurious matches. If coverage is sufficiently high, AS methods would ultimately reveal the exact positions of SVs at bp-level resolution. Although extremely deep sequencing can be conducted by targeted sequencing <abbrgrp>
					<abbr bid="B14">14</abbr>
				</abbrgrp>, it is still expensive to obtain paired reads of high coverage over the entire genome so that assembly can be performed. In fact, a previous study has indicated that the sensitivity of AS methods is rather low (Table S6B of Mills et al. <abbrgrp>
					<abbr bid="B3">3</abbr>
				</abbrgrp>).</p><p>Because these signatures have their own advantages and disadvantages, it is desirable to combine more than one method <abbrgrp>
					<abbr bid="B4">4</abbr>
				</abbrgrp>. In fact, several methods that use more than one signature have been proposed recently <abbrgrp>
					<abbr bid="B15">15</abbr>
					<abbr bid="B16">16</abbr>
				</abbrgrp>. In combined approaches, we should integrate SV signatures that are independent of each other. In this paper, we propose a new method called <it>ChopSticks</it> that improves the resolution of calls for homozygous deletions, where the calls are generated mainly by threshold-based RP methods. ChopSticks is especially valuable when target SVs are expected to be homozygous, as in inbred mice, whose genomes are homozygous at virtually all loci <abbrgrp>
					<abbr bid="B17">17</abbr>
				</abbrgrp>. ChopSticks exploits positions of concordant read pairs in addition to those of discordant ones. Thus far, concordant pairs have been ignored in threshold-based RP approaches, and therefore, our method can improve the resolution by using this new independent information. As explained below, ChopSticks is effective even for data whose coverage is low.</p><p>The organization of this paper is as follows. First, we theoretically analyze the improvement of the resolution achieved by exploiting concordant read pairs. Next, we present our computational method ChopSticks that improves the resolution of homozygous deletion calls. After that, we demonstrate the effectiveness of ChopSticks in computational experiments. Then, we present our conclusions. In addition, we describe the details of our method and experiments in the Methods section.</p>
		</sec>
		<sec>
			<st>
				<p>Results and discussion</p>
			</st>
			<sec>
				<st>
					<p>Strategy for resolution improvement</p>
				</st>
				<sec>
					<st>
						<p>Theoretical estimation of resolution</p>
					</st><p>Here we present the results of our theoretical analysis of the resolution improvement achieved by our method relative to RP methods, together with the definitions needed to state them. See Methods for details.</p><p>We define a <it>discordant read</it> as a read of a discordant pair and a <it>concordant read</it> as that of a concordant pair. Of the two reads of a pair, the one mapped upstream is called an <it>upstream read</it> and the other is called a <it>downstream read</it> in this paper. Let <it>c</it> be the depth of coverage. Assume that the positions of read pairs are uniformly random over the genome, and that the length <it>r</it> of each read is a fixed constant. Let <it>q</it>(<it>c</it>) be the probability that there is no read pair whose upstream read begins at a given base in the genome. Suppose that there are <it>N</it> read pairs uniquely mapped to a genomic sequence of length <it>G</it>. According to a classical analysis <abbrgrp>
							<abbr bid="B18">18</abbr>
						</abbrgrp>, </p><p>
						<display-formula id="M1">
							<m:math name="1471-2105-13-279-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable columnalign="left">
   <m:mtr>
      <m:mtd>
         <m:mi>q</m:mi>
         <m:mo stretchy="false">(</m:mo>
         <m:mi>c</m:mi>
         <m:mo stretchy="false">)</m:mo>
         <m:mo>=</m:mo>
         <m:msup>
            <m:mrow>
               <m:mfenced separators="" open="(" close=")">
                  <m:mrow>
                     <m:mn>1</m:mn>
                     <m:mo>&#8722;</m:mo>
                     <m:mfrac>
                        <m:mrow>
                           <m:mn>1</m:mn>
                        </m:mrow>
                        <m:mrow>
                           <m:mi>G</m:mi>
                        </m:mrow>
                     </m:mfrac>
                  </m:mrow>
               </m:mfenced>
            </m:mrow>
            <m:mrow>
               <m:mi>N</m:mi>
            </m:mrow>
         </m:msup>
         <m:mo>&#8776;</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>e</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mo>&#8722;</m:mo>
               <m:mi>N</m:mi>
               <m:mo>/</m:mo>
               <m:mi>G</m:mi>
            </m:mrow>
         </m:msup>
         <m:mo>=</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>e</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mo>&#8722;</m:mo>
               <m:mi>c</m:mi>
               <m:mo>/</m:mo>
               <m:mn>2</m:mn>
               <m:mi>r</m:mi>
            </m:mrow>
         </m:msup>
         <m:mi>.</m:mi>
      </m:mtd>
   </m:mtr>
</m:mtable>
</m:math>
						</display-formula>
					</p><p>The last equality holds because each pair contributes two reads of length <it>r</it>, so that <it>c</it>=2<it>N</it><it>r</it>/<it>G</it>. Hereafter, we simply write <it>q</it> instead of <it>q</it>(<it>c</it>). In threshold-based RP approaches, the predicted position of an upstream end of a deletion is determined by the upstream discordant read that is the closest to the breakpoint. Let <it>b</it> be the position of an upstream end of a deletion, <it>&#916;</it>
						<sub>
							<it>b</it>
						</sub> be the distance between <it>b</it> and the closest upstream discordant read, and <it>d</it> be the distance between paired reads. We assume that <it>d</it> is a constant. Let <it>E</it>[<it>&#916;</it>
						<sub>
							<it>b</it>
						</sub>|<it>b</it>,<it>c</it>] be the expectation of <it>&#916;</it>
						<sub>
							<it>b</it>
						</sub> given that <it>b</it> is detected and the coverage is <it>c</it>. Then, </p><p>
						<display-formula id="M2">
							<m:math name="1471-2105-13-279-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable columnalign="left">
   <m:mtr>
      <m:mtd>
         <m:mi>E</m:mi>
         <m:mo>[</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>&#916;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo>|</m:mo>
         <m:mi>b</m:mi>
         <m:mo>,</m:mo>
         <m:mi>c</m:mi>
         <m:mo>]</m:mo>
         <m:mo>=</m:mo>
         <m:mfrac>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:msup>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>d</m:mi>
                     <m:mo>+</m:mo>
                     <m:mn>1</m:mn>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:mfrac>
         <m:mi>S</m:mi>
         <m:mo stretchy="false">(</m:mo>
         <m:mi>q</m:mi>
         <m:mo>,</m:mo>
         <m:mi>d</m:mi>
         <m:mo stretchy="false">)</m:mo>
         <m:mo>,</m:mo>
      </m:mtd>
   </m:mtr>
</m:mtable>
</m:math>
						</display-formula>
					</p><p>where </p><p>
						<display-formula>
							<m:math name="1471-2105-13-279-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>S</m:mi>
   <m:mo stretchy="false">(</m:mo>
   <m:mi>q</m:mi>
   <m:mo>,</m:mo>
   <m:mi>d</m:mi>
   <m:mo stretchy="false">)</m:mo>
   <m:mo>=</m:mo>
   <m:munderover accentunder="false" accent="false">
      <m:mrow>
         <m:mi mathsize="big">&#8721;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>j</m:mi>
         <m:mo>=</m:mo>
         <m:mn>0</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>d</m:mi>
      </m:mrow>
   </m:munderover>
   <m:mi>j</m:mi>
   <m:msup>
      <m:mrow>
         <m:mi>q</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>j</m:mi>
      </m:mrow>
   </m:msup>
   <m:mo>=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:mi>q</m:mi>
         <m:mo>&#8722;</m:mo>
         <m:mo stretchy="false">(</m:mo>
         <m:mi>d</m:mi>
         <m:mo>+</m:mo>
         <m:mn>1</m:mn>
         <m:mo stretchy="false">)</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>d</m:mi>
               <m:mo>+</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msup>
         <m:mo>+</m:mo>
         <m:mi>d</m:mi>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>d</m:mi>
               <m:mo>+</m:mo>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msup>
            <m:mrow>
               <m:mo stretchy="false">(</m:mo>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:mi>q</m:mi>
               <m:mo stretchy="false">)</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
   </m:mfrac>
   <m:mi>.</m:mi>
</m:mrow>
</m:math>
						</display-formula>
					</p><p>See Methods for the derivation of Equation (2). We can obtain better resolution by using concordant reads in addition to discordant reads, because a concordant read may lie closer to <it>b</it> than any upstream discordant read (Figure <figr fid="F1">1</figr>). Such a read narrows the interval in which <it>b</it> can lie. Let <inline-formula>
							<m:math name="1471-2105-13-279-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:msubsup>
   <m:mrow>
      <m:mi>&#916;</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
</m:math>
						</inline-formula> be the distance between <it>b</it> and the closest read upstream of <it>b</it>, and let <inline-formula>
							<m:math name="1471-2105-13-279-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mi>E</m:mi>
<m:mo>[</m:mo>
<m:msubsup>
   <m:mrow>
      <m:mi>&#916;</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
<m:mo>|</m:mo>
<m:mi>b</m:mi>
<m:mo>,</m:mo>
<m:mi>c</m:mi>
<m:mo>]</m:mo>
</m:math>
						</inline-formula> be the expectation of <inline-formula>
							<m:math name="1471-2105-13-279-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:msubsup>
   <m:mrow>
      <m:mi>&#916;</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
</m:math>
						</inline-formula> given that <it>b</it> is detected and the coverage is <it>c</it>. Then, </p><p>
						<display-formula id="M3">
							<m:math name="1471-2105-13-279-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable columnalign="left">
   <m:mtr>
      <m:mtd class="align-1">
         <m:mspace width="-12.0pt"/>
         <m:mi>E</m:mi>
         <m:mo>[</m:mo>
         <m:msubsup>
            <m:mrow>
               <m:mi>&#916;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mo>&#8242;</m:mo>
            </m:mrow>
         </m:msubsup>
         <m:mo>|</m:mo>
         <m:mi>b</m:mi>
         <m:mo>,</m:mo>
         <m:mi>c</m:mi>
         <m:mo>]</m:mo>
         <m:mo>=</m:mo>
      </m:mtd>
      <m:mtd class="align-2">
         <m:mfrac>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:msup>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>d</m:mi>
                     <m:mo>+</m:mo>
                     <m:mn>1</m:mn>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:mfrac>
         <m:mspace width="2em"/>
      </m:mtd>
      <m:mtd>
         <m:mspace width="2em"/>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd class="align-1"/>
      <m:mtd class="align-2">
         <m:mo>&#215;</m:mo>
         <m:mfenced separators="" open="(" close=")">
            <m:mrow>
               <m:mo stretchy="false">(</m:mo>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:msup>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>2</m:mn>
                  </m:mrow>
               </m:msup>
               <m:mo stretchy="false">)</m:mo>
               <m:mi>S</m:mi>
               <m:mo stretchy="false">(</m:mo>
               <m:msup>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>2</m:mn>
                  </m:mrow>
               </m:msup>
               <m:mo>,</m:mo>
               <m:mi>d</m:mi>
               <m:mo stretchy="false">)</m:mo>
               <m:mo>&#8722;</m:mo>
               <m:msup>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>d</m:mi>
                     <m:mo>+</m:mo>
                     <m:mn>1</m:mn>
                  </m:mrow>
               </m:msup>
               <m:mo stretchy="false">(</m:mo>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:mi>q</m:mi>
               <m:mo stretchy="false">)</m:mo>
               <m:mi>S</m:mi>
               <m:mo stretchy="false">(</m:mo>
               <m:mi>q</m:mi>
               <m:mo>,</m:mo>
               <m:mi>d</m:mi>
               <m:mo stretchy="false">)</m:mo>
            </m:mrow>
         </m:mfenced>
         <m:mi>.</m:mi>
         <m:mspace width="2em"/>
      </m:mtd>
   </m:mtr>
</m:mtable>
</m:math>
						</display-formula>
					</p>
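					<p>As a rough illustration, Equations (1) to (3) can be evaluated numerically. The following Python sketch is not part of ChopSticks; the function names are hypothetical, the coverage values are chosen arbitrarily for illustration, and <it>d</it>=200 and <it>r</it>=100 follow the setting of Figure 2.</p>
					<p>
```python
import math


def no_start_probability(c, r):
    """Equation (1): q = exp(-c / (2 r)), the chance that no pair starts at a base."""
    return math.exp(-c / (2.0 * r))


def s_closed(q, d):
    """Closed form of S(q, d), the sum of j * q**j for j = 0..d."""
    return (q - (d + 1) * q ** (d + 1) + d * q ** (d + 2)) / (1.0 - q) ** 2


def expected_rp(q, d):
    """Equation (2): expected distance when only discordant reads are used."""
    return (1.0 - q) / (1.0 - q ** (d + 1)) * s_closed(q, d)


def expected_with_concordant(q, d):
    """Equation (3): expected distance when concordant reads are used as well."""
    tail = q ** (d + 1)
    inner = (1.0 - q * q) * s_closed(q * q, d) - tail * (1.0 - q) * s_closed(q, d)
    return inner / (1.0 - tail)


if __name__ == "__main__":
    d, r = 200, 100  # values used for Figure 2
    for c in (1, 5, 20):  # illustrative coverages, an assumption of this sketch
        q = no_start_probability(c, r)
        print(c, round(expected_rp(q, d), 2), round(expected_with_concordant(q, d), 2))
```
					</p>
					<p>Each printed row lists the coverage followed by the expected distances of Equations (2) and (3), so the two methods can be compared at the same coverage.</p>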
					<fig id="F1"><title><p>Figure 1</p></title><caption><p>Resolution improvement by exploiting concordant read pairs</p></caption><text>
   <p><b>Resolution improvement by exploiting concordant read pairs.</b> Schematic illustration of the key idea of our method ChopSticks. Unlike conventional SV detection methods based only on discordant pairs, whose mapping distances are not close to the expectation, ChopSticks uses concordant read pairs as well. A concordant read may lie closer to the boundary of the deleted region (breakpoint) than any discordant read. Such a concordant read localizes the predicted position of the breakpoint, and therefore it contributes to achieving a high resolution. In this figure, <it>b</it> is the upstream end of a true deletion, and <it>&#916;</it><sub><it>b</it></sub> is the distance between the upstream end of a true deletion and that of a deletion call by threshold-based read-pair (RP) methods. Similarly, <inline-formula><m:math name="1471-2105-13-279-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:msubsup>
   <m:mrow>
      <m:mi>&#916;</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
</m:math></inline-formula> is defined for our method. The expected values of <it>&#916;</it><sub><it>b</it></sub> and <inline-formula><m:math name="1471-2105-13-279-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:msubsup>
   <m:mrow>
      <m:mi>&#916;</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
</m:math></inline-formula> are given by Equations (2) and (3), respectively.</p>
</text><graphic file="1471-2105-13-279-1"/></fig><p>As shown in Figure <figr fid="F2">2</figr>, the expected resolution of our method is significantly superior to that of threshold-based RP methods, which only use discordant pairs. The achieved resolution is quite close to that of threshold-based RP methods at double the coverage, a relationship that we also confirmed theoretically.</p>
					<fig id="F2"><title><p>Figure 2</p></title><caption><p>Expected resolutions of ChopSticks and threshold-based RP methods</p></caption><text>
   <p><b>Expected resolutions of ChopSticks and threshold-based RP methods.</b> The expected resolution of our method (<inline-formula><m:math name="1471-2105-13-279-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mi>E</m:mi>
<m:mo>[</m:mo>
<m:msubsup>
   <m:mrow>
      <m:mi>&#916;</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
<m:mo>|</m:mo>
<m:mi>b</m:mi>
<m:mo>,</m:mo>
<m:mi>c</m:mi>
<m:mo>]</m:mo>
</m:math></inline-formula>) is shown by a thick red line, that of threshold-based RP methods (<it>E</it>[<it>&#916;</it><sub><it>b</it></sub>|<it>b</it>,<it>c</it>]) is shown by a thin solid black line, and that of threshold-based RP methods with double coverage (<it>E</it>[<it>&#916;</it><sub><it>b</it></sub>|<it>b</it>,2<it>c</it>]) is shown by a dashed black line. The difference between <inline-formula><m:math name="1471-2105-13-279-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mi>E</m:mi>
<m:mo>[</m:mo>
<m:msubsup>
   <m:mrow>
      <m:mi>&#916;</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
<m:mo>|</m:mo>
<m:mi>b</m:mi>
<m:mo>,</m:mo>
<m:mi>c</m:mi>
<m:mo>]</m:mo>
</m:math></inline-formula> and <it>E</it>[<it>&#916;</it><sub><it>b</it></sub>|<it>b</it>,2<it>c</it>] is also shown by a dotted blue line. As the coverage increases from zero, the resolution obtained by our method quickly outperforms that of threshold-based RP methods. It is also clear that the resolution of our method is very close to that of threshold-based RP methods with double coverage. The difference approaches zero when coverage approaches zero or infinity, as indicated by the blue dotted line. <it>E</it>[<it>&#916;</it><sub><it>b</it></sub>|<it>b</it>,<it>c</it>], <inline-formula><m:math name="1471-2105-13-279-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mi>E</m:mi>
<m:mo>[</m:mo>
<m:msubsup>
   <m:mrow>
      <m:mi>&#916;</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
<m:mo>|</m:mo>
<m:mi>b</m:mi>
<m:mo>,</m:mo>
<m:mi>c</m:mi>
<m:mo>]</m:mo>
</m:math></inline-formula>, and <it>E</it>[<it>&#916;</it><sub><it>b</it></sub>|<it>b</it>,2<it>c</it>] are given by Equations (2), (3), and (5), respectively. In this figure, <it>d</it>=200 and <it>r</it>=100.</p>
</text><graphic file="1471-2105-13-279-2"/></fig>
					<sec>
						<st>
							<p>Theorem 1</p>
						</st><p>The expectation <inline-formula>
								<m:math name="1471-2105-13-279-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mi>E</m:mi>
<m:mo>[</m:mo>
<m:msubsup>
   <m:mrow>
      <m:mi>&#916;</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
<m:mo>|</m:mo>
<m:mi>b</m:mi>
<m:mo>,</m:mo>
<m:mi>c</m:mi>
<m:mo>]</m:mo>
</m:math>
							</inline-formula> is a weighted sum of <it>E</it>[<it>&#916;</it>
							<sub>
								<it>b</it>
							</sub>|<it>b</it>,2<it>c</it>] and <it>E</it>[<it>&#916;</it>
							<sub>
								<it>b</it>
							</sub>|<it>b</it>,<it>c</it>]. To be more precise, the following equation holds: </p><p>
							<display-formula id="M4">
								<m:math name="1471-2105-13-279-i14" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable columnalign="left">
   <m:mtr>
      <m:mtd>
         <m:mi>E</m:mi>
         <m:mo>[</m:mo>
         <m:msubsup>
            <m:mrow>
               <m:mi>&#916;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mo>&#8242;</m:mo>
            </m:mrow>
         </m:msubsup>
         <m:mo>|</m:mo>
         <m:mi>b</m:mi>
         <m:mo>,</m:mo>
         <m:mi>c</m:mi>
         <m:mo>]</m:mo>
         <m:mo>=</m:mo>
         <m:mo stretchy="false">(</m:mo>
         <m:mn>1</m:mn>
         <m:mo>+</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>d</m:mi>
               <m:mo>+</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msup>
         <m:mo stretchy="false">)</m:mo>
         <m:mi>E</m:mi>
         <m:mo>[</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>&#916;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo>|</m:mo>
         <m:mi>b</m:mi>
         <m:mo>,</m:mo>
         <m:mn>2</m:mn>
         <m:mi>c</m:mi>
         <m:mo>]</m:mo>
         <m:mo>&#8722;</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>d</m:mi>
               <m:mo>+</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msup>
         <m:mi>E</m:mi>
         <m:mo>[</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>&#916;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo>|</m:mo>
         <m:mi>b</m:mi>
         <m:mo>,</m:mo>
         <m:mi>c</m:mi>
         <m:mo>]</m:mo>
         <m:mi>.</m:mi>
      </m:mtd>
   </m:mtr>
</m:mtable>
</m:math>
							</display-formula>
						</p>
					</sec><p>See Methods for the proof. When <it>c</it>&#8594;0, both <it>E</it>[<it>&#916;</it>
						<sub>
							<it>b</it>
						</sub>|<it>b</it>,2<it>c</it>] and <it>E</it>[<it>&#916;</it>
						<sub>
							<it>b</it>
						</sub>|<it>b</it>,<it>c</it>] approach <it>d</it>/2, which is the expected resolution when a deletion is detected with only one read pair. Therefore <inline-formula>
							<m:math name="1471-2105-13-279-i15" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mi>E</m:mi>
<m:mo>[</m:mo>
<m:msubsup>
   <m:mrow>
      <m:mi>&#916;</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
<m:mo>|</m:mo>
<m:mi>b</m:mi>
<m:mo>,</m:mo>
<m:mi>c</m:mi>
<m:mo>]</m:mo>
</m:math>
						</inline-formula> also approaches <it>d</it>/2 when <it>c</it>&#8594;0. On the other hand, when <it>c</it> approaches infinity, <inline-formula>
							<m:math name="1471-2105-13-279-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mi>E</m:mi>
<m:mo>[</m:mo>
<m:msubsup>
   <m:mrow>
      <m:mi>&#916;</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
<m:mo>|</m:mo>
<m:mi>b</m:mi>
<m:mo>,</m:mo>
<m:mi>c</m:mi>
<m:mo>]</m:mo>
</m:math>
						</inline-formula> approaches <it>E</it>[<it>&#916;</it>
						<sub>
							<it>b</it>
						</sub>|<it>b</it>,2<it>c</it>] because <it>q</it>
						<sup>
							<it>d</it> + 1</sup>&#8594;0. In summary,</p>
					<sec>
						<st>
							<p>Theorem 2</p>
						</st><p>
							<inline-formula>
								<m:math name="1471-2105-13-279-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mi>E</m:mi>
<m:mo>[</m:mo>
<m:msubsup>
   <m:mrow>
      <m:mi>&#916;</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
<m:mo>|</m:mo>
<m:mi>b</m:mi>
<m:mo>,</m:mo>
<m:mi>c</m:mi>
<m:mo>]</m:mo>
</m:math>
							</inline-formula> is asymptotically equal to <it>E</it>[<it>&#916;</it>
							<sub>
								<it>b</it>
							</sub>|<it>b</it>,2<it>c</it>] when <it>c</it>&#8594;0 or <it>c</it>&#8594;<it>&#8734;</it>.</p>
					</sec>
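					<p>The two limits stated in Theorem 2 can be checked numerically from the identity in Theorem 1. The following is a minimal sketch, not part of the original analysis: the function name is hypothetical, the two expectations are passed in as plain numbers rather than computed from their closed forms, and the values 3.0 and 6.0 are arbitrary placeholders.</p>

```python
def resolution_with_concordant_pairs(e_double, e_single, q, d):
    """Evaluate the right-hand side of Theorem 1.

    e_double : value of E[Delta_b | b, 2c], supplied by the caller
    e_single : value of E[Delta_b | b, c], supplied by the caller
    q, d     : the quantities named q and d in Theorem 1
    """
    w = q ** (d + 1)
    # The two weights, (1 + w) and -w, always sum to one.
    return (1 + w) * e_double - w * e_single


d = 200

# Low-coverage limit: both expectations approach d/2, so the combination
# equals d/2 regardless of the weight.
low = resolution_with_concordant_pairs(d / 2, d / 2, q=0.999, d=d)

# High-coverage limit: q**(d+1) vanishes, so only the double-coverage
# term remains. 3.0 and 6.0 are placeholder inputs, not results.
high = resolution_with_concordant_pairs(3.0, 6.0, q=0.5, d=d)
```

					<p>Because the weights sum to one, the combination reduces to <it>d</it>/2 whenever both inputs equal <it>d</it>/2, and to the double-coverage term whenever <it>q</it><sup><it>d</it> + 1</sup> is negligible.</p>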
				</sec>
				<sec>
					<st>
						<p>Trimming of deletion calls to improve resolution</p>
					</st><p>If all regions existing in the reference genome were covered by at least one read and no reads at all were mapped to regions of homozygous deletions, the resolution of deletion calls could be improved simply by trimming the ends of deletion calls that are covered by read alignments. Obviously, such a simple assumption does not hold in practice. First, coverage might be zero even in regions that actually exist in the genome, either because no reads are obtained there owing to uneven coverage or because reads cannot be uniquely mapped owing to repeat elements. Second, there might be erroneous alignments in deleted regions because of incidental sequence similarity. Therefore, we developed the algorithm ChopSticks to carefully trim the ends of deletion calls (Figure <figr fid="F3">3</figr>). ChopSticks recognizes high-coverage regions close to the ends of deletion calls even if they are fragmented, and it repeatedly excludes these high-coverage regions from deletion calls. ChopSticks uses two parameters, <it>k</it> and <it>f</it>. The <it>k</it> parameter is a threshold used to distinguish high-coverage regions from low-coverage ones, and <it>f</it> determines the threshold on the joint coverage of regions excluded from a deletion call. See Methods for details. Our implementation of ChopSticks is available on the Internet <abbrgrp>
							<abbr bid="B19">19</abbr>
						</abbrgrp>.</p>
					<fig id="F3"><title><p>Figure 3</p></title><caption><p>Overview of trimming algorithm of ChopSticks</p></caption><text>
   <p><b>Overview of trimming algorithm of ChopSticks.</b> Schematic illustration of the trimming algorithm of ChopSticks. ChopSticks trims ends of deletion calls that are not likely to be parts of deletions, according to their coverage. First, it trims high-coverage regions at the ends of deletion calls. Here, a <it>high-coverage region</it> is a region whose coverage is greater than a given parameter <it>k</it>. Second, it recognizes a high-coverage region separated by a low-coverage region and trims these regions if their joint coverage is deeper than <it>kf</it>, where <it>f</it> is another parameter. The second step is repeatedly conducted until the joint coverage becomes less than <it>kf</it>
							.</p>
</text><graphic file="1471-2105-13-279-3"/></fig>
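					<p>The two trimming steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the published implementation: the function names are hypothetical, the input is assumed to be a per-base depth array for one deletion call, and "joint coverage" is assumed here to mean the mean depth over a low-coverage gap together with the high-coverage run that follows it. The exact definition is given in Methods.</p>

```python
def trim_upstream(coverage, k, f):
    """Return how many bases to trim from the upstream end of a deletion call.

    coverage : per-base read depth inside the call, upstream end first
    k        : a base counts as high-coverage when its depth exceeds k
    f        : a gap plus the following high-coverage run is trimmed when
               their mean depth (assumed meaning of joint coverage)
               exceeds k * f
    """
    n = len(coverage)
    pos = 0
    # Step 1: drop the leading run of high-coverage bases.
    while n > pos and coverage[pos] > k:
        pos += 1
    # Step 2: repeatedly absorb a low-coverage gap and the run after it.
    while True:
        gap_end = pos
        while n > gap_end and not coverage[gap_end] > k:
            gap_end += 1
        run_end = gap_end
        while n > run_end and coverage[run_end] > k:
            run_end += 1
        if run_end == gap_end:
            break  # no further high-coverage run inside the call
        joint = sum(coverage[pos:run_end]) / (run_end - pos)
        if not joint > k * f:
            break  # the joint coverage is too shallow to justify trimming
        pos = run_end
    return pos


def trim_downstream(coverage, k, f):
    """Same procedure applied from the downstream end."""
    return trim_upstream(coverage[::-1], k, f)


depth = [5, 4, 0, 3, 3, 0, 0, 0, 0, 0, 0]
trimmed = trim_upstream(depth, k=2, f=0.5)  # trimmed == 5
```

					<p>In this toy input, the first two bases are removed in the first step, and the one-base gap with the two-base run behind it is removed in the second step because their mean depth of 2.0 exceeds <it>kf</it>=1.0. A single isolated alignment deep inside a long zero-coverage stretch would not pass this test and would leave the call untouched.</p>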
				</sec>
			</sec>
			<sec>
				<st>
					<p>Computational experiment</p>
				</st><p>To evaluate the power of ChopSticks in improving the resolution of deletion calls, we conducted computational experiments. Let the <it>upstream difference</it> of a deletion call be <it>x</it>&#8722;<it>y</it>, where <it>x</it> is the position of the upstream end of the true deletion and <it>y</it> is that of the deletion call. Similarly, let the <it>downstream difference</it> of a deletion call be <it>y</it>
					<sup>&#8242;</sup>&#8722;<it>x</it>
					<sup>&#8242;</sup>, where <it>x</it>
					<sup>&#8242;</sup> is the position of the downstream end of the true deletion and <it>y</it>
					<sup>&#8242;</sup> is that of the deletion call. By definition, the closer to zero a difference is, the better. A positive difference indicates that the called breakpoint is outside the true deletion, whereas a negative difference indicates that it is inside the true deletion. To evaluate ChopSticks, its results have to be compared with the positions of true deletions. Therefore we need NGS reads of a genome whose SVs against the reference genome are known at bp-level resolution. We conducted the two experiments described below.</p>
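				<p>The sign convention of the two differences is easy to get wrong, so a small worked example may help. The sketch below is illustrative only: the function name and the coordinates are hypothetical, and positions are assumed to increase from upstream to downstream.</p>

```python
def breakpoint_differences(true_start, true_end, call_start, call_end):
    """Return (upstream difference, downstream difference) of a deletion call.

    Positions are assumed to increase from upstream to downstream.
    """
    upstream = true_start - call_start   # x - y
    downstream = call_end - true_end     # y' - x'
    return upstream, downstream


# Hypothetical true deletion 1000..1500 and deletion call 960..1490.
up, down = breakpoint_differences(1000, 1500, 960, 1490)
# up == 40: the call starts 40 bp outside the true deletion.
# down == -10: the call ends 10 bp inside the true deletion.
```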
				<sec>
					<st>
						<p>Simulated reads</p>
					</st><p>In the first experiment, we evaluated ChopSticks with simulated NGS reads for which all SVs were known up to bp-level resolution. To obtain data as realistic as possible, we generated a genome sequence with SVs and simulated NGS sequences by using SV annotations published by Quinlan et al. <abbrgrp>
							<abbr bid="B7">7</abbr>
						</abbrgrp>. The accession number of the SV annotations is [dbVar:nstd19]. First, we deleted regions of the reference genome sequence that were annotated as deletions by Quinlan et al. Next, we inserted random fragments whose number and length distribution were the same as those of the annotated deletions, assuming that deletions and insertions are symmetric. Then, we introduced single nucleotide substitutions into the simulated genome sequence and generated paired reads from it. We conducted this simulation and the evaluation of ChopSticks for chromosome 1 of the reference mouse genome mm9. All paired reads were mapped to mm9 using the Burrows-Wheeler Aligner (BWA) <abbrgrp>
							<abbr bid="B20">20</abbr>
						</abbrgrp>. Then we conducted SV analysis by using one SV detection tool from each of the categories described in the Background section: BreakDancer <abbrgrp>
							<abbr bid="B5">5</abbr>
						</abbrgrp> as a threshold-based RP method, MoDIL <abbrgrp>
							<abbr bid="B8">8</abbr>
						</abbrgrp> as a distribution-based RP method, CLEVER <abbrgrp>
							<abbr bid="B9">9</abbr>
						</abbrgrp> as a graph-based RP method, CNVnator <abbrgrp>
							<abbr bid="B11">11</abbr>
						</abbrgrp> as an RD method, and Pindel <abbrgrp>
							<abbr bid="B12">12</abbr>
						</abbrgrp> as an SR method. After that, we applied ChopSticks to their results.</p><p>Before applying ChopSticks, we examined the ability of the SV detection tools to detect the 460 deletions in chromosome 1 of the simulated mouse genome. We say that a deletion call is <it>correct</it> if it overlaps exactly one true deletion and that true deletion in turn overlaps exactly one deletion call. We show the numbers of correct deletion calls and of all deletion calls in Table <tblr tid="T1">1</tblr>. We also show their <it>recall</it> (the number of correct deletion calls divided by the number of true deletions) and <it>precision</it> (the number of correct deletion calls divided by the number of all deletion calls) in Figure <figr fid="F4">4</figr>. The recall of BreakDancer and CLEVER was relatively good for all coverage values tried, whereas the recall of Pindel was satisfactory only when coverage was high. The recall of MoDIL was low for all coverage values tried. Although almost all deletions called by the other methods were correct, CNVnator generated numerous false positives (Table <tblr tid="T1">1</tblr>). Because ChopSticks was developed to correct breakpoints outside true deletions, we counted the number of deletion calls that cover the whole of true deletions. As shown in Figure <figr fid="F5">5</figr>, most of the deletion calls by MoDIL, CNVnator, and Pindel covered the whole of true deletions. However, a significant portion of the BreakDancer and CLEVER results did not. Note that ChopSticks is harmless to these deletion calls because it does not trim them when there are no alignments in true deletions.</p>
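					<p>The definition of a correct deletion call, together with recall and precision, can be expressed compactly in code. The following is a minimal sketch under stated assumptions: the function names are hypothetical, intervals are assumed to be half-open (start, end) pairs on a single chromosome, and the toy intervals are invented for illustration. Only the counts 426, 427, and 460 are taken from Table 1 and the text (BreakDancer at a depth of coverage of 5).</p>

```python
def overlaps(a, b):
    """True when two half-open intervals (start, end) share a base."""
    return min(a[1], b[1]) > max(a[0], b[0])


def count_correct(calls, truths):
    """Count calls that overlap exactly one true deletion which, in turn,
    overlaps exactly one call."""
    correct = 0
    for call in calls:
        hits = [t for t in truths if overlaps(call, t)]
        if len(hits) != 1:
            continue
        back = [c for c in calls if overlaps(c, hits[0])]
        if len(back) == 1:
            correct += 1
    return correct


def recall_precision(n_correct, n_true, n_called):
    return n_correct / n_true, n_correct / n_called


# Toy data: the second true deletion is hit by two calls, so neither counts.
truths = [(100, 200), (500, 600), (900, 1000)]
calls = [(90, 210), (480, 520), (580, 650), (2000, 2100)]
n_correct = count_correct(calls, truths)  # n_correct == 1

# Counts reported for BreakDancer at coverage 5: 426 correct of 427 calls,
# with 460 true deletions.
recall, precision = recall_precision(426, 460, 427)
# round(recall, 3) == 0.926, round(precision, 3) == 0.998
```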
					<fig id="F4"><title><p>Figure 4</p></title><caption><p>Recall and precision of results of SV detection tools</p></caption><text>
   <p><b>Recall and precision of results of SV detection tools.</b> BreakDancer and CLEVER achieved relatively good recall at all coverage values, whereas the recall of MoDIL was low. Although the recall of CNVnator was not bad, its precision was low. The recall of Pindel, an SR method, was good when coverage was high but insufficient when coverage was low.</p>
</text><graphic file="1471-2105-13-279-4"/></fig>
					<fig id="F5"><title><p>Figure 5</p></title><caption><p>Number of deletion calls covering the whole of true deletions</p></caption><text>
   <p><b>Number of deletion calls covering the whole of true deletions.</b> Solid lines and circles show the number of all deletion calls generated by each tool, whereas dashed lines and &#8216;+&#8217; symbols show the number of deletion calls covering the whole of true deletions. Most of the deletion calls of MoDIL, CNVnator (expanded by the window size), and Pindel covered the whole of true deletions. On the other hand, many CLEVER results did not contain the whole of true deletions, although the median of the distribution of predicted breakpoints was close to the true breakpoints, as shown in Figure <figr fid="F10">10</figr>. BreakDancer results for high-coverage data did not always contain the whole of true deletions either. Predicted breakpoints of BreakDancer approached the true breakpoints as the depth of coverage increased, and sometimes intruded into true deletions when coverage was high.</p>
</text><graphic file="1471-2105-13-279-5"/></fig>
					<table id="T1">
						<title>
							<p>Table 1</p>
						</title>
						<caption>
							<p>
								<b>Results of SV detection obtained by BreakDancer, MoDIL, CLEVER, CNVnator, and Pindel</b>
							</p>
						</caption>
						<tgroup cols="6">
							<colspec align="left" colname="c1" colnum="1" colwidth="16*"/>
							<colspec align="center" colname="c2" colnum="2" colwidth="16*"/>
							<colspec align="center" colname="c3" colnum="3" colwidth="16*"/>
							<colspec align="center" colname="c4" colnum="4" colwidth="16*"/>
							<colspec align="center" colname="c5" colnum="5" colwidth="16*"/>
							<colspec align="center" colname="c6" colnum="6" colwidth="16*"/>
							<thead valign="top">
								<row>
									<entry align="left" colname="c1"/>
									<entry align="center" colname="c2" nameend="c6" namest="c2" rowsep="1">
										<p>
											<b>Depth of coverage</b>
										</p>
									</entry>
								</row>
								<row rowsep="1">
									<entry align="left" colname="c1">
										<p>
											<b>SV caller</b>
										</p>
									</entry>
									<entry align="center" colname="c2">
										<p>
											<b>2</b>
										</p>
									</entry>
									<entry align="center" colname="c3">
										<p>
											<b>5</b>
										</p>
									</entry>
									<entry align="center" colname="c4">
										<p>
											<b>10</b>
										</p>
									</entry>
									<entry align="center" colname="c5">
										<p>
											<b>15</b>
										</p>
									</entry>
									<entry align="center" colname="c6">
										<p>
											<b>20</b>
										</p>
									</entry>
								</row>
							</thead>
							<tfoot>
								<p>Results of deletion calls by BreakDancer, MoDIL, CLEVER, CNVnator, and Pindel. The values to the left of &#8220;/&#8221; are the numbers of <it>correct</it> deletion calls, where a <it>correct</it> deletion call is one that overlaps exactly one true deletion, which, in turn, overlaps only that deletion call; the values to the right of &#8220;/&#8221; are the numbers of all deletion calls. BreakDancer and CLEVER results were good in both sensitivity and specificity. CNVnator generated numerous false positives, whereas Pindel suffered from low coverage. MoDIL missed many deletions.</p>
							</tfoot>
							<tbody valign="top">
								<row>
									<entry align="left" colname="c1">
										<p>BreakDancer</p>
									</entry>
									<entry align="center" colname="c2">
										<p>259/260</p>
									</entry>
									<entry align="center" colname="c3">
										<p>426/427</p>
									</entry>
									<entry align="center" colname="c4">
										<p>453/456</p>
									</entry>
									<entry align="center" colname="c5">
										<p>455/458</p>
									</entry>
									<entry align="center" colname="c6">
										<p>455/458</p>
									</entry>
								</row>
								<row>
									<entry align="left" colname="c1">
										<p>MoDIL</p>
									</entry>
									<entry align="center" colname="c2">
										<p>1/1</p>
									</entry>
									<entry align="center" colname="c3">
										<p>27/27</p>
									</entry>
									<entry align="center" colname="c4">
										<p>96/96</p>
									</entry>
									<entry align="center" colname="c5">
										<p>129/130</p>
									</entry>
									<entry align="center" colname="c6">
										<p>&#8211;/&#8211;</p>
									</entry>
								</row>
								<row>
									<entry align="left" colname="c1">
										<p>CLEVER</p>
									</entry>
									<entry align="center" colname="c2">
										<p>398/462</p>
									</entry>
									<entry align="center" colname="c3">
										<p>449/525</p>
									</entry>
									<entry align="center" colname="c4">
										<p>454/491</p>
									</entry>
									<entry align="center" colname="c5">
										<p>454/478</p>
									</entry>
									<entry align="center" colname="c6">
										<p>454/466</p>
									</entry>
								</row>
								<row>
									<entry align="left" colname="c1">
										<p>CNVnator</p>
									</entry>
									<entry align="center" colname="c2">
										<p>326/1,258</p>
									</entry>
									<entry align="center" colname="c3">
										<p>354/952</p>
									</entry>
									<entry align="center" colname="c4">
										<p>422/1,127</p>
									</entry>
									<entry align="center" colname="c5">
										<p>447/1,211</p>
									</entry>
									<entry align="center" colname="c6">
										<p>451/1,258</p>
									</entry>
								</row>
								<row rowsep="1">
									<entry align="left" colname="c1">
										<p>Pindel</p>
									</entry>
									<entry align="center" colname="c2">
										<p>85/85</p>
									</entry>
									<entry align="center" colname="c3">
										<p>317/317</p>
									</entry>
									<entry align="center" colname="c4">
										<p>436/438</p>
									</entry>
									<entry align="center" colname="c5">
										<p>450/454</p>
									</entry>
									<entry align="center" colname="c6">
										<p>456/456</p>
									</entry>
								</row>
							</tbody>
						</tgroup>
					</table><p>Next, we applied ChopSticks to the results of SV detection tools. After that, we examined how well the resolution of deletion calls was improved. We tested ChopSticks for <it>k</it>=1,2,&#8230;,5 and <it>f</it>=0.1,0.2,&#8230;,1.0. We evaluated differences at both the upstream and downstream ends of deletions, and found that the results were similar. Therefore we only present the results at upstream ends.</p>
					<sec>
						<st>
							<p>Resolution improvements for BreakDancer deletion calls</p>
						</st><p>As shown in Figure <figr fid="F6">6</figr>, the resolution of deletion calls was clearly improved by using ChopSticks. The original BreakDancer results were successfully corrected, which is also clear in Figure <figr fid="F7">7</figr>. When coverage was low, the resolution was well improved for small <it>k</it> values. When coverage was high, the resolution was also improved for large <it>k</it> values. Therefore, when the coverage is high, we recommend using large <it>k</it> values to ignore erroneous alignments. As shown in Figure <figr fid="F8">8</figr>, ChopSticks worked well regardless of deletion lengths.</p>
						<fig id="F6"><title><p>Figure 6</p></title><caption><p>BreakDancer results improved by ChopSticks</p></caption><text>
   <p><b>BreakDancer results improved by ChopSticks.</b> Box-and-whisker plots of upstream differences of deletion calls obtained by BreakDancer and those improved by ChopSticks. The red, green, blue, light blue, and magenta boxes correspond to <it>k</it> values of 1, 2, 3, 4, and 5, respectively, and the rightmost yellow box corresponds to the original results of BreakDancer. Among boxes of the same color, from left to right, <it>f</it>=0.1, 0.2, &#8230;, 1.0. Brown horizontal dashed lines indicate, from bottom to top, the 25th, 50th, and 75th percentiles of the differences of the original deletion calls. The results in this figure indicate that ChopSticks clearly improved the resolution of the original BreakDancer results. When the coverage was low, small <it>k</it> values were effective in improving the resolution. When coverage was high, the resolution was also improved for large <it>k</it> values. Therefore, when the coverage is high, we recommend using large <it>k</it> values to ignore erroneous alignments of NGS reads to the genome. We omitted the results for coverage=15 because they were similar to those for coverage=20.</p>
</text><graphic file="1471-2105-13-279-6"/></fig>
						<fig id="F7"><title><p>Figure 7</p></title><caption><p>Distribution of differences of BreakDancer results and those improved by ChopSticks</p></caption><text>
   <p><b>Distribution of differences of BreakDancer results and those improved by ChopSticks.</b> The distribution of differences of ChopSticks results was concentrated around zero, whereas that of BreakDancer results had a long tail in 0&#8211;50 bp. Here, <it>k</it>=2, <it>f</it>=0.5, and coverage=5. Each frequency corresponds to the number of differences in bins of 2 bp.</p>
</text><graphic file="1471-2105-13-279-7"/></fig>
						<fig id="F8"><title><p>Figure 8</p></title><caption><p>Scatter plot of deletion lengths and differences of deletion calls</p></caption><text>
   <p><b>Scatter plot of deletion lengths and differences of deletion calls.</b> No correlation between deletion lengths and differences was observed (<it>r</it><sup>2</sup>=0.056). ChopSticks worked well regardless of deletion lengths. Here, <it>k</it>=2, <it>f</it>=0.5, and coverage=5.</p>
</text><graphic file="1471-2105-13-279-8"/></fig>
					</sec>
					<sec>
						<st>
							<p>Resolution improvements for MoDIL deletion calls</p>
						</st><p>As shown in Figure <figr fid="F9">9</figr>, the resolution of deletion calls by MoDIL was also improved by using ChopSticks. We omitted the evaluation of MoDIL for coverage=20 because MoDIL was very slow (see Methods).</p>
						<fig id="F9"><title><p>Figure 9</p></title><caption><p>MoDIL results improved by ChopSticks</p></caption><text>
   <p><b>MoDIL results improved by ChopSticks.</b> Box-and-whisker plots of upstream differences of deletion calls obtained by MoDIL and those improved by ChopSticks. The format of this plot is exactly the same as that in Figure <figr fid="F6">6</figr>, except that results for coverage=15 are shown instead of those for coverage=20. The results in this figure indicate that ChopSticks can also improve the resolution of MoDIL results.</p>
</text><graphic file="1471-2105-13-279-9"/></fig>
					</sec>
					<sec>
						<st>
							<p>Resolution improvements for CLEVER deletion calls</p>
						</st><p>The resolution of deletion calls by CLEVER was also improved by using ChopSticks. As mentioned above, deletion calls of CLEVER do not always cover the whole of true deletions. Nonetheless, as shown in Figures <figr fid="F10">10</figr> and <figr fid="F11">11</figr>, ChopSticks successfully improved the resolution of CLEVER results by selectively correcting predicted breakpoints outside true deletions.</p>
						<fig id="F10"><title><p>Figure 10</p></title><caption><p>CLEVER results improved by ChopSticks</p></caption><text>
   <p><b>CLEVER results improved by ChopSticks.</b> Box-and-whisker plots of upstream differences of deletion calls obtained by CLEVER and those improved by ChopSticks. The differences were successfully corrected. Note that a significant portion of breakpoints predicted by CLEVER were inside the true deletion. Nonetheless, ChopSticks selectively trimmed predicted breakpoints outside true deletions, and left those inside untouched.</p>
</text><graphic file="1471-2105-13-279-10"/></fig>
						<fig id="F11"><title><p>Figure 11</p></title><caption><p>Distribution of differences of CLEVER results and those improved by ChopSticks</p></caption><text>
   <p><b>Distribution of differences of CLEVER results and those improved by ChopSticks.</b> The distribution of differences of CLEVER results had a long tail in 0&#8211;50 bp, whereas that improved by ChopSticks was concentrated around zero. Here, <it>k</it>=2, <it>f</it>=0.5, and coverage=5. Each frequency corresponds to the number of differences in bins of 2 bp.</p>
</text><graphic file="1471-2105-13-279-11"/></fig>
					</sec>
					<sec>
						<st>
							<p>Resolution improvements for CNVnator deletion calls</p>
						</st><p>Because RD methods call SVs by examining coverage in windows of a fixed size, the positions of breakpoints predicted by RD methods have unavoidable ambiguity and might be either inside or outside true deletions. Because ChopSticks assumes that predicted breakpoints are outside true deletions, we applied ChopSticks after expanding the deletion calls of CNVnator at both ends by the window size. As shown in Figure <figr fid="F12">12</figr>, the results of CNVnator were successfully improved. This result indicates that ChopSticks is applicable to RD methods as well as RP methods.</p>
						<fig id="F12"><title><p>Figure 12</p></title><caption><p>CNVnator results improved by ChopSticks</p></caption><text>
   <p><b>CNVnator results improved by ChopSticks.</b> Box-and-whisker plots of upstream differences of deletion calls obtained by CNVnator and those improved by ChopSticks. The format of this plot is exactly the same as that in Figure <figr fid="F6">6</figr>. We expanded the original deletion calls of CNVnator outward by the window size (50 bp) because ChopSticks assumes that predicted breakpoints are outside true deletions. The results in this figure indicate that ChopSticks can improve the resolution of CNVnator results if the predicted positions of breakpoints are within a few hundred bases of the true breakpoints.</p>
</text><graphic file="1471-2105-13-279-12"/></fig>
					</sec>
					<sec>
						<st>
							<p>Results of ChopSticks applied to Pindel deletion calls</p>
						</st><p>Owing to the SR signature that allows Pindel to detect SVs at bp-level resolution, the positions of breakpoints obtained with Pindel were quite accurate. When ChopSticks was applied to the results of Pindel, the results became slightly worse than the original Pindel results, as shown in Figure <figr fid="F13">13</figr>, although differences remained close to zero in most cases. Note that the recall of Pindel was not satisfactory when coverage was low, as shown in Figure <figr fid="F4">4</figr>. ChopSticks is therefore useful when deletions missed by Pindel are analyzed.</p>
						<fig id="F13"><title><p>Figure 13</p></title><caption><p>Pindel results and those modified by ChopSticks</p></caption><text>
   <p><b>Pindel results and those modified by ChopSticks.</b> Box-and-whisker plots of upstream differences of deletion calls obtained by Pindel and those modified by ChopSticks. The format of this plot is exactly the same as in Figure <figr fid="F6">6</figr>. The results in this figure indicate that ChopSticks should not be applied to the Pindel results because the resolution of the Pindel results is already quite high.</p>
</text><graphic file="1471-2105-13-279-13"/></fig>
					</sec>
				</sec>
				<sec>
					<st>
						<p>Real Illumina reads of DBA/2J</p>
					</st><p>In the second experiment, we evaluated ChopSticks using the real NGS sequences of Quinlan et al. <abbrgrp>
							<abbr bid="B7">7</abbr>
						</abbrgrp>. The sample was taken from a female mouse of the DBA/2J strain, whose genome contains SVs against the reference genome of the C57BL/6J strain <abbrgrp>
							<abbr bid="B21">21</abbr>
						</abbrgrp>. The read sequences were available from the NCBI Sequence Read Archive (SRA) database <abbrgrp>
							<abbr bid="B22">22</abbr>
						</abbrgrp>. The accession number of the read sequences is [SRA:SRA010027]. To evaluate the results of ChopSticks, we need bp-level SV annotations of DBA/2J as well. Therefore we generated deletion calls at bp-level resolution using Sanger reads in a manner similar to that of Quinlan et al. See Methods for details. Our deletion calls are available at the dbVar database under accession no. [dbVar:nstd70].</p><p>We tried the five SV detection tools used in the previous experiment, and found that MoDIL, CNVnator, and Pindel missed most of the deletions detected with Sanger reads. These methods seemed to suffer from the low depth of coverage and short read lengths. Therefore, we hereafter describe only the results of ChopSticks applied to BreakDancer and CLEVER results.</p>
					<sec>
						<st>
							<p>Resolution improvements for BreakDancer deletion calls</p>
						</st><p>Figure <figr fid="F14">14</figr> shows the differences for the BreakDancer results and for those improved by using ChopSticks. As in the previous experiment with simulated NGS reads, the differences obtained with real NGS reads were reduced. The median and the differences below the median clearly shifted toward zero, which is also clear in Figure <figr fid="F15">15</figr>. Although ChopSticks trimmed some deletion calls into the deletions identified with Sanger reads when <it>k</it>=1 or <it>k</it>=2 and <it>f</it> was small, this problem quickly disappeared as <it>k</it> or <it>f</it> became larger. No correlation between deletion lengths and the performance of ChopSticks was observed (<it>r</it>
							<sup>2</sup>=0.021). Although we generated 525 deletion calls by using Sanger reads, only 83 of them were found by BreakDancer. There were at least two reasons for this difference in numbers. First, it is difficult to find small deletions because read pairs spanning small deletions might not be recognized as discordant pairs. Second, many deletion calls based on Sanger reads had fewer than two NGS read pairs spanning them. Such deletion calls would be missed because, with the default parameters, BreakDancer deletion calls must be supported by at least two pairs in order to reduce false positives. For this data set, 82 of the 83 deletion calls generated by BreakDancer contained the whole of the deletions predicted with Sanger reads.</p>
						<fig id="F14"><title><p>Figure 14</p></title><caption><p>BreakDancer results for DBA/2J reads improved by ChopSticks</p></caption><text>
   <p><b>BreakDancer results for DBA/2J reads improved by ChopSticks.</b> Box-and-whisker plots of upstream and downstream differences of deletion calls obtained by BreakDancer and those improved by ChopSticks. The results in this figure indicate that ChopSticks can improve the resolution of deletion calls for real sequences. Although ChopSticks trimmed upstream ends of a few deletion calls too much when <it>k</it>=1 or <it>k</it>=2 and <it>f</it> was small, such problems quickly disappeared for greater <it>k</it> and <it>f</it> values.</p>
</text><graphic file="1471-2105-13-279-14"/></fig>
						<fig id="F15"><title><p>Figure 15</p></title><caption><p>Distribution of differences of BreakDancer results and those improved by ChopSticks</p></caption><text>
   <p><b>Distribution of differences of BreakDancer results and those improved by ChopSticks.</b> The distribution of differences of BreakDancer results had a long tail in 0&#8211;400 bp, whereas that improved by ChopSticks was concentrated around zero and frequencies in the long tail were reduced. Here, <it>k</it>=2 and <it>f</it>=0.5. Each frequency corresponds to the number of differences in bins of 20 bp.</p>
</text><graphic file="1471-2105-13-279-15"/></fig>
					</sec>
					<sec>
						<st>
							<p>Resolution improvements for CLEVER deletion calls</p>
						</st><p>CLEVER detected many more deletions (347) than BreakDancer. The results of CLEVER were also improved by ChopSticks, as shown in Figure <figr fid="F16">16</figr>, where the peak around zero became stronger. However, it was difficult for ChopSticks to correct the positions of breakpoints when they were hundreds of bases away from those predicted with Sanger reads.</p>
						<fig id="F16"><title><p>Figure 16</p></title><caption><p>Distribution of differences of CLEVER results and those improved by ChopSticks</p></caption><text>
   <p><b>Distribution of differences of CLEVER results and those improved by ChopSticks.</b> ChopSticks corrected some of the breakpoints predicted by CLEVER so that the peak at zero became stronger. However, the distribution of differences of CLEVER results had a long tail in 0&#8211;3000 bp, and it was difficult for ChopSticks to correct such large differences. Here, <it>k</it>=2 and <it>f</it>=0.5. Each frequency corresponds to the number of differences in bins of 20 bp.</p>
</text><graphic file="1471-2105-13-279-16"/></fig>
					</sec>
				</sec>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusions</p>
			</st><p>We have presented a new method called ChopSticks that improves the resolution of predicted positions of deletions. The key idea is to exploit both concordant read pairs and discordant ones. According to our theoretical analysis, the resolution of our method is close to that of threshold-based RP methods applied to data with twice the coverage. In an experiment on simulated NGS reads, ChopSticks clearly improved the results of BreakDancer, MoDIL, CLEVER, and CNVnator. Although the resolution of Pindel results is quite high, ChopSticks works well even for low-coverage data where the recall of Pindel is insufficient. The effectiveness of ChopSticks was also confirmed in an experiment on real Illumina reads. Although a number of methods have been proposed for detecting SVs <abbrgrp>
					<abbr bid="B2">2</abbr>
					<abbr bid="B3">3</abbr>
					<abbr bid="B4">4</abbr>
				</abbrgrp>, there is no one-stop method that simultaneously achieves high sensitivity, high specificity, high resolution, and robustness for low-coverage data. Therefore, a combination of SV detection methods is required, and ChopSticks can play an important role because it uses independent information that other methods ignore.</p><p>As future work, we plan to develop a method to distinguish homozygous deletions from heterozygous ones and to apply ChopSticks to the former. This would broaden the range of applications of ChopSticks.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<sec>
				<st>
					<p>Derivation of theoretical estimation of resolution</p>
				</st><p>Because the resolution at downstream ends of deletions can be estimated symmetrically, we only analyze the resolution at upstream ends. Let <it>P</it>
					<sub>
						<it>b</it>
					</sub> be the probability that a breakpoint <it>b</it> is successfully included in a deletion call by a threshold-based RP method. If <it>b</it> is detected, there exists an upstream discordant read within <it>d</it> bases from <it>b</it>. Therefore, </p><p>
					<display-formula>
						<m:math name="1471-2105-13-279-i18" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>P</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>b</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo>=</m:mo>
   <m:mn>1</m:mn>
   <m:mo>&#8722;</m:mo>
   <m:msup>
      <m:mrow>
         <m:mi>q</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>d</m:mi>
         <m:mo>+</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:msup>
   <m:mi>.</m:mi>
</m:mrow>
</m:math>
					</display-formula>
				</p><p>We derive the expected distance between the true ends of deletions and the predicted ones in a manner similar to Bashir&#8217;s analysis <abbrgrp>
						<abbr bid="B23">23</abbr>
					</abbrgrp>. For 0&#8804;<it>j</it>&#8804;<it>d</it>, Bashir et al. defined <it>A</it>
					<sub>
						<it>j</it>
					</sub> as an event in which <it>b</it> is detected and an upstream read of a discordant pair is exactly <it>j</it> bases upstream of <it>b</it>. The probability that <it>A</it>
					<sub>
						<it>j</it>
					</sub> occurs is </p><p>
					<display-formula>
						<m:math name="1471-2105-13-279-i19" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable columnalign="left">
   <m:mtr>
      <m:mtd>
         <m:mtext>Pr</m:mtext>
         <m:mo stretchy="false">(</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>A</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>j</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo stretchy="false">)</m:mo>
         <m:mo>=</m:mo>
         <m:mo stretchy="false">(</m:mo>
         <m:mn>1</m:mn>
         <m:mo>&#8722;</m:mo>
         <m:mi>q</m:mi>
         <m:mo stretchy="false">)</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>j</m:mi>
            </m:mrow>
         </m:msup>
         <m:mi>.</m:mi>
      </m:mtd>
   </m:mtr>
</m:mtable>
</m:math>
					</display-formula>
				</p><p>Consequently, </p><p>
					<display-formula>
						<m:math name="1471-2105-13-279-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable columnalign="left">
   <m:mtr>
      <m:mtd columnalign="left">
         <m:mi>E</m:mi>
         <m:mo>[</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>&#916;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo>|</m:mo>
         <m:mi>b</m:mi>
         <m:mo>,</m:mo>
         <m:mi>c</m:mi>
         <m:mo>]</m:mo>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd columnalign="left">
         <m:mo>=</m:mo>
         <m:mfrac>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>P</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>b</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
         </m:mfrac>
         <m:munder>
            <m:mrow>
               <m:mi mathsize="big">&#8721;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>0</m:mn>
               <m:mo>&#8804;</m:mo>
               <m:mi>j</m:mi>
               <m:mo>&#8804;</m:mo>
               <m:mi>d</m:mi>
            </m:mrow>
         </m:munder>
         <m:mi>j</m:mi>
         <m:mtext>Pr</m:mtext>
         <m:mo stretchy="false">(</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>A</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>j</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo stretchy="false">)</m:mo>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd columnalign="left">
         <m:mo>=</m:mo>
         <m:mfrac>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:msup>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>d</m:mi>
                     <m:mo>+</m:mo>
                     <m:mn>1</m:mn>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:mfrac>
         <m:mi>S</m:mi>
         <m:mo stretchy="false">(</m:mo>
         <m:mi>q</m:mi>
         <m:mo>,</m:mo>
         <m:mi>d</m:mi>
         <m:mo stretchy="false">)</m:mo>
         <m:mi>.</m:mi>
      </m:mtd>
   </m:mtr>
</m:mtable>
</m:math>
					</display-formula>
				</p><p>Similarly, we define <inline-formula>
						<m:math name="1471-2105-13-279-i21" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:msubsup>
   <m:mrow>
      <m:mi>A</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>j</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
</m:math>
					</inline-formula> as the event in which <it>b</it> is detected and the closest read upstream of <it>b</it> is exactly <it>j</it> bases away. There are two mutually exclusive cases: (i) at least one of the closest reads is an upstream discordant read, or (ii) all the closest reads are concordant reads. In the former case, a discordant read lies at <it>j</it> bases upstream of <it>b</it> and no read of either kind lies nearer, which gives the first term, (1&#8722;<it>q</it>)<it>q</it><sup>2<it>j</it></sup>, of the expression below. In the latter case, we have to consider the joint probability of the following events. </p><p indent="1">&#8226; A concordant read exists at <it>j</it> bases upstream of <it>b</it>, the probability of which is 1&#8722;<it>q</it>.</p><p indent="1">&#8226; No read, concordant or discordant, lies nearer to <it>b</it> than the closest concordant read, the probability of which is <it>q</it>
					<sup>2<it>j</it>
					</sup>.</p><p indent="1">&#8226; No discordant read exists at <it>j</it> bases upstream of <it>b</it>, the probability of which is <it>q</it>.</p><p indent="1">&#8226; There must exist an upstream read of discordant pairs whose alignment ends in a region that is <it>j</it> + 1 to <it>d</it> bases upstream of <it>b</it> so that <it>b</it> is successfully included in a deletion call, the probability of which is 1&#8722;<it>q</it>
					<sup>
						<it>d</it>&#8722;<it>j</it>
					</sup>.</p><p>Therefore, </p><p>
					<display-formula>
						<m:math name="1471-2105-13-279-i22" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable columnalign="left">
   <m:mtr>
      <m:mtd columnalign="left">
         <m:mtext>Pr</m:mtext>
         <m:mo stretchy="false">(</m:mo>
         <m:munderover accentunder="false" accent="false">
            <m:mrow>
               <m:mi>A</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>j</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mo>&#8242;</m:mo>
            </m:mrow>
         </m:munderover>
         <m:mo stretchy="false">)</m:mo>
         <m:mo>=</m:mo>
         <m:mo stretchy="false">(</m:mo>
         <m:mn>1</m:mn>
         <m:mo>&#8722;</m:mo>
         <m:mi>q</m:mi>
         <m:mo stretchy="false">)</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
               <m:mi>j</m:mi>
            </m:mrow>
         </m:msup>
         <m:mo>+</m:mo>
         <m:mo stretchy="false">(</m:mo>
         <m:mn>1</m:mn>
         <m:mo>&#8722;</m:mo>
         <m:mi>q</m:mi>
         <m:mo stretchy="false">)</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
               <m:mi>j</m:mi>
            </m:mrow>
         </m:msup>
         <m:mi>q</m:mi>
         <m:mo stretchy="false">(</m:mo>
         <m:mn>1</m:mn>
         <m:mo>&#8722;</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>d</m:mi>
               <m:mo>&#8722;</m:mo>
               <m:mi>j</m:mi>
            </m:mrow>
         </m:msup>
         <m:mo stretchy="false">)</m:mo>
      </m:mtd>
      <m:mtd>
         <m:mtext/>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd columnalign="left">
         <m:mspace width="2.5em"/>
         <m:mo>=</m:mo>
         <m:mo stretchy="false">(</m:mo>
         <m:mn>1</m:mn>
         <m:mo>&#8722;</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
         <m:mo stretchy="false">)</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
               <m:mi>j</m:mi>
            </m:mrow>
         </m:msup>
         <m:mo>&#8722;</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>d</m:mi>
               <m:mo>+</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msup>
         <m:mo stretchy="false">(</m:mo>
         <m:mn>1</m:mn>
         <m:mo>&#8722;</m:mo>
         <m:mi>q</m:mi>
         <m:mo stretchy="false">)</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>j</m:mi>
            </m:mrow>
         </m:msup>
         <m:mi>.</m:mi>
      </m:mtd>
   </m:mtr>
</m:mtable>
</m:math>
					</display-formula>
				</p><p>Consequently, </p><p>
					<display-formula>
						<m:math name="1471-2105-13-279-i23" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable columnalign="left">
   <m:mtr>
      <m:mtd columnalign="left">
         <m:mi>E</m:mi>
         <m:mo>[</m:mo>
         <m:msubsup>
            <m:mrow>
               <m:mi>&#916;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mo>&#8242;</m:mo>
            </m:mrow>
         </m:msubsup>
         <m:mo>|</m:mo>
         <m:mi>b</m:mi>
         <m:mo>,</m:mo>
         <m:mi>c</m:mi>
         <m:mo>]</m:mo>
         <m:mo>=</m:mo>
         <m:mfrac>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>P</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>b</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
         </m:mfrac>
         <m:munder>
            <m:mrow>
               <m:mi mathsize="big">&#8721;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>0</m:mn>
               <m:mo>&#8804;</m:mo>
               <m:mi>j</m:mi>
               <m:mo>&#8804;</m:mo>
               <m:mi>d</m:mi>
            </m:mrow>
         </m:munder>
         <m:mi>j</m:mi>
         <m:mtext>Pr</m:mtext>
         <m:mo stretchy="false">(</m:mo>
         <m:munderover accentunder="false" accent="false">
            <m:mrow>
               <m:mi>A</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>j</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mo>&#8242;</m:mo>
            </m:mrow>
         </m:munderover>
         <m:mo stretchy="false">)</m:mo>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd columnalign="left">
         <m:mspace width="5.5em"/>
         <m:mo>=</m:mo>
         <m:mfrac>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:msup>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>d</m:mi>
                     <m:mo>+</m:mo>
                     <m:mn>1</m:mn>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:mfrac>
         <m:mfenced separators="" open="(" close="">
            <m:mrow>
               <m:mspace width="-20.0pt"/>
               <m:mo stretchy="false">(</m:mo>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:msup>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>2</m:mn>
                  </m:mrow>
               </m:msup>
               <m:mo stretchy="false">)</m:mo>
               <m:mi>S</m:mi>
               <m:mo stretchy="false">(</m:mo>
               <m:msup>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>2</m:mn>
                  </m:mrow>
               </m:msup>
               <m:mo>,</m:mo>
               <m:mi>d</m:mi>
               <m:mo stretchy="false">)</m:mo>
            </m:mrow>
         </m:mfenced>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd columnalign="left">
         <m:mspace width="3em"/>
         <m:mfenced separators="" open="" close=")">
            <m:mrow>
               <m:mspace width="6em"/>
               <m:mo>&#8722;</m:mo>
               <m:msup>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>d</m:mi>
                     <m:mo>+</m:mo>
                     <m:mn>1</m:mn>
                  </m:mrow>
               </m:msup>
               <m:mo stretchy="false">(</m:mo>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:mi>q</m:mi>
               <m:mo stretchy="false">)</m:mo>
               <m:mi>S</m:mi>
               <m:mo stretchy="false">(</m:mo>
               <m:mi>q</m:mi>
               <m:mo>,</m:mo>
               <m:mi>d</m:mi>
               <m:mo stretchy="false">)</m:mo>
            </m:mrow>
         </m:mfenced>
         <m:mi>.</m:mi>
      </m:mtd>
   </m:mtr>
</m:mtable>
</m:math>
					</display-formula>
				</p>
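				<p>The two expectations derived above can be checked numerically. The following Python sketch is illustrative only and is not part of ChopSticks. It assumes that <it>S</it>(<it>q</it>,<it>d</it>) denotes the sum of <it>j</it><it>q</it><sup><it>j</it></sup> over 0&#8804;<it>j</it>&#8804;<it>d</it>, as implied by the expression for the expected distance of the threshold-based RP method, and all function names are hypothetical. It evaluates both expectations by direct summation over <it>j</it> and compares the result for the combined use of concordant and discordant reads with the closed form.</p>
				<p>
```python
def S(q, d):
    """Assumed definition: sum of j * q**j for j = 0..d."""
    return sum(j * q**j for j in range(d + 1))


def prob_detect(q, d):
    """P_b = 1 - q**(d + 1)."""
    return 1.0 - q ** (d + 1)


def prob_A(q, j):
    """Pr(A_j) = (1 - q) * q**j (discordant reads only)."""
    return (1.0 - q) * q**j


def prob_A_prime(q, d, j):
    """Pr(A'_j): case (i) plus case (ii) of the derivation."""
    case_i = (1.0 - q) * q ** (2 * j)
    case_ii = (1.0 - q) * q ** (2 * j) * q * (1.0 - q ** (d - j))
    return case_i + case_ii


def expected_rp(q, d):
    """Expected distance for the threshold-based RP method, by direct sum."""
    return sum(j * prob_A(q, j) for j in range(d + 1)) / prob_detect(q, d)


def expected_with_concordant(q, d):
    """Expected distance when concordant reads are also used, by direct sum."""
    total = sum(j * prob_A_prime(q, d, j) for j in range(d + 1))
    return total / prob_detect(q, d)


def expected_with_concordant_closed(q, d):
    """Closed form: ((1 - q^2) S(q^2, d) - q^(d+1) (1 - q) S(q, d)) / P_b."""
    numerator = (1.0 - q**2) * S(q**2, d) - q ** (d + 1) * (1.0 - q) * S(q, d)
    return numerator / prob_detect(q, d)
```
				</p>
				<p>For any 0&#8804;<it>j</it>&#8804;<it>d</it>, summing <code>prob_A_prime</code> over <it>j</it> should reproduce <code>prob_detect</code>, which is a quick consistency check on the two cases listed above.</p>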
				<sec>
					<st>
						<p>Proof of Theorem 1</p>
					</st><p>From Equation (1), <it>E</it>[<it>&#916;</it>
						<sub>
							<it>b</it>
						</sub>|<it>b</it>,2<it>c</it>] can be obtained by replacing <it>q</it> with <it>q</it>
						<sup>2</sup> in Equation (2): </p><p>
						<display-formula id="M5">
							<m:math name="1471-2105-13-279-i24" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable columnalign="left">
   <m:mtr>
      <m:mtd columnalign="left">
         <m:mi>E</m:mi>
         <m:mo>[</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>&#916;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo>|</m:mo>
         <m:mi>b</m:mi>
         <m:mo>,</m:mo>
         <m:mn>2</m:mn>
         <m:mi>c</m:mi>
         <m:mo>]</m:mo>
         <m:mo>=</m:mo>
         <m:mfrac>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:msup>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>2</m:mn>
                  </m:mrow>
               </m:msup>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:msup>
                  <m:mrow>
                     <m:mi>q</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>2</m:mn>
                     <m:mo stretchy="false">(</m:mo>
                     <m:mi>d</m:mi>
                     <m:mo>+</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">)</m:mo>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:mfrac>
         <m:mi>S</m:mi>
         <m:mo stretchy="false">(</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
         <m:mo>,</m:mo>
         <m:mi>d</m:mi>
         <m:mo stretchy="false">)</m:mo>
         <m:mi>.</m:mi>
      </m:mtd>
   </m:mtr>
</m:mtable>
</m:math>
						</display-formula>
					</p><p>From Equations (2), (3), and (5), Equation (4) can be obtained.</p>
				</sec>
				<sec>
					<st>
						<p>Proof of Theorem 2</p>
					</st><p>First, we consider the case where <it>c</it>&#8594;0. Because <it>q</it>&#8594;1 by Equation (1), </p><p>
						<display-formula>
							<m:math name="1471-2105-13-279-i25" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>S</m:mi>
   <m:mo stretchy="false">(</m:mo>
   <m:mi>q</m:mi>
   <m:mo>,</m:mo>
   <m:mi>d</m:mi>
   <m:mo stretchy="false">)</m:mo>
   <m:mo>&#8594;</m:mo>
   <m:munderover accentunder="false" accent="false">
      <m:mrow>
         <m:mi mathsize="big">&#8721;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>j</m:mi>
         <m:mo>=</m:mo>
         <m:mn>0</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>d</m:mi>
      </m:mrow>
   </m:munderover>
   <m:mi>j</m:mi>
   <m:mo>=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:mi>d</m:mi>
         <m:mo stretchy="false">(</m:mo>
         <m:mi>d</m:mi>
         <m:mo>+</m:mo>
         <m:mn>1</m:mn>
         <m:mo stretchy="false">)</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:mfrac>
   <m:mi>.</m:mi>
</m:mrow>
</m:math>
						</display-formula>
					</p><p>In addition, </p><p>
						<display-formula>
							<m:math name="1471-2105-13-279-i26" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mfrac>
      <m:mrow>
         <m:mn>1</m:mn>
         <m:mo>&#8722;</m:mo>
         <m:mi>q</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>1</m:mn>
         <m:mo>&#8722;</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>d</m:mi>
               <m:mo>+</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
   </m:mfrac>
   <m:mo>=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mn>1</m:mn>
         <m:mo>+</m:mo>
         <m:mi>q</m:mi>
         <m:mo>+</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
         <m:mo>+</m:mo>
         <m:mo>&#8943;</m:mo>
         <m:mo>+</m:mo>
         <m:msup>
            <m:mrow>
               <m:mi>q</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>d</m:mi>
            </m:mrow>
         </m:msup>
      </m:mrow>
   </m:mfrac>
   <m:mo>&#8594;</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>d</m:mi>
         <m:mo>+</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:mfrac>
   <m:mi>.</m:mi>
</m:mrow>
</m:math>
						</display-formula>
					</p><p>Therefore, all of <inline-formula>
							<m:math name="1471-2105-13-279-i27" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mi>E</m:mi>
<m:mo>[</m:mo>
<m:msubsup>
   <m:mrow>
      <m:mi>&#916;</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo>&#8242;</m:mo>
   </m:mrow>
</m:msubsup>
<m:mo>|</m:mo>
<m:mi>b</m:mi>
<m:mo>,</m:mo>
<m:mi>c</m:mi>
<m:mo>]</m:mo>
</m:math>
						</inline-formula>, <it>E</it>[<it>&#916;</it>
						<sub>
							<it>b</it>
						</sub>|<it>b</it>,2<it>c</it>], and <it>E</it>[<it>&#916;</it>
						<sub>
							<it>b</it>
						</sub>|<it>b</it>,<it>c</it>] approach <it>d</it>/2 by Equation (4). On the other hand, when <it>c</it>&#8594;<it>&#8734;</it>, <it>q</it>
						<sup>
							<it>d</it> + 1</sup> approaches 0. Consequently, the right-hand side of Equation (4) approaches <it>E</it>[<it>&#916;</it>
						<sub>
							<it>b</it>
						</sub>|<it>b</it>,2<it>c</it>] when <it>c</it>&#8594;0 or <it>c</it>&#8594;<it>&#8734;</it>.</p>
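					<p>Both limits can be illustrated numerically. The Python sketch below is an illustration under stated assumptions rather than part of the proof. It uses <it>q</it> directly as the parameter (so <it>q</it> close to 1 stands for low coverage and small <it>q</it> for high coverage), and the function names are hypothetical. The function <code>combined</code> encodes the relation obtained by substituting the expected distances of the threshold-based RP method at coverage <it>c</it> and 2<it>c</it> into the expected distance derived for the combined use of concordant and discordant reads; we assume that this is the form taken by Equation (4).</p>
					<p>
```python
def truncated_mean(q, d):
    """Expected distance for the RP method with miss probability q.

    Uses (1 - q) / (1 - q**(d + 1)) = 1 / (1 + q + ... + q**d),
    which is also valid at q = 1.
    """
    weights = [q**j for j in range(d + 1)]
    return sum(j * w for j, w in enumerate(weights)) / sum(weights)


def chopsticks_mean(q, d):
    """Expected distance when concordant reads are also used (direct sum)."""
    p_b = 1.0 - q ** (d + 1)
    total = 0.0
    for j in range(d + 1):
        pr = (1.0 - q) * q ** (2 * j) * (1.0 + q * (1.0 - q ** (d - j)))
        total += j * pr
    return total / p_b


def combined(q, d):
    """Assumed form of Equation (4):
    (1 + q^(d+1)) * E[at 2c] - q^(d+1) * E[at c],
    where doubling the coverage replaces q with q**2.
    """
    t = q ** (d + 1)
    return (1.0 + t) * truncated_mean(q**2, d) - t * truncated_mean(q, d)
```
					</p>
					<p>With <it>d</it>=100, evaluating <code>chopsticks_mean</code> at <it>q</it> very close to 1 gives a value near <it>d</it>/2, and evaluating it at <it>q</it>=0.5 gives a value that is numerically indistinguishable from <code>truncated_mean(0.25, 100)</code>, in line with the two limits discussed above.</p>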
				</sec>
			</sec>
			<sec>
				<st>
					<p>Mapping to the genome</p>
				</st><p>We mapped paired reads to the mm9 reference genome sequences of <it>Mus musculus</it> using BWA version 0.5.9 <abbrgrp>
						<abbr bid="B20">20</abbr>
					</abbrgrp> with default parameters. The target genome sequences in our experiment comprised all chromosomes of mm9 except chromosome Y, on the assumption that a female mouse was analyzed <abbrgrp>
						<abbr bid="B7">7</abbr>
						<abbr bid="B21">21</abbr>
					</abbrgrp>.</p>
				<sec>
					<st>
						<p>Simulated NGS sequences</p>
					</st><p>To focus on uniquely mapped reads for BreakDancer, MoDIL, CLEVER, and ChopSticks, we removed paired reads if the mapping quality (MAPQ) score was zero for at least one of the two reads of a pair. For CNVnator and Pindel, we used the result of BWA without filtering. We show the total length of reads and the number of aligned reads in Table <tblr tid="T2">2</tblr>.</p>
					<table id="T2">
						<title>
							<p>Table 2</p>
						</title>
						<caption>
							<p>
								<b>Number of bases and number of reads of simulated data set</b>
							</p>
						</caption>
						<tgroup cols="6">
							<colspec align="left" colname="c1" colnum="1" colwidth="16*"/>
							<colspec align="center" colname="c2" colnum="2" colwidth="16*"/>
							<colspec align="center" colname="c3" colnum="3" colwidth="16*"/>
							<colspec align="center" colname="c4" colnum="4" colwidth="16*"/>
							<colspec align="center" colname="c5" colnum="5" colwidth="16*"/>
							<colspec align="center" colname="c6" colnum="6" colwidth="16*"/>
							<thead valign="top">
								<row>
									<entry align="left" colname="c1"/>
									<entry align="center" colname="c2" nameend="c6" namest="c2" rowsep="1">
										<p>
											<b>Depth of coverage</b>
										</p>
									</entry>
								</row>
								<row rowsep="1">
									<entry align="left" colname="c1"/>
									<entry align="left" colname="c2">
										<p>
											<b>2</b>
										</p>
									</entry>
									<entry align="left" colname="c3">
										<p>
											<b>5</b>
										</p>
									</entry>
									<entry align="left" colname="c4">
										<p>
											<b>10</b>
										</p>
									</entry>
									<entry align="left" colname="c5">
										<p>
											<b>15</b>
										</p>
									</entry>
									<entry align="left" colname="c6">
										<p>
											<b>20</b>
										</p>
									</entry>
								</row>
							</thead>
							<tfoot>
								<p>Summary statistics of the simulated NGS reads and their alignments to mm9. In the third row, only reads belonging to pairs in which both reads were mapped uniquely are counted.</p>
							</tfoot>
							<tbody valign="top">
								<row>
									<entry align="left" colname="c1">
										<p>Total number of bases</p>
									</entry>
									<entry colname="c2">
										<p>394,391,200</p>
									</entry>
									<entry colname="c3">
										<p>985,978,000</p>
									</entry>
									<entry colname="c4">
										<p>1,971,956,000</p>
									</entry>
									<entry colname="c5">
										<p>2,957,934,000</p>
									</entry>
									<entry colname="c6">
										<p>3,943,912,000</p>
									</entry>
								</row>
								<row>
									<entry align="left" colname="c1">
										<p>Number of reads</p>
									</entry>
									<entry colname="c2">
										<p>3,943,912</p>
									</entry>
									<entry colname="c3">
										<p>9,859,780</p>
									</entry>
									<entry colname="c4">
										<p>19,719,560</p>
									</entry>
									<entry colname="c5">
										<p>29,579,340</p>
									</entry>
									<entry colname="c6">
										<p>39,439,120</p>
									</entry>
								</row>
								<row rowsep="1">
									<entry align="left" colname="c1">
										<p>Number of mapped reads</p>
									</entry>
									<entry colname="c2">
										<p>3,677,398</p>
									</entry>
									<entry colname="c3">
										<p>9,194,942</p>
									</entry>
									<entry colname="c4">
										<p>18,391,288</p>
									</entry>
									<entry colname="c5">
										<p>27,587,970</p>
									</entry>
									<entry colname="c6">
										<p>36,783,348</p>
									</entry>
								</row>
							</tbody>
						</tgroup>
					</table>
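					<p>The filtering step described above can be sketched as follows. This is a minimal illustration, not the script used in the study: the <code>ReadPair</code> record and the function name are hypothetical, and in practice the MAPQ values would be taken from the alignments reported by BWA.</p>
					<p>
```python
from typing import Iterable, Iterator, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical record holding the MAPQ of both reads of a pair."""
    name: str
    mapq1: int
    mapq2: int


def keep_uniquely_mapped(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Drop a pair if the MAPQ of at least one of its two reads is zero."""
    for pair in pairs:
        if min(pair.mapq1, pair.mapq2) > 0:
            yield pair
```
					</p>
					<p>For example, a pair with MAPQ values (37, 0) is removed, whereas a pair with (37, 29) is kept.</p>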
				</sec>
				<sec>
					<st>
						<p>Real DBA/2J sequences</p>
					</st><p>We split the data set of NGS reads into 275 subsets, mapped each subset with an independent BWA process, and merged the results. We then removed read pairs in which the MAPQ score was zero for at least one of the two reads. We show the total length of reads and the number of aligned reads in Table <tblr tid="T3">3</tblr>.</p>
					<table id="T3">
						<title>
							<p>Table 3</p>
						</title>
						<caption>
							<p>
								<b>Number of bases and number of reads of DBA/2J data set</b>
							</p>
						</caption>
						<tgroup cols="2">
							<colspec align="left" colname="c1" colnum="1" colwidth="50*"/>
							<colspec align="center" colname="c2" colnum="2" colwidth="50*"/>
							<tfoot>
								<p>Summarized statistics of NGS reads of the DBA/2J strain <abbrgrp>
										<abbr bid="B7">7</abbr>
									</abbrgrp> and their alignments to mm9.</p>
							</tfoot>
							<tbody valign="top">
								<row>
									<entry align="left" colname="c1">
										<p>Total number of bases</p>
									</entry>
									<entry align="left" colname="c2">
										<p>13,050,980,662</p>
									</entry>
								</row>
								<row>
									<entry align="left" colname="c1">
										<p>Number of reads</p>
									</entry>
									<entry align="left" colname="c2">
										<p>330,462,408</p>
									</entry>
								</row>
								<row>
									<entry align="left" colname="c1">
										<p>Reads of uniquely mapped pairs</p>
									</entry>
									<entry align="left" colname="c2">
										<p>149,021,716</p>
									</entry>
								</row>
								<row rowsep="1">
									<entry align="left" colname="c1">
										<p>Reads of uniquely mapped pairs (chromosome 1)</p>
									</entry>
									<entry align="left" colname="c2">
										<p>10,316,525</p>
									</entry>
								</row>
							</tbody>
						</tgroup>
					</table>
				</sec>
			</sec>
			<sec>
				<st>
					<p>Trimming algorithm of ChopSticks</p>
				</st><p>The coverage outside a deletion should be higher than that inside it. Therefore, ChopSticks repeatedly recognizes a high-coverage region in a deletion call that is likely a continuation of a high-coverage region outside the deletion. We show in Figure <figr fid="F17">17</figr> the trimming algorithm executed by ChopSticks for upstream ends. Here is a brief description of the algorithm: <b>Line 2:</b> Skip a high-coverage region at the end of the deletion call. <b>Lines 6&#8211;9:</b> Go through a low-coverage region. <b>Lines 10&#8211;13:</b> Go through a high-coverage region. <b>Line 14:</b> If the joint coverage is low, exit the loop. <b>Line 17:</b> Trim the regions that the algorithm has gone through. Trimming of the downstream ends is conducted symmetrically.</p>
				<fig id="F17"><title><p>Figure 17</p></title><caption><p>Pseudocode of trimming algorithm</p></caption><text>
   <p><b>Pseudocode of trimming algorithm.</b> Pseudocode of the trimming algorithm of ChopSticks. Here, <it>L</it> is the length of the deletion call being processed, <it>k</it> is a threshold used to discriminate high-coverage regions from low-coverage ones, and <it>f</it> is a parameter that determines the threshold of the coverage of regions to be trimmed. The variable <it>x</it> represents the position of the base being examined, and the variable <it>y</it> represents the length of a region to be trimmed. The value <it>c</it>[<it>x</it>] is the coverage at the <it>x</it>-th base in the deletion call, while <it>s</it> keeps the sum of <it>c</it>[<it>x</it>] values.</p>
</text><graphic file="1471-2105-13-279-17"/></fig>
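				<p>The steps listed above can be sketched in Python as follows. This is a reconstruction from the textual description only and should not be read as the pseudocode of Figure <figr fid="F17">17</figr>. Two points are assumptions made here for illustration: a base counts as high coverage when its coverage is at least <it>k</it>, and a joint (low-coverage plus high-coverage) region is accepted for trimming when its mean coverage is at least <it>f</it><it>k</it>. The function name is hypothetical.</p>
				<p>
```python
def trim_upstream(c, k, f):
    """Return y, the number of bases to trim from the upstream end of a call.

    c : per-base coverage inside the deletion call (c[0] is the upstream end)
    k : coverage at or above k counts as high coverage (assumed convention)
    f : a joint region is trimmed only if its mean coverage is at least f * k
        (assumed acceptance rule)
    """
    L = len(c)
    x = 0
    # Skip the high-coverage region at the upstream end of the call.
    while L > x and c[x] >= k:
        x += 1
    y = x
    while L > x:
        start = x
        s = 0
        # Go through a low-coverage region.
        while L > x and k > c[x]:
            s += c[x]
            x += 1
        low_end = x
        # Go through the high-coverage region that follows it.
        while L > x and c[x] >= k:
            s += c[x]
            x += 1
        # Exit if no high-coverage region followed or the joint coverage is low.
        if x == low_end or f * k * (x - start) > s:
            break
        # Otherwise the regions gone through so far are marked for trimming.
        y = x
    return y
```
				</p>
				<p>Under these assumptions, the downstream end can be handled by applying the same function to the reversed coverage array, for example <code>trim_upstream(c[::-1], k, f)</code>.</p>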
			</sec>
			<sec>
				<st>
					<p>Data for computational experiments</p>
				</st><p>To evaluate our method, we need NGS sequences and reliable bp-level positions of breakpoints. There were six SV studies of inbred mice (nstd5, 7, 15, 18, 19, and 48) in the dbVar database <abbrgrp>
						<abbr bid="B22">22</abbr>
					</abbrgrp> when we accessed it on April 1, 2012. However, none of them provides accurate bp-level positions of breakpoints. Therefore, we evaluated ChopSticks using the following two data sets.</p>
				<sec>
					<st>
						<p>Simulated NGS reads</p>
					</st><p>We artificially introduced deletions and insertions into the mm9 reference genome and then generated simulated NGS reads from the modified genome. To make the simulated sequences as realistic as possible, we built the simulated genome sequence using SV annotations generated by Quinlan et al. <abbrgrp>
							<abbr bid="B7">7</abbr>
						</abbrgrp>, which are available from the dbVar database under accession no. [dbVar:nstd19]. First, we deleted regions annotated as deletions in [dbVar:nstd19] from the mm9 reference genome sequence of chromosome 1. We show the distribution of deletion lengths in Figure <figr fid="F18">18</figr>. Second, we inserted fragments consisting of randomly chosen bases so that the number and the length distribution of the inserted fragments were the same as those of the deletions, assuming that the genome to be analyzed and the reference genome are affected symmetrically by deletions and insertions. Third, we introduced random single nucleotide substitutions with a probability of 1.0&#215;10<sup>&#8722;4</sup> at each base. Finally, we generated paired reads from the modified genome sequence so that the read length was 100 bp and the mean and standard deviation of the distances between paired reads were 200 bp and 50 bp, respectively. We generated five sets of simulated NGS reads whose depths of coverage were 2, 5, 10, 15, and 20, respectively.</p>
					<fig id="F18"><title><p>Figure 18</p></title><caption><p>Distribution of deletion lengths in our simulation</p></caption><text>
   <p>
      <b>Distribution of deletion lengths in our simulation.</b>
   </p>
</text><graphic file="1471-2105-13-279-18"/></fig>
				</sec>
				<sec>
					<st>
						<p>NGS reads of Quinlan et al. and deletion calls based on Sanger reads</p>
					</st><p>We generated our own bp-level deletion calls by using publicly available Sanger reads of the DBA/2J strain. From the NCBI trace archive, we retrieved all 7,998,826 Sanger reads of whole-genome shotgun sequencing for the DBA/2J strain. We mapped these Sanger reads to chromosome 1 of mm9 by MegaBLAST <abbrgrp>
							<abbr bid="B24">24</abbr>
						</abbrgrp>, and we searched for Sanger reads that were split into two parts and aligned uniquely on the same strand and in the right order. There were 763 reads that indicated deletions whose lengths were at least 50 bp. By merging redundant ones, we obtained 525 deletion calls. These deletion calls are available in the dbVar database under accession no. [dbVar:nstd70]. We show the distribution of their lengths in Figure <figr fid="F19">19</figr>. NGS sequences of the DBA/2J strain generated by Quinlan et al. are available in the SRA database <abbrgrp>
							<abbr bid="B22">22</abbr>
						</abbrgrp> under accession no. [SRA:SRA010027].</p>
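					<p>The length filter and the merging of redundant split-read calls can be sketched as follows. The merging criterion is not specified in the text, so this sketch assumes that two calls are redundant when both breakpoints agree within a tolerance <it>tol</it>; the function name, the tolerance parameter, and the example coordinates are hypothetical.</p>
					<p>
```python
def merge_redundant(calls, min_len=50, tol=0):
    """Collapse redundant deletion calls.

    calls: iterable of (start, end) half-open intervals, one per split read.
    Returns a list of (start, end, n_supporting_reads).
    Assumption: calls are redundant if both breakpoints differ by at most `tol` bp.
    """
    # Keep only deletions of at least `min_len` bp, as in the text.
    kept = sorted((s, e) for s, e in calls if e - s >= min_len)
    merged = []
    for start, end in kept:
        for i, (s, e, n) in enumerate(merged):
            if tol >= abs(start - s) and tol >= abs(end - e):
                merged[i] = (s, e, n + 1)  # same deletion, one more supporting read
                break
        else:
            merged.append((start, end, 1))  # first read supporting this deletion
    return merged


# Invented example: three reads support one deletion, one call is too short.
calls = [(100, 400), (100, 400), (101, 399), (900, 930), (2000, 2100)]
exact = merge_redundant(calls)
tolerant = merge_redundant(calls, tol=2)
```
					</p>
					<p>With exact matching, the call at (101, 399) remains separate from the two identical calls at (100, 400); with a tolerance of 2 bp, all three are merged into one call supported by three reads.</p>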
					<fig id="F19"><title><p>Figure 19</p></title><caption><p>Distribution of deletion lengths detected with Sanger reads</p></caption><text>
   <p>
      <b>Distribution of deletion lengths detected with Sanger reads.</b>
   </p>
</text><graphic file="1471-2105-13-279-19"/></fig>
				</sec>
			</sec>
			<sec>
				<st>
					<p>Parameters for SV detection tools and evaluation of their results</p>
				</st><p>We executed BreakDancer with default parameters, and Pindel with an expected template size of 432 bp because the median fragment size was 432 bp according to Quinlan et al. <abbrgrp>
						<abbr bid="B7">7</abbr>
					</abbrgrp>. For CNVnator, we tested three window sizes: 50 bp, 100 bp, and 200 bp. Because the recall with a window size of 50 bp exceeded that with window sizes of 100 bp and 200 bp for our simulated data when coverage was 2, we used the results for a window size of 50 bp for evaluation. Because CLEVER tends to report the same deletion multiple times with slightly different positions, for each true deletion we chose the best of the overlapping calls in order to estimate the upper limit of the accuracy of CLEVER. We divided chromosome 1 of mm9 into 5.1 Mbp fragments such that adjacent fragments overlap by 0.1 Mbp, and applied MoDIL to each fragment, because MoDIL was quite slow, as reported previously <abbrgrp>
						<abbr bid="B9">9</abbr>
					</abbrgrp>. We omitted evaluation of MoDIL for coverage=20.</p><p>To compare the positions of true and predicted deletions, we used BEDTools <abbrgrp>
						<abbr bid="B25">25</abbr>
					</abbrgrp>.</p>
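				<p>The division of a chromosome into 5.1 Mbp fragments that overlap by 0.1 Mbp, as used above for MoDIL, can be sketched as follows. The function name and the 12 Mbp example length are hypothetical and serve only to illustrate the windowing; they are not taken from this study.</p>
				<p>
```python
def overlapping_windows(chrom_len, window=5_100_000, overlap=100_000):
    """Return half-open (start, end) windows covering [0, chrom_len).

    Consecutive windows share `overlap` bp, so each window starts
    `window - overlap` bp after the previous one.
    """
    step = window - overlap
    windows = []
    start = 0
    while True:
        end = min(start + window, chrom_len)
        windows.append((start, end))
        if end == chrom_len:
            break  # the last window reaches the end of the chromosome
        start += step
    return windows


# Illustrative 12 Mbp sequence (not the real length of mm9 chromosome 1).
windows = overlapping_windows(12_000_000)
shared = [prev_end - next_start
          for (_, prev_end), (next_start, _) in zip(windows, windows[1:])]
```
				</p>
				<p>Because adjacent windows share 0.1 Mbp, a deletion near a window boundary can be reported from both windows, so calls from the overlapping regions would have to be reconciled before evaluation.</p>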
			</sec>
		</sec>
		<sec>
			<st>
				<p>Competing interests</p>
			</st><p>The authors declare that they have no competing interests.</p>
		</sec>
		<sec>
			<st>
				<p>Authors&#8217; contributions</p>
			</st><p>TY conceived the project, invented and implemented the algorithms, and performed the computational analysis. SS assisted TY in conducting experiments. MN and SM critically revised the manuscript. All authors read and approved the final manuscript.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st><p>The super-computing resource was provided by the Human Genome Center (University of Tokyo).</p>
			</sec>
		</ack>
		<refgrp><bibl id="B1"><title><p>Illumina Sequencing portfolio</p></title><note>[<url>http://www.illumina.com/systems/sequencing.ilmn</url>]</note></bibl><bibl id="B2"><title><p>A map of human genome variation from population-scale sequencing</p></title><aug><au><cnm>The 1000 genomes project consortium</cnm></au></aug><source>Nature</source><pubdate>2010</pubdate><volume>467</volume><fpage>1061</fpage><lpage>1073</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature09534</pubid><pubid idtype="pmcid">3042601</pubid><pubid idtype="pmpid" link="fulltext">20981092</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>1000 genomes project: Mapping copy number variation by population-scale genome sequencing</p></title><aug><au><snm>Mills</snm><fnm>RE</fnm></au><au><snm>Walter</snm><fnm>K</fnm></au><au><snm>Stewart</snm><fnm>C</fnm></au><au><snm>Handsaker</snm><fnm>RE</fnm></au><au><snm>Chen</snm><fnm>K</fnm></au><au><snm>Alkan</snm><fnm>C</fnm></au><au><snm>Abyzov</snm><fnm>A</fnm></au><au><snm>Yoon</snm><fnm>SC</fnm></au><au><snm>Ye</snm><fnm>K</fnm></au><au><snm>Cheetham</snm><fnm>RK</fnm></au><au><snm>Chinwalla</snm><fnm>A</fnm></au><au><snm>Conrad</snm><fnm>DF</fnm></au><au><snm>Fu</snm><fnm>Y</fnm></au><au><snm>Grubert</snm><fnm>F</fnm></au><au><snm>Hajirasouliha</snm><fnm>I</fnm></au><au><snm>Hormozdiari</snm><fnm>F</fnm></au><au><snm>Iakoucheva</snm><fnm>LM</fnm></au><au><snm>Iqbal</snm><fnm>Z</fnm></au><au><snm>Kang</snm><fnm>S</fnm></au><au><snm>Kidd</snm><fnm>JM</fnm></au><au><snm>Konkel</snm><fnm>MK</fnm></au><au><snm>Korn</snm><fnm>J</fnm></au><au><snm>Khurana</snm><fnm>E</fnm></au><au><snm>Kural</snm><fnm>D</fnm></au><au><snm>Lam</snm><fnm>HYK</fnm></au><au><snm>Leng</snm><fnm>J</fnm></au><au><snm>Li</snm><fnm>R</fnm></au><au><snm>Li</snm><fnm>Y</fnm></au><au><snm>Lin</snm><fnm>CY</fnm></au><au><snm>Luo</snm><fnm>R</fnm></au><etal/></aug><source>Nature</source><pubdate>2011</pubdate><volume>470</volume><fpage>59</fpage><lpage>65</lpage><xrefbib
><pubidlist><pubid idtype="doi">10.1038/nature09708</pubid><pubid idtype="pmcid">3077050</pubid><pubid idtype="pmpid" link="fulltext">21293372</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Computational methods for discovering structural variation with next-generation sequencing</p></title><aug><au><snm>Medvedev</snm><fnm>P</fnm></au><au><snm>Stanciu</snm><fnm>M</fnm></au><au><snm>Brudno</snm><fnm>M</fnm></au></aug><source>Nat Methods</source><pubdate>2009</pubdate><volume>6</volume><fpage>S13&#8212;S20</fpage><xrefbib><pubid idtype="pmpid" link="fulltext">19844226</pubid></xrefbib></bibl><bibl id="B5"><title><p>BreakDancer: an algorithm for high-resolution mapping of genomic structural variation</p></title><aug><au><snm>Chen</snm><fnm>K</fnm></au><au><snm>Wallis</snm><fnm>JW</fnm></au><au><snm>McLellan</snm><fnm>MD</fnm></au><au><snm>Larson</snm><fnm>DE</fnm></au><au><snm>Kalicki</snm><fnm>JM</fnm></au><au><snm>Pohl</snm><fnm>CS</fnm></au><au><snm>McGrath</snm><fnm>SD</fnm></au><au><snm>Wendl</snm><fnm>MC</fnm></au><au><snm>Zhang</snm><fnm>Q</fnm></au><au><snm>Locke</snm><fnm>DP</fnm></au><au><snm>Shi</snm><fnm>X</fnm></au><au><snm>Fulton</snm><fnm>RS</fnm></au><au><snm>Ley</snm><fnm>TJ</fnm></au><au><snm>Wilson</snm><fnm>RK</fnm></au><au><snm>Ding</snm><fnm>L</fnm></au><au><snm>Mardis</snm><fnm>ER</fnm></au></aug><source>Nat Methods</source><pubdate>2009</pubdate><volume>6</volume><fpage>677</fpage><lpage>681</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nmeth.1363</pubid><pubid idtype="pmpid" link="fulltext">19668202</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes</p></title><aug><au><snm>Hormozdiari</snm><fnm>F</fnm></au><au><snm>Alkan</snm><fnm>C</fnm></au><au><snm>Eichler</snm><fnm>EE</fnm></au><au><snm>Sahinalp</snm><fnm>SC</fnm></au></aug><source>Genome 
Res</source><pubdate>2009</pubdate><volume>19</volume><fpage>1270</fpage><lpage>1278</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.088633.108</pubid><pubid idtype="pmcid">2704429</pubid><pubid idtype="pmpid" link="fulltext">19447966</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome</p></title><aug><au><snm>Quinlan</snm><fnm>AR</fnm></au><au><snm>Clark</snm><fnm>RA</fnm></au><au><snm>Sokolova</snm><fnm>S</fnm></au><au><snm>Leibowitz</snm><fnm>ML</fnm></au><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Hurles</snm><fnm>ME</fnm></au><au><snm>Mell</snm><fnm>JC</fnm></au><au><snm>Hall</snm><fnm>IM</fnm></au></aug><source>Genome Res</source><pubdate>2010</pubdate><volume>20</volume><fpage>623</fpage><lpage>635</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.102970.109</pubid><pubid idtype="pmcid">2860164</pubid><pubid idtype="pmpid" link="fulltext">20308636</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>MoDIL: detecting small indels from clone-end sequencing with mixtures of distributions</p></title><aug><au><snm>Lee</snm><fnm>S</fnm></au><au><snm>Hormozdiari</snm><fnm>F</fnm></au><au><snm>Alkan</snm><fnm>C</fnm></au><au><snm>Brudno</snm><fnm>M</fnm></au></aug><source>Nat Methods</source><pubdate>2009</pubdate><volume>6</volume><fpage>473</fpage><lpage>474</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nmeth.f.256</pubid><pubid idtype="pmpid" link="fulltext">19483690</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>CLEVER: Clique-Enumerating Variant Finder</p></title><aug><au><snm>Marschall</snm><fnm>T</fnm></au><au><snm>Costa</snm><fnm>I</fnm></au><au><snm>Canzar</snm><fnm>S</fnm></au><au><snm>Bauer</snm><fnm>M</fnm></au><au><snm>Klau</snm><fnm>G</fnm></au><au><snm>Schliep</snm><fnm>A</fnm></au><au><cnm>Sch&#246;nhuth 
A</cnm></au></aug><source>Bioinformatics</source><pubdate>2012</pubdate><volume>28</volume><fpage>2875</fpage><lpage>2882</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bts566</pubid><pubid idtype="pmpid" link="fulltext">23060616</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing</p></title><aug><au><snm>Campbell</snm><fnm>PJ</fnm></au><au><snm>Stephens</snm><fnm>PJ</fnm></au><au><snm>Pleasance</snm><fnm>ED</fnm></au><au><snm>O&#8217;Meara</snm><fnm>S</fnm></au><au><snm>Li</snm><fnm>H</fnm></au><au><snm>Santarius</snm><fnm>T</fnm></au><au><snm>Stebbings</snm><fnm>LA</fnm></au><au><snm>Leroy</snm><fnm>C</fnm></au><au><snm>Edkins</snm><fnm>S</fnm></au><au><snm>Hardy</snm><fnm>C</fnm></au><au><snm>Teague</snm><fnm>JW</fnm></au><au><snm>Menzies</snm><fnm>A</fnm></au><au><snm>Goodhead</snm><fnm>I</fnm></au><au><snm>Turner</snm><fnm>DJ</fnm></au><au><snm>Clee</snm><fnm>CM</fnm></au><au><snm>Quail</snm><fnm>MA</fnm></au><au><snm>Cox</snm><fnm>A</fnm></au><au><snm>Brown</snm><fnm>C</fnm></au><au><snm>Durbin</snm><fnm>R</fnm></au><au><snm>Hurles</snm><fnm>ME</fnm></au><au><snm>Edwards</snm><fnm>PAW</fnm></au><au><snm>Bignell</snm><fnm>GR</fnm></au><au><snm>Stratton</snm><fnm>MR</fnm></au><au><snm>Futreal</snm><fnm>PA</fnm></au></aug><source>Nat Genet</source><pubdate>2008</pubdate><volume>40</volume><fpage>722</fpage><lpage>729</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.128</pubid><pubid idtype="pmcid">2705838</pubid><pubid idtype="pmpid" link="fulltext">18438408</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing</p></title><aug><au><snm>Abyzov</snm><fnm>A</fnm></au><au><snm>Urban</snm><fnm>AE</fnm></au><au><snm>Snyder</snm><fnm>M</fnm></au><au><snm>Gerstein</snm><fnm>M</fnm></au></aug><source>Genome 
Res</source><pubdate>2011</pubdate><volume>21</volume><fpage>974</fpage><lpage>984</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.114876.110</pubid><pubid idtype="pmcid">3106330</pubid><pubid idtype="pmpid" link="fulltext">21324876</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads</p></title><aug><au><snm>Ye</snm><fnm>K</fnm></au><au><snm>Schulz</snm><fnm>MH</fnm></au><au><snm>Long</snm><fnm>Q</fnm></au><au><snm>Apweiler</snm><fnm>R</fnm></au><au><snm>Ning</snm><fnm>Z</fnm></au></aug><source>Bioinformatics</source><pubdate>2009</pubdate><volume>25</volume><issue>21</issue><fpage>2865</fpage><lpage>2871</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btp394</pubid><pubid idtype="pmcid">2781750</pubid><pubid idtype="pmpid" link="fulltext">19561018</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>De novo assembly of human genomes with massively parallel short read sequencing</p></title><aug><au><snm>Li</snm><fnm>R</fnm></au><au><snm>Zhu</snm><fnm>H</fnm></au><au><snm>Ruan</snm><fnm>J</fnm></au><au><snm>Qian</snm><fnm>W</fnm></au><au><snm>Fang</snm><fnm>X</fnm></au><au><snm>Shi</snm><fnm>Z</fnm></au><au><snm>Li</snm><fnm>Y</fnm></au><au><snm>Li</snm><fnm>S</fnm></au><au><snm>Shan</snm><fnm>G</fnm></au><au><snm>Kristiansen</snm><fnm>K</fnm></au><au><snm>Li</snm><fnm>S</fnm></au><au><snm>Yang</snm><fnm>H</fnm></au><au><snm>Wang</snm><fnm>J</fnm></au><au><snm>Wang</snm><fnm>J</fnm></au></aug><source>Genome Res</source><pubdate>2010</pubdate><volume>20</volume><fpage>265</fpage><lpage>272</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.097261.109</pubid><pubid idtype="pmcid">2813482</pubid><pubid idtype="pmpid" link="fulltext">20019144</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Target-enrichment strategies for next-generation 
sequencing</p></title><aug><au><snm>Mamanova</snm><fnm>L</fnm></au><au><snm>Coffey</snm><fnm>AJ</fnm></au><au><snm>Scott</snm><fnm>CE</fnm></au><au><snm>Kozarewa</snm><fnm>I</fnm></au><au><snm>Turner</snm><fnm>EH</fnm></au><au><snm>Kumar</snm><fnm>A</fnm></au><au><snm>Howard</snm><fnm>E</fnm></au><au><snm>Shendure</snm><fnm>J</fnm></au><au><snm>Turner</snm><fnm>DJ</fnm></au></aug><source>Nat Methods</source><pubdate>2010</pubdate><volume>7</volume><fpage>111</fpage><lpage>118</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nmeth.1419</pubid><pubid idtype="pmpid" link="fulltext">20111037</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Discovery and genotyping of genome structural polymorphism by sequencing on a population scale</p></title><aug><au><snm>Handsaker</snm><fnm>R</fnm></au><au><snm>Korn</snm><fnm>J</fnm></au><au><snm>Nemesh</snm><fnm>J</fnm></au><au><snm>McCarroll</snm><fnm>S</fnm></au></aug><source>Nat Genet</source><pubdate>2011</pubdate><volume>43</volume><issue>3</issue><fpage>269</fpage><lpage>276</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.768</pubid><pubid idtype="pmpid" link="fulltext">21317889</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>SVseq: an approach for detecting exact breakpoints of deletions with low-coverage sequence data</p></title><aug><au><snm>Zhang</snm><fnm>J</fnm></au><au><snm>Wu</snm><fnm>Y</fnm></au></aug><source>Bioinformatics</source><pubdate>2011</pubdate><volume>27</volume><issue>23</issue><fpage>3228</fpage><lpage>3234</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btr563</pubid><pubid idtype="pmpid" link="fulltext">21994222</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Inbred mice - The Jackson Laboratory</p></title><note>[<url>http://jaxmice.jax.org/type/inbred/index.html</url>]</note></bibl><bibl id="B18"><title><p>Genomic mapping by fingerprinting random clones: a mathematical 
analysis</p></title><aug><au><snm>Lander</snm><fnm>ES</fnm></au><au><snm>Waterman</snm><fnm>MS</fnm></au></aug><source>Genomics</source><pubdate>1988</pubdate><volume>2</volume><issue>3</issue><fpage>231</fpage><lpage>239</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/0888-7543(88)90007-9</pubid><pubid idtype="pmpid" link="fulltext">3294162</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><aug><au><cnm>ChopSticks</cnm></au></aug><note>[<url>https://github.com/toyasuda/ChopSticks</url>]</note></bibl><bibl id="B20"><title><p>Fast and accurate short read alignment with Burrows-Wheeler transform</p></title><aug><au><snm>Li</snm><fnm>H</fnm></au><au><snm>Durbin</snm><fnm>R</fnm></au></aug><source>Bioinformatics</source><pubdate>2009</pubdate><volume>25</volume><fpage>1754</fpage><lpage>1760</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btp324</pubid><pubid idtype="pmcid">2705234</pubid><pubid idtype="pmpid" link="fulltext">19451168</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Initial sequencing and comparative analysis of the mouse genome</p></title><aug><au><cnm>Mouse genome sequencing consortium</cnm></au></aug><source>Nature</source><pubdate>2002</pubdate><volume>420</volume><fpage>520</fpage><lpage>562</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature01262</pubid><pubid idtype="pmpid" link="fulltext">12466850</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Database resources of the National Center for Biotechnology 
Information</p></title><aug><au><snm>Sayers</snm><fnm>EW</fnm></au><au><snm>Barrett</snm><fnm>T</fnm></au><au><snm>Benson</snm><fnm>DA</fnm></au><au><snm>Bolton</snm><fnm>E</fnm></au><au><snm>Bryant</snm><fnm>SH</fnm></au><au><snm>Canese</snm><fnm>K</fnm></au><au><snm>Chetvernin</snm><fnm>V</fnm></au><au><snm>Church</snm><fnm>DM</fnm></au><au><snm>DiCuccio</snm><fnm>M</fnm></au><au><snm>Federhen</snm><fnm>S</fnm></au><au><snm>Feolo</snm><fnm>M</fnm></au><au><snm>Fingerman</snm><fnm>IM</fnm></au><au><snm>Geer</snm><fnm>LY</fnm></au><au><snm>Helmberg</snm><fnm>W</fnm></au><au><snm>Kapustin</snm><fnm>Y</fnm></au><au><snm>Landsman</snm><fnm>D</fnm></au><au><snm>Lipman</snm><fnm>DJ</fnm></au><au><snm>Lu</snm><fnm>Z</fnm></au><au><snm>Madden</snm><fnm>TL</fnm></au><au><snm>Madej</snm><fnm>T</fnm></au><au><snm>Maglott</snm><fnm>DR</fnm></au><au><snm>Marchler-Bauer</snm><fnm>A</fnm></au><au><snm>Miller</snm><fnm>V</fnm></au><au><snm>Mizrachi</snm><fnm>I</fnm></au><au><snm>Ostell</snm><fnm>J</fnm></au><au><snm>Panchenko</snm><fnm>A</fnm></au><au><snm>Phan</snm><fnm>L</fnm></au><au><snm>Pruitt</snm><fnm>KD</fnm></au><au><snm>Schuler</snm><fnm>GD</fnm></au><au><cnm>Sequeira E</cnm></au><etal/></aug><source>Nuc Acids Res</source><pubdate>2011</pubdate><volume>39</volume><issue>suppl 1</issue><fpage>D38&#8212;D51</fpage></bibl><bibl id="B23"><title><p>Evaluation of paired-end sequencing strategies for detection of genome rearrangements in cancer</p></title><aug><au><snm>Bashir</snm><fnm>A</fnm></au><au><snm>Volik</snm><fnm>S</fnm></au><au><snm>Collins</snm><fnm>C</fnm></au><au><snm>Bafna</snm><fnm>V</fnm></au><au><snm>Raphael</snm><fnm>BJ</fnm></au></aug><source>PLoS Comput Biol</source><pubdate>2008</pubdate><volume>4</volume><fpage>e1000051</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pcbi.1000051</pubid><pubid idtype="pmcid">2278375</pubid><pubid idtype="pmpid" link="fulltext">18404202</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>A greedy 
algorithm for aligning DNA sequences</p></title><aug><au><snm>Zhang</snm><fnm>Z</fnm></au><au><snm>Schwartz</snm><fnm>S</fnm></au><au><snm>Wagner</snm><fnm>L</fnm></au><au><snm>Miller</snm><fnm>W</fnm></au></aug><source>J Comp Biol</source><pubdate>2000</pubdate><volume>7</volume><fpage>203</fpage><lpage>214</lpage><xrefbib><pubid idtype="doi">10.1089/10665270050081478</pubid></xrefbib></bibl><bibl id="B25"><title><p>BEDTools: a flexible suite of utilities for comparing genomic features</p></title><aug><au><snm>Quinlan</snm><fnm>AR</fnm></au><au><snm>Hall</snm><fnm>IM</fnm></au></aug><source>Bioinformatics</source><pubdate>2010</pubdate><volume>26</volume><issue>6</issue><fpage>841</fpage><lpage>842</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btq033</pubid><pubid idtype="pmcid">2832824</pubid><pubid idtype="pmpid" link="fulltext">20110278</pubid></pubidlist></xrefbib></bibl></refgrp>
	</bm>
</art>