<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-S10-P4</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Poster presentation</dochead>
      <bibl>
         <title>
            <p>Robust consensus computation</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Rausch</snm>
               <fnm>Tobias</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>rausch@mi.fu-berlin.de</email>
            </au>
            <au id="A2">
               <snm>Emde</snm>
               <fnm>Anne-Katrin</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
            </au>
            <au id="A3">
               <snm>Reinert</snm>
               <fnm>Knut</fnm>
               <insr iid="I2"/>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>International Max Planck Research School for Computational Biology and Scientific Computing, Ihnestr. 63-73, 14195 Berlin, Germany</p>
            </ins>
            <ins id="I2">
               <p>Algorithmische Bioinformatik, Institut f&#252;r Informatik, Takustr. 9, 14195 Berlin, Germany</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <supplement>
            <title>
               <p>Highlights from the Fourth International Society for Computational Biology (ISCB) Student Council Symposium</p>
            </title>
            <editor>Lucia Peixoto, Nils Gehlenborg and Sarath Chandra Janga</editor>
            <note>Meeting abstracts &#8211; A single PDF containing all abstracts in this Supplement is available <a href="http://www.biomedcentral.com/content/pdf/1471-2105-9-S10-full.pdf">here</a>.</note>
            <url>http://www.biomedcentral.com/content/pdf/1471-2105-9-S10-info.pdf</url>
         </supplement>
         <conference>
            <title>
               <p>Fourth International Society for Computational Biology (ISCB) Student Council Symposium</p>
            </title>
            <location>Toronto, Canada</location>
            <date-range>18 July 2008</date-range>
            <url>http://www.iscbsc.org</url>
         </conference>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>Suppl 10</issue>
         <fpage>P4</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/S10/P4</url>
         <xrefbib>
            <pubid idtype="doi">10.1186/1471-2105-9-S10-P4</pubid>
         </xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>30</day>
               <month>10</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Rausch et al; licensee BioMed Central Ltd</collab>
      </cpyrt>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Introduction</p>
         </st>
         <p>High-throughput sequencing technologies that produce short reads pose a new challenge to the established three-phase assembly methodology of overlap, layout, and consensus. We describe a new consensus method that is robust to high coverage, shorter reads, and genomic variation.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>Given an initial layout of the reads, we generate a consensus sequence and a multi-read alignment using the following protocol: (1) Computation of all pairwise overlap alignments required by the layout. (2) Extraction of all gapless alignment segments and generation of a segment-based weighted overlap graph (see Fig. <figr fid="F1">1</figr>). Conflicts between segment matches are resolved using a novel multiple segment match refinement algorithm <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. (3) Adjustment of the edge weights using a variant of the triplet extension pioneered in the T-Coffee package <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. The triplet extension increases the weights of clique edges within the overlap graph, so these edges are more likely to be chosen in the subsequent progressive alignment stage. (4) Progressive graph-based alignment of the reads using the heaviest common subsequence algorithm and a guide tree computed from the pairwise alignment scores. Note that our algorithm aligns the segments identified in the overlap alignment phase rather than single nucleotides. This ensures that columns with genetic variation (e.g., SNPs) are preserved. (5) Output of the multi-read alignment, the gapped consensus, and all positioned reads with their respective deltas. The output can be visualized in Hawkeye <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> (see Fig. <figr fid="F2">2</figr>).</p>
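         <p>The edge re-weighting in step (3) can be illustrated with a minimal Python sketch of the classic T-Coffee triplet extension. It is not the variant used in our tool: the function name, the dictionary-based graph representation, and the example vertices and weights are all hypothetical, and the sketch only re-weights existing edges. Each edge (u, v) gains the minimum of w(u, c) and w(c, v) for every vertex c adjacent to both endpoints, so edges inside a clique end up heavier than an isolated edge of the same initial weight.</p>
         <p><![CDATA[
```python
from collections import defaultdict


def triplet_extension(edges):
    """Re-weight edges of an undirected weighted graph, T-Coffee style.

    edges maps frozenset({u, v}) -> weight. Every existing edge (u, v)
    gains min(w(u, c), w(c, v)) for each vertex c adjacent to both.
    """
    # Build an adjacency map so shared neighbours can be looked up quickly.
    neighbours = defaultdict(dict)
    for pair, weight in edges.items():
        u, v = tuple(pair)
        neighbours[u][v] = weight
        neighbours[v][u] = weight

    extended = {}
    for pair, weight in edges.items():
        u, v = tuple(pair)
        shared = neighbours[u].keys() & neighbours[v].keys()
        support = sum(min(neighbours[u][c], neighbours[c][v]) for c in shared)
        extended[pair] = weight + support
    return extended


# Hypothetical segment vertices: a1, b1, c1 form a clique; a2-b2 is isolated.
edges = {
    frozenset({"a1", "b1"}): 10,
    frozenset({"b1", "c1"}): 8,
    frozenset({"a1", "c1"}): 6,
    frozenset({"a2", "b2"}): 10,
}
extended = triplet_extension(edges)
# The clique edge a1-b1 is now heavier than the isolated edge a2-b2,
# although both started with weight 10.
```
]]></p>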
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>A segment-based alignment graph of three reads</p>
            </caption>
            <text>
               <p>A segment-based alignment graph of three reads. The green-colored SNP is embedded in a segment and a clique is highlighted in bold.</p>
            </text>
            <graphic file="1471-2105-9-S10-P4-1"/>
         </fig>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Section of a multi-read alignment with an indel and two SNP columns, "A/C" and "T/C"</p>
            </caption>
            <text>
               <p>Section of a multi-read alignment with an indel and two SNP columns, "A/C" and "T/C". Read names and read orientation are shown on the left; the consensus is shown in the top row.</p>
            </text>
            <graphic file="1471-2105-9-S10-P4-2"/>
         </fig>
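         <p>To make the relation between the multi-read alignment and the gapped consensus in step (5) concrete, the following Python sketch derives a consensus by a simple per-column majority vote. This is an assumption made for illustration and is not necessarily how our tool calls the consensus. The function name, the gap character "-", the padding character "." for uncovered columns, and the three example rows are hypothetical.</p>
         <p><![CDATA[
```python
from collections import Counter


def column_consensus(rows, gap="-", pad="."):
    """Majority-vote consensus of equally long alignment rows.

    gap marks an alignment gap inside a read; pad marks columns the read
    does not cover. Returns (gapped consensus, coverage per column).
    """
    consensus, coverage = [], []
    for column in zip(*rows):
        # Gaps count as votes because the read spans the column.
        votes = Counter(ch for ch in column if ch != pad)
        coverage.append(sum(votes.values()))
        # Ties go to the character seen first in the column.
        consensus.append(votes.most_common(1)[0][0] if votes else pad)
    return "".join(consensus), coverage


# Hypothetical alignment: column 3 holds a base-call error or SNP (T/A),
# column 7 holds an insertion that is supported by one read only.
rows = [
    "ACGTTAC-GT..",
    "..GTTACAGTCA",
    "ACGATAC-GTCA",
]
gapped, coverage = column_consensus(rows)
gap_free = gapped.replace("-", "")
```
]]></p>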
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>We used a read simulator and real data from the NCBI trace archive to evaluate our consensus tool. The main parameters of the read simulator are the source sequence length, the average read length, the number of reads, and the error rate per base call. In addition, multiple haplotypes can be simulated. Two further parameters, the number of SNPs and the number of indels, specify the genetic variation randomly introduced into these haplotypes. We performed two experiments: (1) Given a source sequence of length 10000, we simulated reads under different settings. The read length varied from 35 to 200, the coverage from 20&#215; to 50&#215;, and the error rate per base call from 2% to 4%. In all cases, the computed gap-free consensus matched the simulated source sequence at every position with coverage &gt; 2. (2) Given two haplotypes, each of length 10000 with 100 SNPs and 5 indels, we simulated reads of length 200 at 20&#215; coverage and a 4% error rate. We then manually inspected the multi-read alignment with Hawkeye to evaluate the consensus in the presence of genetic variation (see Fig. <figr fid="F2">2</figr>).</p>
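         <p>The simulation parameters are linked by the relation coverage = number of reads &#215; read length / source length. The Python sketch below shows this arithmetic together with a toy read sampler. It is not our read simulator: the function names are hypothetical, reads have a fixed rather than an average length, errors are substitutions only, and haplotypes, SNPs and indels are omitted.</p>
         <p><![CDATA[
```python
import random


def reads_for_coverage(source_length, read_length, coverage):
    """Smallest number of reads for which
    reads * read_length / source_length reaches the requested coverage."""
    return -(-coverage * source_length // read_length)  # ceiling division


def simulate_reads(source, read_length, n_reads, error_rate, rng):
    """Sample fixed-length reads uniformly and substitute bases at error_rate.

    Returns a list of (start position, read sequence) pairs.
    """
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(source) - read_length + 1)
        bases = list(source[start:start + read_length])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Replace the base with one of the three other nucleotides.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads


rng = random.Random(42)  # fixed seed so the run is reproducible
source = "".join(rng.choice("ACGT") for _ in range(10000))
# 20x coverage of a 10000 bp source with 35 bp reads needs 5715 reads.
n_reads = reads_for_coverage(len(source), read_length=35, coverage=20)
reads = simulate_reads(source, 35, n_reads, error_rate=0.02, rng=rng)
```
]]></p>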
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The results on simulated data are encouraging, and preliminary results on real data show that our consensus quality is comparable to that of other tools. It remains to be shown that our program outperforms other tools in difficult settings, namely high coverage and short, error-prone read data. The consensus tool is part of the SeqAn library <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>&#160;(<url>http://www.seqan.de</url>), and the read simulator is available on request: <email>rausch@inf.fu-berlin.de</email>.</p>
      </sec>
   </bdy>
   <bm>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Segment-based multiple sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Rausch</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Emde</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Weese</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>D&#246;ring</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Notredame</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Reinert</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <issue>16</issue>
            <fpage>i187</fpage>
            <lpage>192</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btn281</pubid>
                  <pubid idtype="pmpid" link="fulltext">18689823</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>T-Coffee: A Novel Method for Fast and Accurate Multiple Sequence Alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Notredame</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Heringa</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Journal of Molecular Biology</source>
            <pubdate>2000</pubdate>
            <volume>302</volume>
            <fpage>205</fpage>
            <lpage>217</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4042</pubid>
                  <pubid idtype="pmpid" link="fulltext">10964570</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Hawkeye: an interactive visual analytics tool for genome assemblies</p>
            </title>
            <aug>
               <au>
                  <snm>Schatz</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Phillippy</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Shneiderman</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <issue>3</issue>
            <fpage>R34</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1868940</pubid>
                  <pubid idtype="pmpid" link="fulltext">17349036</pubid>
                  <pubid idtype="doi">10.1186/gb-2007-8-3-r34</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>SeqAn &#8211; An efficient, generic C++ library for sequence analysis</p>
            </title>
            <aug>
               <au>
                  <snm>D&#246;ring</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Weese</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Rausch</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Reinert</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>11</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2246154</pubid>
                  <pubid idtype="pmpid" link="fulltext">18184432</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-9-11</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
