<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-159</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>A computational survey of candidate exonic splicing enhancer motifs in the model plant <it>Arabidopsis thaliana</it></p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Pertea</snm>
               <fnm>Mihaela</fnm>
               <insr iid="I1"/>
               <email>mpertea@umiacs.umd.edu</email>
            </au>
            <au id="A2">
               <snm>Mount</snm>
               <mi>M</mi>
               <fnm>Stephen</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>smount@umd.edu</email>
            </au>
            <au id="A3">
               <snm>Salzberg</snm>
               <mi>L</mi>
               <fnm>Steven</fnm>
               <insr iid="I1"/>
               <email>salzberg@umiacs.umd.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA</p>
            </ins>
            <ins id="I2">
               <p>Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>159</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/159</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17517127</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-159</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>09</day>
               <month>10</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>21</day>
               <month>5</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>21</day>
               <month>5</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Pertea et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
                <p>Algorithmic approaches to splice site prediction have relied mainly on the consensus patterns found at the boundaries between protein coding and non-coding regions. However, exonic splicing enhancers have been shown to enhance the utilization of nearby splice sites.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
                <p>We have developed a new computational technique to identify significantly conserved motifs involved in splice site regulation. First, we identified 84 putative exonic splicing enhancer hexamers in <it>Arabidopsis thaliana</it>. We then used a Gibbs sampling program called ELPH to locate conserved motifs represented by these hexamers in exonic regions near splice sites in confirmed genes. Oligomers containing 35 of these motifs have been shown experimentally to induce significant inclusion of <it>A. thaliana </it>exons. Second, integrating our regulatory motifs into two different splice site recognition programs significantly improved their ability to correctly predict splice sites in a large database of confirmed genes. We have released GeneSplicerESE, the improved splice site recognition code, as open source software.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our results show that the use of the ESE motifs consistently improves splice site prediction accuracy.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
          <p>Alternative splicing is an important regulatory mechanism in many species, allowing them to generate multiple variant proteins from the same primary transcript. To predict the complete protein complement of any eukaryote, we need to detect alternative splice sites and assemble them in the correct combinations. Algorithmic approaches to splice site prediction have relied mainly on the consensus patterns found at the boundaries between protein coding and non-coding regions <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. However, the sequence conservation found at splice site junctions is not strong enough to accurately distinguish introns from exons <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Additional sequences, located at variable distances from splice sites, have been shown to function as <it>cis</it>-acting factor binding sites that regulate splicing either <it>in vivo </it>or <it>in vitro</it>. Although such splicing regulators have been identified in both exons and introns, exonic splicing regulators (ESRs) are generally better characterized, and are probably more common <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. ESRs can either enhance or suppress the utilization of both 5' and 3' splice sites. Much attention has been given to exonic splicing enhancers (ESEs), which promote the inclusion (as opposed to skipping) of the exons in which they reside. The first ESEs to be characterized were short, purine-rich motifs containing repeated GAR (GAA or GAG) trinucleotides, but many other sequences have since been shown to have enhancer activity <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>.</p>
          <p>In animals, many exonic splicing enhancers are bound and activated by one or more of several related splicing factors known as SR proteins. The relationship between sequence-specific binding by SR proteins and the activation of splicing by exonic splicing enhancers is complex and incompletely understood. Although only a dozen or so splicing events have been shown to be enhancer-dependent, the existence of exonic splicing enhancers within constitutively spliced exons <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, the frequency of ESE motifs <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> and the absolute requirement of in vitro splicing systems for SR proteins suggest that ESEs are ubiquitous and required for all splicing events. It is estimated that as many as 15&#8211;20% of random 20-mers contain a splicing enhancer <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, and computational methods have predicted hundreds of ESE motifs <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. Thus, it appears likely that many sequences can affect splicing. What is clear is that the motifs recognized by SR proteins are short (8 or fewer nucleotides) and degenerate <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>.</p>
          <p>Several computational approaches have been undertaken to find the motifs characteristic of these splicing regulatory elements. In a recent study, Goren and colleagues <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> introduced a computational method that identifies ESRs based on the conservation of wobble positions between orthologous human and mouse exons. Their method identified 285 putative ESRs; ten of these elements were tested experimentally and shown to induce different levels of regulatory effects on alternative splicing. RESCUE-ESE, another computational approach, identifies potential ESEs on the premise that exons with weak splice sites are more likely to require ESE activity for splicing <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. The original study identified 283 exonic hexamers that were significantly enriched both in human exons relative to introns and in exons with weak splice sites relative to exons with strong splice sites; <it>in vivo </it>tests of these hexamers confirmed ESE activity. In another study, Zhang and Chasin <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> predicted human ESR motifs by comparing the frequency of 8-mers in internal noncoding exons with their frequency in unspliced pseudo exons and in the 5' UTRs of transcripts of intronless genes.</p>
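          <p>The two-way comparison used by RESCUE-ESE can be illustrated with the following Python sketch. It is not the published RESCUE-ESE implementation: for simplicity it assumes a pseudocount-smoothed log2 frequency ratio as the enrichment measure and a hypothetical cutoff min_log2, and all function names are illustrative. A hexamer is kept only if it is enriched in exons relative to introns and also in weak-site exons relative to strong-site exons.</p>

```python
import math
from collections import Counter


def hexamer_counts(seqs: list[str], k: int = 6) -> Counter:
    """Count every overlapping k-mer in a collection of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def log2_enrichment(fg: Counter, bg: Counter, kmer: str, pseudo: float = 1.0) -> float:
    """log2 ratio of a k-mer's relative frequency in fg versus bg (pseudocount-smoothed)."""
    n_kmers = 4 ** len(kmer)  # number of possible k-mers, each receiving a pseudocount
    f = (fg[kmer] + pseudo) / (sum(fg.values()) + pseudo * n_kmers)
    b = (bg[kmer] + pseudo) / (sum(bg.values()) + pseudo * n_kmers)
    return math.log2(f / b)


def two_way_candidates(exons, introns, weak_exons, strong_exons, min_log2=1.0):
    """Hexamers enriched in exons vs introns AND in weak-site vs strong-site exons."""
    ex, intr = hexamer_counts(exons), hexamer_counts(introns)
    weak, strong = hexamer_counts(weak_exons), hexamer_counts(strong_exons)
    hits = []
    for kmer in ex:
        if (log2_enrichment(ex, intr, kmer) >= min_log2
                and log2_enrichment(weak, strong, kmer) >= min_log2):
            hits.append(kmer)
    return sorted(hits)
```

          <p>With real data the cutoff would need calibration, for example against shuffled control sequences; the value used here is only a placeholder.</p>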
          <p>Previous computational work on detecting ESEs has focused almost exclusively on mammalian species. There are compelling reasons to believe that ESEs play an important role in plants as well. Early research on plant pre-mRNA splicing emphasized the role of AU-rich or U-rich sequences within introns <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. These U-rich elements play important roles in intron definition, and plants lack the very large introns associated with the need for exon definition <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. On the other hand, a number of reports describe a role for exon sequences in the selection of plant splice sites <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. SR proteins, the mediators of ESE activity in vertebrates, are highly conserved in plants <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. This conservation includes reactivity with the monoclonal antibody mAb104 <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> and extends to function: a mixture of Arabidopsis SR proteins <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, and atRSZp22 in particular <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, can complement SR-deficient mammalian splicing extracts. Furthermore, plant SR proteins can influence splice site choice in mammalian nuclear extracts <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and can regulate alternative splicing <it>in planta </it><abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>.</p>
          <p>The focus of this study is a new computational approach to identifying ESE motifs in the model plant <it>Arabidopsis thaliana</it>, and the use of these motifs to improve splice site prediction accuracy. First, we apply an approach similar to RESCUE-ESE to identify putative ESE hexamers in the flanking ends of a large set of known internal exons from <it>Arabidopsis</it>. Then we use a Gibbs sampling program called ELPH to identify statistically conserved motifs representing these hexamers in our data. Finally, we show how these motifs can be used to improve splice site prediction: incorporating the hexamer motifs into two leading splice site prediction programs, GeneSplicer <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and SpliceMachine <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, yields a significant improvement in specificity.</p>
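          <p>ELPH is a Gibbs sampler. As a rough illustration of the general technique (not ELPH's code or options), the following Python sketch implements a basic site-sampling Gibbs motif finder. It assumes exactly one motif occurrence per sequence, ACGT-only input, a uniform background and at least two sequences; the function names are illustrative. At each step one sequence is left out, a position weight matrix is built from the current sites in the other sequences, and a new site is drawn in the left-out sequence with probability proportional to its likelihood ratio against the background.</p>

```python
import random

BASES = "ACGT"


def build_pwm(sites: list[str], pseudo: float = 0.5) -> list[dict[str, float]]:
    """Column-wise base probabilities from aligned sites, with pseudocounts."""
    width = len(sites[0])
    pwm = []
    for j in range(width):
        col = [s[j] for s in sites]
        total = len(col) + 4 * pseudo
        pwm.append({b: (col.count(b) + pseudo) / total for b in BASES})
    return pwm


def gibbs_motif(seqs: list[str], width: int, iters: int = 500, seed: int = 0):
    """Site-sampling Gibbs sampler assuming one motif occurrence per sequence."""
    rng = random.Random(seed)
    seqs = [s.upper() for s in seqs]
    bg = 0.25  # uniform background: an assumption, real tools estimate it from data
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))
        # PWM from every sequence except the one being resampled
        others = [s[p:p + width] for k, (s, p) in enumerate(zip(seqs, pos)) if k != i]
        pwm = build_pwm(others)
        s = seqs[i]
        weights = []
        for start in range(len(s) - width + 1):
            ratio = 1.0
            for j, base in enumerate(s[start:start + width]):
                ratio *= pwm[j][base] / bg
            weights.append(ratio)
        # draw a new start position proportional to the likelihood ratio
        pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    sites = [s[p:p + width] for s, p in zip(seqs, pos)]
    return pos, build_pwm(sites)
```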
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Data sets</p>
            </st>
             <p>Our ESE analyses were performed on several high-confidence Arabidopsis data sets. The first set, ESEAra, was extracted from a set of very high-quality gene models obtained from 5000 full-length transcripts sequenced and released in 2001 <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> (these sequences are available at <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> and from GenBank under accession numbers <ext-link ext-link-type="gen" ext-link-id="AY084215">AY084215</ext-link>&#8211;<ext-link ext-link-type="gen" ext-link-id="AY089214">AY089214</ext-link>). Because internal homology in the data set could influence the results, we refined this reference set of gene models by using BLAST <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> to perform pairwise alignments between all genes. Sequences that aligned over more than 80% of their length with a BLAST E-value of less than 10<sup>-10 </sup>were removed. The resulting ESEAra set includes 4046 genes containing 17410 coding exons with an average length of 194 base pairs (bp). ESE motifs were identified on this data set.</p>
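             <p>The redundancy filter can be sketched as follows. The exact removal rule is not specified here, so this Python sketch assumes a greedy policy (genes are visited in input order, and a gene is dropped if it has a qualifying hit to a gene already kept), measures coverage on the query length, and uses a hypothetical Hit record standing in for parsed BLAST output.</p>

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical summary of one BLAST alignment."""
    query: str
    subject: str
    aligned_len: int  # length of the aligned region on the query
    evalue: float


def remove_redundant(gene_lengths: dict[str, int], hits: list[Hit],
                     min_cov: float = 0.80, max_evalue: float = 1e-10) -> list[str]:
    """Greedy filter: keep a gene unless it strongly aligns to a gene already kept."""
    similar: dict[str, set[str]] = {g: set() for g in gene_lengths}
    for h in hits:
        if h.query == h.subject:
            continue  # ignore self-hits
        cov = h.aligned_len / gene_lengths[h.query]
        if cov > min_cov and max_evalue > h.evalue:
            # record the relationship in both directions
            similar[h.query].add(h.subject)
            similar[h.subject].add(h.query)
    kept: list[str] = []
    for g in gene_lengths:  # input order decides which copy survives (assumption)
        if not similar[g].intersection(kept):
            kept.append(g)
    return kept
```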
            <p>A second data set was used to evaluate the accuracy of SpliceMachine after introducing the ESE motifs found in ESEAra. This data set consists of the 1323 <it>A. thaliana </it>genes used previously in the evaluation of both GeneSplicer <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and SpliceMachine <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>. We will refer to this data set as GSAra.</p>
             <p>To test the accuracy of our splice site predictor outside genes, we collected an additional data set consisting only of intergenic regions situated between annotated <it>A. thaliana </it>genes. We used the highly curated, re-annotated <it>Arabidopsis </it>chromosome II sequence (available from <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>) and extracted regions located more than 500 nucleotides from any annotated gene. We call this data set INTAra.</p>
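             <p>A minimal Python sketch of how such regions could be extracted, assuming 0-based, half-open gene coordinates on a single chromosome (the function name and interval convention are illustrative): each gene is padded by 500 nt on both sides, overlapping padded genes are merged, and the gaps between them are reported.</p>

```python
def intergenic_regions(chrom_len: int, genes: list[tuple[int, int]], buffer: int = 500):
    """Return half-open [start, end) intervals lying more than `buffer` nt from every gene.

    Genes are half-open [start, end) intervals in 0-based coordinates (an assumption).
    """
    padded = sorted((max(0, s - buffer), min(chrom_len, e + buffer)) for s, e in genes)
    merged: list[list[int]] = []
    for s, e in padded:
        if merged and merged[-1][1] >= s:
            merged[-1][1] = max(merged[-1][1], e)  # overlapping padded genes
        else:
            merged.append([s, e])
    regions = []
    cursor = 0
    for s, e in merged:
        if s > cursor:
            regions.append((cursor, s))
        cursor = max(cursor, e)
    if chrom_len > cursor:
        regions.append((cursor, chrom_len))
    return regions
```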
         </sec>
         <sec>
            <st>
               <p>ESE motifs</p>
            </st>
             <p>We identified a total of 84 potential ESE elements in the flanking regions of exons in the ESEAra data set [see Methods]. Of these 84 ESEs, 44 are overrepresented at the 5' end, 18 at the 3' end and 22 at both ends (results shown in Table S1 [see Additional file <supplr sid="S1">1</supplr>]). The predicted ESE candidates contained the two hexamers TGAAGA and TGAAGC, which are equally strongly represented by the motif found by ELPH in the 5' end data, but they did not contain the consensus of the motif predicted in the 3' end data (see Figure <figr fid="F1">1</figr>). To find the motifs represented by these ESE hexamers, we ran ELPH using each of the 66 5' ESEs and 40 3' ESEs as input seeds on the 5' and 3' flanking ends, respectively, of the internal exons in ESEAra. Running ELPH in this way generated position weight matrices for all 84 distinct seed hexamers, but only 73 of the resulting ELPH motifs (62 at the 5' exonic ends and 30 at the 3' exonic ends) were significantly conserved in the data (P-value &lt; 0.05).</p>
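             <p>To show how a position weight matrix turns a motif into per-position scores, here is a generic Python sketch. It assumes a log2-odds formulation with a uniform background and 0.5 pseudocounts, which may differ from how ELPH builds and scores its matrices; the names are illustrative.</p>

```python
import math

BASES = "ACGT"


def log_odds_pwm(sites: list[str], pseudo: float = 0.5, bg: float = 0.25):
    """Per-column log2(p(base) / background) from aligned motif occurrences."""
    width = len(sites[0])
    matrix = []
    for j in range(width):
        col = [s[j].upper() for s in sites]
        total = len(col) + 4 * pseudo
        matrix.append({b: math.log2((col.count(b) + pseudo) / total / bg) for b in BASES})
    return matrix


def scan(seq: str, matrix) -> list[tuple[int, float]]:
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(matrix)
    seq = seq.upper()
    return [(i, sum(matrix[j][seq[i + j]] for j in range(width)))
            for i in range(len(seq) - width + 1)]
```

             <p>The highest-scoring window marks the best match to the motif; summing or thresholding such scores over an exonic flank is one simple way to turn a motif into a numeric feature.</p>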
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                   <p>Hexamer motifs predicted as ESEs at the 5' (column 2) and 3' (column 3) ends of internal exons from the ESEAra data set. For each predicted ESE, the table gives the significance of the motif's representation in the data (p-value) as computed by ELPH, and an estimate of how much more often the motif is selected in test than in control sequences (mean rank) over 1000 sampling steps.</p>
               </text>
               <file name="1471-2105-8-159-S1.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Sequence logos for motifs detected in the ESEAra exons</p>
               </caption>
               <text>
                   <p><b>Sequence logos for motifs detected in the ESEAra exons</b>. a) Motif detected at the 5' end of ESEAra exons, and b) motif detected at the 3' end of ESEAra exons. Both logos were computed with WebLogo [45].</p>
               </text>
               <graphic file="1471-2105-8-159-1"/>
            </fig>
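             <p>In a sequence logo such as those in Figure <figr fid="F1">1</figr>, the height of each stack reflects the information content of the motif column. A minimal Python sketch of that calculation for DNA follows; it omits the small-sample correction that WebLogo can apply, so heights may differ slightly from the published logos, and the function names are illustrative.</p>

```python
import math


def column_information(probs: dict[str, float]) -> float:
    """Information content (bits) of one DNA PWM column: 2 minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return 2.0 - entropy


def letter_heights(probs: dict[str, float]) -> dict[str, float]:
    """Height of each letter in a logo stack: p(base) times column information."""
    ic = column_information(probs)
    return {b: p * ic for b, p in probs.items()}
```

             <p>A perfectly conserved column reaches the 2-bit maximum, while a column with uniform base frequencies contributes nothing to the logo.</p>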
             <p>ESE activity has been shown for several of the hexamers identified here <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Of the 84 hexamer motifs we identified as putative ESE elements, 35 (12 at the 5' end, 6 at the 3' end and 17 at both ends) are contained in 25 experimentally confirmed 9-mers that function as exonic splicing enhancers in <it>A. thaliana </it>(results shown in Table <tblr tid="T1">1</tblr> and Table S1 [see Additional file <supplr sid="S1">1</supplr>]). Most significantly, for 8 of these 25 9-mers, mutating one or two bases within the predicted ESE hexamers contained in that 9-mer reduced the ESE activity of the mutant 9-mer (Table <tblr tid="T1">1</tblr>). It is also worth noting that GAAGAA, the highest-scoring ESE hexamer identified by our method, has long been known to function (as part of the 9-mer GAAGAAGAA) as an exonic splicing enhancer in humans <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>.</p>
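             <p>The Contained Hexamer Motifs column of Table <tblr tid="T1">1</tblr> lists the predicted hexamers found inside each 9-mer. A small Python sketch of that lookup follows; the predicted set in the usage check is a hypothetical subset of the 84 hexamers, used only for illustration.</p>

```python
def contained_hexamers(ninemer: str, predicted: set[str], k: int = 6) -> list[str]:
    """Predicted ESE hexamers occurring in a 9-mer, in order of first occurrence."""
    found: list[str] = []
    s = ninemer.lower()
    for i in range(len(s) - k + 1):  # a 9-mer has four overlapping hexamers
        h = s[i:i + k]
        if h in predicted and h not in found:
            found.append(h)
    return found
```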
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Experimental evidence for predicted ESE hexamers.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                         <p>9-mer ESE</p>
                     </c>
                     <c ca="left">
                        <p>ESE Score</p>
                     </c>
                     <c ca="left">
                        <p>Mutant ESE</p>
                     </c>
                     <c ca="left">
                        <p>Mutant Score</p>
                     </c>
                     <c ca="left">
                        <p>Contained Hexamer Motifs</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GAAGAAGAA</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>GCAGAAAAA</p>
                     </c>
                     <c ca="left">
                        <p>-1</p>
                     </c>
                     <c ca="left">
                        <p>gaagaa, aagaag</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TGCTGCTGG</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>tgctgc, gctgct</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TGCAGCTGG</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>gcagct, cagctg</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GAAGATGGA</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>gaagat, aagatg, gatgga</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GAAGGAAGA</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>gaagga, aaggaa, ggaaga</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GAGAAGAAG</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>gagaag, gaagaa, aagaag</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TTGGAGCAA</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>ttggag, ggagca</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AGCTGCTGG</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>agctgc, gctgct</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TGCTGGTGG</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>tggtgg</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TGCTGCAGG</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>tgctgc, ctgcag</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TGCTGCTCG</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>tgctgc, gctgct</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TGCTGCTGC</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>TACTTCTGC</p>
                     </c>
                     <c ca="left">
                        <p>-3</p>
                     </c>
                     <c ca="left">
                        <p>tgctgc, gctgct</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GAGGATTGA</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>GAGAATTGA</p>
                     </c>
                     <c ca="left">
                        <p>-1</p>
                     </c>
                     <c ca="left">
                        <p>gaggat</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TGCAGATGA</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>gcagat, cagatg</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CAAGAAACA</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>aagaaa</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GAAGAGAAA</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>GCAGAAAAA</p>
                     </c>
                     <c ca="left">
                        <p>-1</p>
                     </c>
                     <c ca="left">
                        <p>aagaga</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AAAGGAGAT</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>aaggag, aggaga, ggagat</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GAAGAAAGA</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>gaagaa, aagaaa</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GAGCAGAAG</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>gagcag</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TGCTGCCGC</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>tgctgc</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TTGAAGAAG</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>TTGAAAAAG</p>
                     </c>
                     <c ca="left">
                        <p>-3</p>
                     </c>
                     <c ca="left">
                        <p>ttgaag, tgaaga, gaagaa, aagaag</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TTGAAGCTG</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>TTAAAGCTG</p>
                     </c>
                     <c ca="left">
                        <p>-3</p>
                     </c>
                     <c ca="left">
                        <p>ttgaag, tgaagc, gaagct, aagctg</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GAAGATTGA</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>GAGAATTGA</p>
                     </c>
                     <c ca="left">
                        <p>-1</p>
                     </c>
                     <c ca="left">
                        <p>gaagat</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TTTGGTGGA</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>tggtgg, ggtgga</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ATGGAGAAA</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>ATTGAGAAA</p>
                     </c>
                     <c ca="left">
                        <p>-3</p>
                     </c>
                     <c ca="left">
                        <p>atggag, tggaga, ggagaa</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                   <p>Hexamer motifs (column 5) contained within experimentally confirmed 9-mers with ESE activity. Experiments to confirm the 9-mers are described elsewhere (S. Mount et al., manuscript in preparation). Column 1 shows the ESE 9-mer containing the hexamers, and column 3 shows mutant 9-mers without ESE activity that lie within an edit distance of 1&#8211;2 bp of the ESE 9-mer. The ESE activity of each 9-mer is given as a score equal to log<sub>2</sub>(inclusion/skipping) [34].</p>
               </tblfn>
            </tbl>
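<p>The relationships in the 9-mer table above can be illustrated with a short sketch. It lists the overlapping hexamers of a 9-mer, keeps those that belong to a motif set, counts substitutions between two 9-mers and computes the log<sub>2</sub>(inclusion/skipping) activity score. The sketch rests on several assumptions. First, the "edit distance" between equal-length 9-mers is approximated by a substitution count (Hamming distance). Second, inclusion and skipping are positive measured quantities. Third, the function names are hypothetical, and the small motif sets in the example are taken from column 5 rather than from the full set of 84 putative ESE hexamers.</p>

```python
import math


def hexamers(ninemer: str) -> list[str]:
    """The four overlapping hexamers of a 9-mer, lower-cased as in column 5."""
    s = ninemer.lower()
    return [s[i:i + 6] for i in range(len(s) - 5)]


def contained_motifs(ninemer: str, motif_set: set[str]) -> list[str]:
    """Hexamers of the 9-mer that belong to a given set of putative ESE hexamers."""
    return [h for h in hexamers(ninemer) if h in motif_set]


def mismatches(a: str, b: str) -> int:
    """Substitution-only distance (Hamming), assumed here as a stand-in for edit distance."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def ese_score(inclusion: float, skipping: float) -> float:
    """Activity score log2(inclusion / skipping); positive values favour exon inclusion."""
    return math.log2(inclusion / skipping)
```

<p>For example, contained_motifs("ATGGAGAAA", {"atggag", "tggaga", "ggagaa"}) returns the three hexamers listed for that row, and mismatches("ATGGAGAAA", "ATTGAGAAA") is 1, matching the single substitution that separates the ESE 9-mer from its inactive neighbour.</p>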
         </sec>
         <sec>
            <st>
               <p>Splice site prediction</p>
            </st>
            <p>As mentioned above, several recent studies have described computational methods for identifying ESR elements. However, few attempts have been made to improve splice site prediction by using these elements; one exception is a method for exon prediction that uses ESEs and ESSs <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. One of the goals of our study was to provide a way to integrate the motifs predicted as potential ESEs into splice site prediction programs, in particular GeneSplicer. We used the 84 putative ESE motifs found by ELPH (66 for the 5' end and 40 for the 3' end, 22 of which appear at both ends) and the corresponding splice site score predicted by GeneSplicer as features in a linear support vector machine (LSVM). The resulting LSVM was integrated into the new splice site prediction system GeneSplicerESE.</p>
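<p>The following minimal sketch shows how a GeneSplicer score and per-motif ESE scores could be stacked into one feature vector and separated by a linear SVM. The trainer uses the Pegasos stochastic subgradient method on the regularised hinge loss. This method was chosen only to keep the example dependency-free and is not necessarily the SVM solver behind GeneSplicerESE. The feature layout, the function names and the toy data are hypothetical.</p>

```python
import random


def build_features(gs_score: float, motif_scores: list[float]) -> list[float]:
    """Hypothetical layout: GeneSplicer score first, then one score per ESE motif."""
    return [gs_score] + list(motif_scores)


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularised hinge loss.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    A constant 1.0 is appended to every vector so the last weight acts as a
    bias term (it is regularised too, a common simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)          # decreasing step size
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            shrunk = [(1.0 - eta * lam) * wi for wi in w]
            if margin >= 1.0:
                w = shrunk                    # outside the margin: only regularise
            else:
                w = [si + eta * t * xi for si, xi in zip(shrunk, x)]
    return w


def decision(w, features):
    """Linear score w.x + b; positive values are classified as splice sites."""
    return sum(wi * xi for wi, xi in zip(w, list(features) + [1.0]))
```

<p>As a usage example, the sketch can be trained on four toy vectors such as build_features(2.0, [1.0]) for a true site and build_features(-2.0, [-1.0]) for a decoy. The learned weights then score the former positively and the latter negatively. A linear model keeps the contribution of each motif readable as a single weight.</p>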
            <p>To evaluate the splice site prediction accuracy of GeneSplicerESE, we applied a 5-fold cross-validation procedure on the ESEAra data set: the data were partitioned into 5 non-overlapping subsets, and each subset was held out in turn while the system was trained on the remaining 4. Training included all positive examples and 50,000 randomly selected negative examples. As negative examples we considered all dinucleotides in the ESEAra data set that matched the splice site consensus (AG for acceptors and GT for donors) but did not overlap the confirmed splice sites. Accuracy was then measured on all positive and negative examples from the held-out data. All motif position weight matrices were recomputed on the 50 bp flanking exonic sequences from the training data, whereas the length of the flanking sequence used in equation (2) [see Methods] was chosen between 45 and 80 bp. The optimal length of this flanking region was chosen for each type of splice site by applying a 5-fold cross-validation procedure on the training data. Complete sensitivity vs. specificity plots for the original GeneSplicer and for GeneSplicerESE on these data are shown in Figure <figr fid="F2">2</figr>. GeneSplicerESE is substantially more accurate than GeneSplicer for both types of splice site, with a somewhat larger advantage at acceptor sites. At 95% sensitivity (a threshold often used in splice site prediction), the false positive rate at acceptor sites is 2.9% for GeneSplicerESE and 4% for GeneSplicer (Table <tblr tid="T2">2</tblr>). For donor sites, a 5% false negative rate (equal to 95% sensitivity) corresponds to false positive rates of 2.2% for GeneSplicerESE and 2.9% for GeneSplicer (Table <tblr tid="T3">3</tblr>).</p>
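<p>The example-selection and partitioning steps described above can be sketched as follows. The sketch assumes the following:</p>
<p>(1) positions are 0-based starts of the dinucleotide;</p>
<p>(2) "overlap" means sharing at least one base with any confirmed splice site;</p>
<p>(3) the items split into folds are whole sequences.</p>
<p>The function names are hypothetical. Only the 50,000 negative-sample cap and the choice of 5 folds come from the text.</p>

```python
import random


def consensus_positions(seq: str, dinuc: str) -> list[int]:
    """0-based start of every occurrence of a dinucleotide ('AG' or 'GT')."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]


def negative_candidates(seq: str, dinuc: str, confirmed: list[int]) -> list[int]:
    """Consensus dinucleotides that share no base with a confirmed splice site.

    A 2-bp window starting at i overlaps one starting at j exactly when
    j is i - 1, i or i + 1.
    """
    blocked = {j + d for j in confirmed for d in (-1, 0, 1)}
    return [i for i in consensus_positions(seq, dinuc) if i not in blocked]


def sample_negatives(candidates: list, n: int = 50_000, seed: int = 0) -> list:
    """Random subset of at most n negative examples for one training set."""
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


def k_fold_split(items, k: int = 5, seed: int = 0) -> list[list]:
    """Shuffle, then deal the items into k non-overlapping folds."""
    items = list(items)
    random.Random(seed).shuffle(items)
    return [items[i::k] for i in range(k)]
```

<p>For instance, in "CAGGTAAGTAG" the AG dinucleotides start at positions 1, 6 and 9. If sites at positions 7 and 9 are confirmed, only position 1 remains as a negative candidate, because the AG at 6 shares a base with the GT at 7.</p>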
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>False negative (FN) vs. false positive (FP) rates on test and intergenic data sets for acceptor sites</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>FN(%)</p>
                     </c>
                     <c cspan="4" ca="center">
                        <p>FP(%)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>GS-test</p>
                     </c>
                     <c ca="center">
                        <p>GS-intg</p>
                     </c>
                     <c ca="center">
                        <p>GSESE-test</p>
                     </c>
                     <c ca="center">
                        <p>GSESE-intg</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0.5</p>
                     </c>
                     <c ca="center">
                        <p>14.27</p>
                     </c>
                     <c ca="center">
                        <p>29.58</p>
                     </c>
                     <c ca="center">
                        <p>12.47</p>
                     </c>
                     <c ca="center">
                        <p>20.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>10.03</p>
                     </c>
                     <c ca="center">
                        <p>23.39</p>
                     </c>
                     <c ca="center">
                        <p>8.09</p>
                     </c>
                     <c ca="center">
                        <p>15.74</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>7.11</p>
                     </c>
                     <c ca="center">
                        <p>18.51</p>
                     </c>
                     <c ca="center">
                        <p>5.80</p>
                     </c>
                     <c ca="center">
                        <p>11.30</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>5.64</p>
                     </c>
                     <c ca="center">
                        <p>15.76</p>
                     </c>
                     <c ca="center">
                        <p>4.21</p>
                     </c>
                     <c ca="center">
                        <p>9.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>4.00</p>
                     </c>
                     <c ca="center">
                        <p>12.41</p>
                     </c>
                     <c ca="center">
                        <p>2.94</p>
                     </c>
                     <c ca="center">
                        <p>6.56</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>3.13</p>
                     </c>
                     <c ca="center">
                        <p>10.43</p>
                     </c>
                     <c ca="center">
                        <p>2.18</p>
                     </c>
                     <c ca="center">
                        <p>5.20</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>2.32</p>
                     </c>
                     <c ca="center">
                        <p>8.41</p>
                     </c>
                     <c ca="center">
                        <p>1.62</p>
                     </c>
                     <c ca="center">
                        <p>4.01</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>1.55</p>
                     </c>
                     <c ca="center">
                        <p>6.20</p>
                     </c>
                     <c ca="center">
                        <p>1.05</p>
                     </c>
                     <c ca="center">
                        <p>2.74</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>1.10</p>
                     </c>
                     <c ca="center">
                        <p>4.86</p>
                     </c>
                     <c ca="center">
                        <p>0.71</p>
                     </c>
                     <c ca="center">
                        <p>2.01</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>FN and FP rates on test data are obtained from a 5-fold CV procedure on the ESEAra data set. FP rates on intergenic data are averages over the 5 folds of the FP rates obtained on INTAra, using for each fold the threshold that produces the given FN rate on that fold's test data.</p>
               </tblfn>
            </tbl>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>False negative (FN) vs. false positive (FP) rates on test and intergenic data sets for donor sites</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>FN(%)</p>
                     </c>
                     <c cspan="4" ca="center">
                        <p>FP(%)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>GS-test</p>
                     </c>
                     <c ca="center">
                        <p>GS-intg</p>
                     </c>
                     <c ca="center">
                        <p>GSESE-test</p>
                     </c>
                     <c ca="center">
                        <p>GSESE-intg</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0.5</p>
                     </c>
                     <c ca="center">
                        <p>11.06</p>
                     </c>
                     <c ca="center">
                        <p>17.99</p>
                     </c>
                     <c ca="center">
                        <p>9.11</p>
                     </c>
                     <c ca="center">
                        <p>12.84</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>7.58</p>
                     </c>
                     <c ca="center">
                        <p>13.11</p>
                     </c>
                     <c ca="center">
                        <p>6.24</p>
                     </c>
                     <c ca="center">
                        <p>9.35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>5.33</p>
                     </c>
                     <c ca="center">
                        <p>9.75</p>
                     </c>
                     <c ca="center">
                        <p>4.10</p>
                     </c>
                     <c ca="center">
                        <p>6.34</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>4.21</p>
                     </c>
                     <c ca="center">
                        <p>7.99</p>
                     </c>
                     <c ca="center">
                        <p>3.25</p>
                     </c>
                     <c ca="center">
                        <p>5.08</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>2.94</p>
                     </c>
                     <c ca="center">
                        <p>5.86</p>
                     </c>
                     <c ca="center">
                        <p>2.20</p>
                     </c>
                     <c ca="center">
                        <p>3.77</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>2.22</p>
                     </c>
                     <c ca="center">
                        <p>4.65</p>
                     </c>
                     <c ca="center">
                        <p>1.62</p>
                     </c>
                     <c ca="center">
                        <p>2.95</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>1.61</p>
                     </c>
                     <c ca="center">
                        <p>3.58</p>
                     </c>
                     <c ca="center">
                        <p>1.15</p>
                     </c>
                     <c ca="center">
                        <p>2.27</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>1.03</p>
                     </c>
                     <c ca="center">
                        <p>2.48</p>
                     </c>
                     <c ca="center">
                        <p>0.74</p>
                     </c>
                     <c ca="center">
                        <p>1.58</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>0.73</p>
                     </c>
                     <c ca="center">
                        <p>1.86</p>
                     </c>
                     <c ca="center">
                        <p>0.52</p>
                     </c>
                     <c ca="center">
                        <p>1.20</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>FN and FP rates on test data are obtained from a 5-fold CV procedure on the ESEAra data set. FP rates on intergenic data are averages over the 5 folds of the FP rates obtained on INTAra, using for each fold the threshold that produces the given FN rate on that fold's test data.</p>
               </tblfn>
            </tbl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Sensitivity versus specificity rates for GeneSplicer and GeneSplicerESE</p>
               </caption>
               <text>
                  <p><b>Sensitivity versus specificity rates for GeneSplicer and GeneSplicerESE</b>. Sensitivity is defined as the fraction of all true splice sites found by the splice site predictor; specificity is the fraction of the predicted elements labelled correctly as splice sites. Rates are shown for (a) donor sites (GS don and GSESE don) and (b) acceptor sites (GS acc and GSESE acc). Results are obtained using a 5-fold cross-validation procedure on the ESEAra data set. Weight matrices for the motifs selected to describe each type of splice site were recomputed on the training set of each of the 5 cross-validation folds.</p>
               </text>
               <graphic file="1471-2105-8-159-2"/>
            </fig>
            <p>Since the putative ESE motifs were identified from hexamers that appear more frequently near weak splice sites than near strong ones, the improvement in accuracy obtained by GeneSplicerESE is likely due primarily to better recognition of weak splice sites. Our results show that, at a threshold giving 25% false negatives, adding the ESEs recovers ~20% of the weak splice sites of either type (acceptor or donor) that GeneSplicer missed. Figure <figr fid="F3">3</figr> shows that the main contributor to GeneSplicerESE's improved prediction accuracy is its better performance on weak splice sites. Almost all of the false positives eliminated by using GeneSplicerESE rather than GeneSplicer are associated with weak splice sites, and this holds across a range of false negative rates.</p>
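<p>The ~20% figure is a ratio between two sets of weak sites. The minimal sketch below computes it under the following assumptions:</p>
<p>(1) each true site has an identifier;</p>
<p>(2) weak sites are the bottom 25% by GeneSplicer score, as defined in Methods;</p>
<p>(3) the sets of sites called by each program at the chosen threshold are already known.</p>
<p>All names in the sketch are hypothetical.</p>

```python
def weak_sites(gs_scores: dict[str, float], fraction: float = 0.25) -> set[str]:
    """IDs of true sites in the bottom `fraction` of GeneSplicer scores."""
    ranked = sorted(gs_scores, key=gs_scores.get)
    return set(ranked[: int(fraction * len(ranked))])


def recovered_fraction(weak: set[str], called_by_gs: set[str],
                       called_by_ese: set[str]) -> float:
    """Share of the weak sites missed by GeneSplicer that GeneSplicerESE calls."""
    missed = weak.difference(called_by_gs)
    if not missed:
        return 0.0
    return len(missed.intersection(called_by_ese)) / len(missed)
```

<p>Here the called sets would be the true sites scoring at or above each program's threshold, with both thresholds set for the same 25% false negative rate.</p>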
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>The contribution of weak splice sites to GeneSplicerESE's performance</p>
               </caption>
               <text>
                  <p><b>The contribution of weak splice sites to GeneSplicerESE's performance</b>. For each threshold, chosen to produce a given false negative rate over all splice sites in the test data, we show the difference between the number of false positives predicted by GeneSplicer and the number predicted by GeneSplicerESE. The red plot shows this difference for all splice sites, and the green plot shows it for weak splice sites only. See Methods for the definition of weak sites. (a) donor sites; (b) acceptor sites.</p>
               </text>
               <graphic file="1471-2105-8-159-3"/>
            </fig>
            <p>Our experience with GeneSplicer revealed higher false positive rates on intergenic data than on sequences containing coding genes. We hoped that using our predicted ESE elements would reduce these false positive rates in GeneSplicerESE. Indeed, GeneSplicerESE's false positive rates are substantially reduced on the INTAra data set, even more than on the ESEAra data set, probably because the predicted ESE elements are more likely to occur in coding regions. At a threshold corresponding to a 5% false negative rate on the ESEAra data set, GeneSplicer's false positive rate at acceptor sites on INTAra is almost twice that of GeneSplicerESE (12.4% vs. 6.6%, Table <tblr tid="T2">2</tblr>), and it is also markedly higher at donor sites (5.9% vs. 3.8%, Table <tblr tid="T3">3</tblr>).</p>
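<p>The intergenic false positive rates in Tables 2 and 3 come from a two-step procedure. First, on each test fold, the threshold that gives the desired false negative rate is fixed. Then the intergenic candidates scoring at or above that threshold are counted. The sketch below implements this bookkeeping under two assumptions: higher scores mean "more likely a splice site", and each fold supplies test-site and intergenic scores from the model trained without that fold. The function names are hypothetical.</p>

```python
def threshold_at_fn_rate(true_scores: list[float], fn_rate: float) -> float:
    """Score cut-off at which about fn_rate of the true sites are missed.

    A site is predicted when its score is >= the returned threshold, so at
    most k true sites (k = fn_rate * n) fall below it.
    """
    ranked = sorted(true_scores)
    k = min(int(fn_rate * len(ranked)), len(ranked) - 1)
    return ranked[k]


def fp_rate(decoy_scores: list[float], threshold: float) -> float:
    """Fraction of non-site candidates scoring at or above the threshold."""
    return sum(1 for s in decoy_scores if s >= threshold) / len(decoy_scores)


def mean_intergenic_fp(folds, fn_rate: float) -> float:
    """folds: list of (test_true_scores, intergenic_scores) pairs, one per CV fold."""
    rates = [fp_rate(inter, threshold_at_fn_rate(true, fn_rate))
             for true, inter in folds]
    return sum(rates) / len(rates)
```

<p>Fixing the threshold on the genic test data, rather than on INTAra itself, keeps sensitivity constant across the two comparisons. The intergenic rates can therefore be read directly against the test-set rates in the same row.</p>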
            <p>Our efforts to improve splice site prediction by introducing putative ESE scores have focused on our previously developed splice site predictor, GeneSplicer. The same method can equally well be adapted to improve other splice site prediction programs. As an example, we created the splice site predictor SpliceMachineESE by adding the ESE motif scores to the set of features used by SpliceMachine <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. We downloaded SpliceMachine from the authors' website <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> and trained it with the procedure described by the original authors: a sub-sample of 1,000 actual sites and 10,000 pseudo-sites was used to obtain the optimal context sizes for all features, and a linear SVM was then trained on the complete training data set. When trained on the GSAra data set, our SpliceMachine gave false positive rates comparable to the previously published ones (ours were less than 0.1% higher). Table <tblr tid="T4">4</tblr> compares the previously reported false positive rates on the GSAra data set <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> with those we obtained for SpliceMachineESE. Even though SpliceMachine captures both positional and compositional information at all positions in large windows (at least 60 bp) around splice sites, adding the ESE scores still decreased its false positive rates (Table <tblr tid="T4">4</tblr>). At 95% sensitivity the false positive rate dropped from 2.1% to 1.8% for donor sites and from 2.7% to 2.4% for acceptor sites.</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>False positive rates obtained by SpliceMachine and SpliceMachineESE on the GSAra data set</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Sensitivity</p>
                     </c>
                     <c cspan="4" ca="center">
                        <p>FP%</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Donors</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Acceptors</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SpliceMachine</p>
                     </c>
                     <c ca="left">
                        <p>SpliceMachineESE</p>
                     </c>
                     <c ca="left">
                        <p>SpliceMachine</p>
                     </c>
                     <c ca="left">
                        <p>SpliceMachineESE</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0.97</p>
                     </c>
                     <c ca="left">
                        <p>3.2</p>
                     </c>
                     <c ca="left">
                        <p>3.1</p>
                     </c>
                     <c ca="left">
                        <p>4.7</p>
                     </c>
                     <c ca="left">
                        <p>4.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0.95</p>
                     </c>
                     <c ca="left">
                        <p>2.1</p>
                     </c>
                     <c ca="left">
                        <p>1.8</p>
                     </c>
                     <c ca="left">
                        <p>2.7</p>
                     </c>
                     <c ca="left">
                        <p>2.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0.93</p>
                     </c>
                     <c ca="left">
                        <p>1.5</p>
                     </c>
                     <c ca="left">
                        <p>1.3</p>
                     </c>
                     <c ca="left">
                        <p>1.8</p>
                     </c>
                     <c ca="left">
                        <p>1.7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0.92</p>
                     </c>
                     <c ca="left">
                        <p>1.3</p>
                     </c>
                     <c ca="left">
                        <p>1.2</p>
                     </c>
                     <c ca="left">
                        <p>1.6</p>
                     </c>
                     <c ca="left">
                        <p>1.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0.90</p>
                     </c>
                     <c ca="left">
                        <p>1.0</p>
                     </c>
                     <c ca="left">
                        <p>0.9</p>
                     </c>
                     <c ca="left">
                        <p>1.2</p>
                     </c>
                     <c ca="left">
                        <p>1.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0.85</p>
                     </c>
                     <c ca="left">
                        <p>0.6</p>
                     </c>
                     <c ca="left">
                        <p>0.5</p>
                     </c>
                     <c ca="left">
                        <p>0.8</p>
                     </c>
                     <c ca="left">
                        <p>0.7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0.80</p>
                     </c>
                     <c ca="left">
                        <p>0.4</p>
                     </c>
                     <c ca="left">
                        <p>0.4</p>
                     </c>
                     <c ca="left">
                        <p>0.5</p>
                     </c>
                     <c ca="left">
                        <p>0.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0.70</p>
                     </c>
                     <c ca="left">
                        <p>0.2</p>
                     </c>
                     <c ca="left">
                        <p>0.2</p>
                     </c>
                     <c ca="left">
                        <p>0.3</p>
                     </c>
                     <c ca="left">
                        <p>0.2</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The false positive rates for SpliceMachine are copied from [29].</p>
               </tblfn>
            </tbl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In this study we identified 84 potential ESE hexamers in the flanking regions of internal coding exons from a large set of high-confidence <it>Arabidopsis thaliana </it>genes. These 84 hexamers were used to generate motifs with a Gibbs sampling program called ELPH. We believe these motifs to be important in splice site regulation, and 35 of them have subsequently been shown experimentally to have ESE activity. We have incorporated these motifs into two splice site prediction methods and shown that they increase the accuracy of both programs.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Finding ESE hexamers</p>
            </st>
            <p>Many studies suggest that ESEs are located near splice sites: ESE activity falls off sharply with distance <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, and natural internal exons tend to be small <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. We therefore focused our search for ESEs on the regions near the ends of exons, and on internal exons (those with introns on both sides). We extracted 50 bp regions from each end of all internal exons in the ESEAra data set and then identified potential ESE hexamers in these regions using the same assumptions as the RESCUE-ESE algorithm <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. RESCUE-ESE assumes that ESEs are represented by hexamers with both (1) a significantly higher frequency in exons than in introns and (2) a significantly higher frequency in exons with weak splice sites (weak exons) than in exons with strong splice sites (strong exons). To find ESEs based on these assumptions, we define "weak" splice sites as those scoring in the bottom 25% according to GeneSplicer, and "strong" splice sites as those in the top 25%. As in RESCUE-ESE, for each type of splice site we compute two differences: one between the frequency of occurrence of a given hexamer <it>h </it>in exons (<inline-formula><m:math name="1471-2105-8-159-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>f</m:mi><m:mi>E</m:mi><m:mi>h</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaqhaaWcbaGaemyraueabaGaemiAaGgaaaaa@309A@</m:annotation></m:semantics></m:math></inline-formula>) and the frequency of occurrence near splice sites (within 50 bp) in introns (<inline-formula><m:math name="1471-2105-8-159-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>f</m:mi><m:mi>I</m:mi><m:mi>h</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaqhaaWcbaGaemysaKeabaGaemiAaGgaaaaa@30A2@</m:annotation></m:semantics></m:math></inline-formula>) and the other between the frequency of occurrence of the hexamer in weak exons (<inline-formula><m:math name="1471-2105-8-159-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>f</m:mi><m:mi>W</m:mi><m:mi>h</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaqhaaWcbaGaem4vaCfabaGaemiAaGgaaaaa@30BE@</m:annotation></m:semantics></m:math></inline-formula>) and its frequency in strong exons (<inline-formula><m:math name="1471-2105-8-159-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>f</m:mi><m:mi>S</m:mi><m:mi>h</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaqhaaWcbaGaem4uamfabaGaemiAaGgaaaaa@30B6@</m:annotation></m:semantics></m:math></inline-formula>). The two distributions {<inline-formula><m:math name="1471-2105-8-159-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>f</m:mi><m:mi>E</m:mi><m:mi>h</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaqhaaWcbaGaemyraueabaGaemiAaGgaaaaa@309A@</m:annotation></m:semantics></m:math></inline-formula> - <inline-formula><m:math name="1471-2105-8-159-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>f</m:mi><m:mi>I</m:mi><m:mi>h</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaqhaaWcbaGaemysaKeabaGaemiAaGgaaaaa@30A2@</m:annotation></m:semantics></m:math></inline-formula><it>&#8739;h </it>&#8712; <it>all possible hexamers</it>}, and {<inline-formula><m:math name="1471-2105-8-159-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>f</m:mi><m:mi>W</m:mi><m:mi>h</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaqhaaWcbaGaem4vaCfabaGaemiAaGgaaaaa@30BE@</m:annotation></m:semantics></m:math></inline-formula> - <inline-formula><m:math name="1471-2105-8-159-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>f</m:mi><m:mi>S</m:mi><m:mi>h</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaqhaaWcbaGaem4uamfabaGaemiAaGgaaaaa@30B6@</m:annotation></m:semantics></m:math></inline-formula><it>&#8739;h </it>&#8712; <it>all possible hexamers</it>} are then computed. We select only those hexamers that score above a threshold, expressed in standard deviations above the mean, in both distributions. For our <it>A. thaliana</it> data we set this threshold to 1.5 standard deviations, which selects approximately 1% of all hexamers. The appropriate threshold for other species is likely to differ, depending on the relative strength of their splice site signals.</p>
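            <p>The selection rule above can be illustrated with a short Python sketch. It is not the RESCUE-ESE or ESEAra code. The function names (hexamer_freqs, z_scores, candidate_eses) are hypothetical. Frequencies are assumed to be relative counts over all overlapping hexamer windows, and the standard deviation is assumed to be the population standard deviation over all 4,096 hexamers. Each input is a list of uppercase DNA strings, for example the 50 bp exon ends and the intronic regions within 50 bp of the splice sites.</p>
```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

# All 4**6 = 4096 possible hexamers.
HEXAMERS = ["".join(p) for p in product("ACGT", repeat=6)]


def hexamer_freqs(seqs):
    """Relative frequency of each hexamer over all overlapping windows."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - 5):
            window = s[i:i + 6]
            if all(c in "ACGT" for c in window):  # skip ambiguous bases
                counts[window] += 1
    total = sum(counts.values()) or 1
    return {h: counts[h] / total for h in HEXAMERS}


def z_scores(diffs):
    """Standardise the differences: (d - mean) / standard deviation."""
    vals = list(diffs.values())
    mu = mean(vals)
    sd = pstdev(vals) or 1.0
    return {h: (d - mu) / sd for h, d in diffs.items()}


def candidate_eses(exons, introns, weak_exons, strong_exons, threshold=1.5):
    """Hexamers more than `threshold` SDs above the mean in BOTH
    the exon-minus-intron and the weak-minus-strong distributions."""
    f_e, f_i = hexamer_freqs(exons), hexamer_freqs(introns)
    f_w, f_s = hexamer_freqs(weak_exons), hexamer_freqs(strong_exons)
    z_ei = z_scores({h: f_e[h] - f_i[h] for h in HEXAMERS})
    z_ws = z_scores({h: f_w[h] - f_s[h] for h in HEXAMERS})
    return sorted(h for h in HEXAMERS
                  if z_ei[h] > threshold and z_ws[h] > threshold)
```
            <p>With threshold=1.5 the function keeps only the hexamers that lie more than 1.5 standard deviations above the mean in both difference distributions, mirroring the two RESCUE-ESE criteria.</p>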
         </sec>
         <sec>
            <st>
               <p>ELPH: Estimated-Location-of-Pattern-Hits</p>
            </st>
            <p>ELPH is a Gibbs sampling program for identifying motifs, which we applied to the flanking regions of exons. Gibbs sampling has been used successfully in several earlier computational methods for discovering motifs in regulatory sequences <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>, although none of these systems focused on ESRs. ELPH takes as input a set of DNA sequences and searches them for the most common motif. The set may contain up to several thousand sequences, which may be very short or thousands of nucleotides long. The algorithm's success depends on most of the sequences containing at least one copy of the motif. ELPH is freely available under an open source license from <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>.</p>
            <p>The Gibbs sampling implementation in ELPH is based on the algorithm described by Neuwald et al. <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. The algorithm starts by choosing a random motif position in each input sequence. These motif positions are used to compute an initial weighted probability matrix (a position weight matrix, or pwm) describing the motif. After this initialization, the program iterates over two main steps: predictive update and sampling. In the predictive update step, one sequence is selected from the input, proceeding in order from the first sequence to the last. The motif element of that sequence is moved to the background, and the pwm is updated accordingly. In the sampling step, the pwm is used to assign each position in the selected sequence a probability representing the likelihood that the motif starts there. A new motif position is then assigned to the sequence by weighted sampling from all possible motif positions in that sequence. These two steps are repeated until a local maximum is reached or until a predefined maximum number of iterations has been performed. Because the sampler can become trapped in a local maximum, it is restarted several times from different random initial conditions.</p>
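            <p>The following Python sketch illustrates a basic Gibbs site sampler of the kind described above. It is not ELPH's source code, and several details are simplifying assumptions:</p>
            <list>
               <l>The function names are hypothetical.</l>
               <l>The background is a fixed base composition computed once from all input sequences, rather than being updated as motif elements move to the background.</l>
               <l>Every pwm column receives pseudocounts of 0.5.</l>
               <l>Each restart runs a fixed number of iterations instead of testing for convergence.</l>
               <l>Restarts are compared by the log-likelihood ratio of their final alignment.</l>
            </list>
```python
import math
import random

ALPHABET = "ACGT"


def background(seqs):
    """Fixed base composition of all inputs, pseudocount 1 per base
    (simplification: it is not updated during sampling)."""
    counts = {a: 1.0 for a in ALPHABET}
    for s in seqs:
        for c in s:
            counts[c] += 1
    total = sum(counts.values())
    return {a: counts[a] / total for a in ALPHABET}


def build_pwm(seqs, positions, width, skip):
    """Column-wise base probabilities from the motif elements of every
    sequence except index `skip` (pass -1 to use all sequences)."""
    pwm = [{a: 0.5 for a in ALPHABET} for _ in range(width)]  # pseudocounts
    for idx, (s, p) in enumerate(zip(seqs, positions)):
        if idx != skip:
            for k in range(width):
                pwm[k][s[p + k]] += 1
    for col in pwm:
        total = sum(col.values())
        for a in ALPHABET:
            col[a] /= total
    return pwm


def site_weight(s, p, pwm, bg):
    """Likelihood ratio (motif vs. background) of a motif starting at p."""
    return math.prod(pwm[k][s[p + k]] / bg[s[p + k]] for k in range(len(pwm)))


def gibbs_motif(seqs, width, iterations=200, restarts=10, seed=0):
    rng = random.Random(seed)
    bg = background(seqs)
    best_pos, best_ll = None, -math.inf
    for _ in range(restarts):
        # Initialisation: a random motif position in every sequence.
        pos = [rng.randrange(len(s) - width + 1) for s in seqs]
        for _ in range(iterations):
            for i, s in enumerate(seqs):  # first sequence to last
                # Predictive update: pwm from all sequences except i.
                pwm = build_pwm(seqs, pos, width, skip=i)
                # Sampling: weight every start position, then draw one.
                weights = [site_weight(s, p, pwm, bg)
                           for p in range(len(s) - width + 1)]
                pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
        # Score this restart by the log-likelihood ratio of its alignment.
        pwm = build_pwm(seqs, pos, width, skip=-1)
        ll = sum(math.log(site_weight(s, p, pwm, bg)) for s, p in zip(seqs, pos))
        if ll > best_ll:
            best_pos, best_ll = list(pos), ll
    return best_pos, best_ll


def consensus(seqs, positions, width):
    """Most probable base in each pwm column."""
    pwm = build_pwm(seqs, positions, width, skip=-1)
    return "".join(max(col, key=col.get) for col in pwm)
```
            <p>Called as gibbs_motif(seqs, 6), the sketch returns one motif start per sequence together with the score of the best restart. The consensus() helper then summarizes the aligned sites as a single string.</p>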
            <p>We ran ELPH in this fashion (as a motif detector) on the ESEAra data, looking separately at the first 50 bp (the 5' end) and the last 50 bp (the 3' end) of all exons. ELPH identified the motif TGAAGA in the 5' data and [T|C]TTC[A|C]T in the 3' data. Logos of these motifs, created using WebLogo <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, are shown in Figure <figr fid="F1">1</figr>.</p>
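            <p>The bracket notation used for the 3' motif lists alternative bases at a position, so [T|C]TTC[A|C]T stands for four hexamers. The small Python helper below expands such a pattern and counts its occurrences in a sequence. It is hypothetical and is written only to make the notation concrete.</p>
```python
import re
from itertools import product


def expand_pattern(pattern):
    """Expand bracket notation such as "[T|C]TTC[A|C]T" into plain k-mers."""
    # Each token is either a bracketed list of alternatives or a single base.
    tokens = re.findall(r"\[([^\]]+)\]|([ACGT])", pattern)
    choices = [alts.split("|") if alts else [base] for alts, base in tokens]
    return ["".join(combo) for combo in product(*choices)]


def count_matches(seq, pattern):
    """Count overlapping occurrences of any expansion of `pattern` in `seq`."""
    kmers = set(expand_pattern(pattern))
    k = len(next(iter(kmers)))
    return sum(seq[i:i + k] in kmers for i in range(len(seq) - k + 1))
```
            <p>For example, expand_pattern("[T|C]TTC[A|C]T") yields TTTCAT, TTTCCT, CTTCAT and CTTCCT.</p>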
            <p>ELPH can also be run with an input pattern as a seed. In this case the sampling step is restricted to those positions in the sequence that closely match the seed pattern. This strategy substantially constrains the search space, and the output is the motif that best matches the input pattern.</p>
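            <p>One way to picture the seeded mode is to admit only start positions whose window differs from the seed at no more than a fixed number of bases. ELPH's actual similarity criterion is not specified here. The Hamming-distance rule and the max_mismatches parameter in this Python sketch are therefore assumptions for illustration.</p>
```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def seeded_positions(seq, seed, max_mismatches=1):
    """Start positions whose window lies within `max_mismatches` of the seed.

    In a seeded run, only these positions would be eligible in the
    sampling step (hypothetical criterion for illustration).
    """
    k = len(seed)
    return [i for i in range(len(seq) - k + 1)
            if max_mismatches >= hamming(seq[i:i + k], seed)]
```
            <p>For instance, seeded_positions("AATGAAGATTGAAGT", "TGAAGA") returns [2, 9]. Position 2 is an exact match and position 9 (TGAAGT) has a single mismatch.</p>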
            <p>As in Neuwald et al. <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, ELPH can estimate the statistical significance of a predicted motif using the Wilcoxon signed-rank test. A control set of sequences with the same background composition as the input sequences is generated using a first-order Markov model. A control sequence of the same length is appended to each input sequence, and the weighted probability matrix representing the motif is then used to sample positions in the combined sequences. If the motif is real, the algorithm should select sites in the original sequence much more often than in the random control sequence. After this sampling process has been repeated many times, each selected motif site is ranked according to how often it was selected. The rank is positive if the site lies in the original sequence and negative if it lies in the control sequence. Under the null hypothesis the mean rank of the selected sites is expected to be zero, whereas a statistically significant motif yields a large positive mean rank.</p>
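            <p>The significance test has two ingredients: first-order Markov control sequences and signed ranks. Both can be sketched in Python as follows. This illustrates the procedure as described and is not ELPH's implementation. The function names are hypothetical. The sketch further assumes that the first base of each control sequence is drawn uniformly, that transition counts receive a pseudocount of 1, and that tied selection frequencies share the average rank, as is usual for rank tests.</p>
```python
import random

BASES = "ACGT"


def train_markov1(seqs):
    """First-order transition probabilities P(next | current), pseudocount 1."""
    counts = {a: {b: 1.0 for b in BASES} for a in BASES}
    for s in seqs:
        for cur, nxt in zip(s, s[1:]):
            counts[cur][nxt] += 1
    return {a: {b: n / sum(row.values()) for b, n in row.items()}
            for a, row in counts.items()}


def control_sequence(model, length, rng):
    """Random sequence drawn from the Markov chain (uniform first base)."""
    seq = [rng.choice(BASES)]
    while length > len(seq):
        row = model[seq[-1]]
        seq.append(rng.choices(list(row), weights=list(row.values()))[0])
    return "".join(seq)[:length]


def mean_signed_rank(sites):
    """sites: (times_selected, in_original) pairs, one per selected site.

    Sites are ranked by selection frequency (ties share the average rank);
    a rank is positive for an original-sequence site, negative otherwise.
    """
    order = sorted(range(len(sites)), key=lambda idx: sites[idx][0])
    ranks = [0.0] * len(sites)
    i = 0
    while len(order) > i:
        j = i
        while len(order) > j + 1 and sites[order[j + 1]][0] == sites[order[i]][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # average of 1-based ranks
        i = j + 1
    signed = [r if orig else -r for r, (_, orig) in zip(ranks, sites)]
    return sum(signed) / len(signed)
```
            <p>A mean signed rank near zero is what the null hypothesis predicts. A clearly positive value indicates that the motif is selected mainly in the original sequences.</p>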
         </sec>
         <sec>
            <st>
               <p>GeneSplicerESE</p>
            </st>
            <p>Recent studies have shown that support vector machines <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> are a state-of-the-art classification method for splice site recognition <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B47">47</abbr></abbrgrp>. We built a new splice site predictor, GeneSplicerESE, based on a linear support vector machine (LSVM). The LSVM is a binary classification technique that separates input data points in <it>X </it>&#8838; &#8477;<sup><it>n </it></sup>into two classes by constructing the hyperplane with the maximum distance to the closest data points of either class (see <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> for more details). A new data point <it>x </it>&#8712; <it>X </it>is assigned a label in {&#177;1} according to the following decision function:</p>
            <p>
               <display-formula id="M1"><it>f </it>(<it>x</it>) = sgn(<it>w</it>&#183;<it>x </it>+ <it>b</it>)</display-formula>
            </p>
            <p>where the pair (<it>w</it>, <it>b</it>), with <it>w </it>&#8712; &#8477;<sup><it>n </it></sup>and <it>b </it>&#8712; &#8477;, describes the separating hyperplane.</p>
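            <p>The decision function, together with one simple way of fitting (<it>w</it>, <it>b</it>), can be sketched in Python. The paper does not state how the LSVM was trained. The training routine below uses plain stochastic subgradient descent on the L2-regularized hinge loss, a standard substitute chosen here only for illustration. Its hyperparameters (lam, lr, epochs) are arbitrary assumptions.</p>
```python
import random


def decision(w, b, x):
    """f(x) = sgn(w . x + b); a point exactly on the hyperplane maps to +1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100, seed=0):
    """Stochastic subgradient descent on the L2-regularised hinge loss
    lam/2 * |w|^2 + mean(max(0, 1 - y_i * (w . x_i + b)))."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # shuffled pass
            margin = y[i] * (sum(wi * xi for wi, xi in zip(w, X[i])) + b)
            w = [(1 - lr * lam) * wi for wi in w]  # gradient of the L2 term
            if 1 > margin:  # inside the margin: add the hinge subgradient
                w = [wi + lr * y[i] * xi for wi, xi in zip(w, X[i])]
                b += lr * y[i]
    return w, b
```
            <p>For GeneSplicerESE, each row of X would be the feature vector of a candidate splice site, and y its label (+1 for a true site, -1 for a false one).</p>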
            <p>GeneSplicerESE represents each candidate splice site by a feature vector consisting of the splice site score computed by GeneSplicer as described in <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, and a set of <it>n </it>motif scores computed according to the following formula:</p>
            <p>
               <display-formula id="M2">
                  <m:math name="1471-2105-8-159-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>S</m:mi>
                           <m:mi>c</m:mi>
                           <m:mi>o</m:mi>
                           <m:mi>r</m:mi>
                           <m:mi>e</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>s</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>m</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:msub>
                              <m:mrow>
                                 <m:mi>max</m:mi>
                                 <m:mo>&#8289;</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mi>l</m:mi>
                                 <m:mi>e</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mi>g</m:mi>
                                 <m:mi>t</m:mi>
                                 <m:mi>h</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>s</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>l</m:mi>
                                 <m:mi>e</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mi>g</m:mi>
                                 <m:mi>t</m:mi>
                                 <m:mi>h</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>m</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>+</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:msub>
                           <m:mo>{</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>j</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>+</m:mo>
                                    <m:mi>l</m:mi>
                                    <m:mi>e</m:mi>
                                    <m:mi>n</m:mi>
                                    <m:mi>g</m:mi>
                                    <m:mi>t</m:mi>
                                    <m:mi>h</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>m</m:mi>
                                     <m:mo stretchy="false">)</m:mo>
                                     <m:mo>&#8722;</m:mo>
                                     <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:munderover>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>P</m:mi>
                                    <m:mi>m</m:mi>
                                    <m:mrow>
                                       <m:mi>j</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>i</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                 </m:msubsup>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>s</m:mi>
                              <m:mi>j</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mi>log</m:mi>
                           <m:mo>&#8289;</m:mo>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>P</m:mi>
                                    <m:mi>m</m:mi>
                                    <m:mrow>
                                       <m:mi>j</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>i</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                 </m:msubsup>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>s</m:mi>
                                    <m:mi>j</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>/</m:mo>
                                 <m:msub>
                                    <m:mi>P</m:mi>
                                    <m:mi>b</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>s</m:mi>
                                    <m:mi>j</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                              <m:mo>)</m:mo>
                            </m:mrow>
                            <m:mo>}</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGtbWucqWGJbWycqWGVbWBcqWGYbGCcqWGLbqzcqGGOaakcqWGZbWCcqGGSaalcqWGTbqBcqGGPaqkcqGH9aqpcyGGTbqBcqGGHbqycqGG4baEdaWgaaWcbaGaemyAaKMaeyypa0JaeGymaeJaeiilaWIaemiBaWMaemyzauMaemOBa4Maem4zaCMaemiDaqNaemiAaGMaeiikaGIaem4CamNaeiykaKIaeyOeI0IaemiBaWMaemyzauMaemOBa4Maem4zaCMaemiDaqNaemiAaGMaeiikaGIaemyBa0MaeiykaKIaey4kaSIaeGymaedabeaakiabcUha7naaqahabaGaemiuaa1aa0baaSqaaiabd2gaTbqaaiabdQgaQjabgkHiTiabdMgaPjabgUcaRiabigdaXaaaaeaacqWGQbGAcqGH9aqpcqWGPbqAaeaacqWGPbqAcqGHRaWkcqWGSbaBcqWGLbqzcqWGUbGBcqWGNbWzcqWG0baDcqWGObaAcqGGOaakcqWGTbqBcqGGPaqka0GaeyyeIuoakiabcIcaOiabdohaZnaaBaaaleaacqWGQbGAaeqaaOGaeiykaKIagiiBaWMaei4Ba8Maei4zaC2aaeWaaeaacqWGqbaudaqhaaWcbaGaemyBa0gabaGaemOAaOMaeyOeI0IaemyAaKMaey4kaSIaeGymaedaaOGaeiikaGIaem4Cam3aaSbaaSqaaiabdQgaQbqabaGccqGGPaqkcqGGVaWlcqWGqbaudaWgaaWcbaGaemOyaigabeaakiabcIcaOiabdohaZnaaBaaaleaacqWGQbGAaeqaaOGaeiykaKcacaGLOaGaayzkaaaaaa@9879@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>s </it>is a flanking region of an exon (the 5' or 3' exonic end when acceptor or donor sites, respectively, are being classified), <it>m </it>is a motif predicted by ELPH, <it>s</it><sub><it>j </it></sub>is the nucleotide at position <it>j </it>in sequence <it>s</it>, <inline-formula><m:math name="1471-2105-8-159-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>P</m:mi><m:mi>m</m:mi><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGqbaudaqhaaWcbaGaemyBa0gabaGaem4AaSgaaaaa@30C4@</m:annotation></m:semantics></m:math></inline-formula> (<it>a</it>) is the probability of nucleotide <it>a </it>at position <it>k </it>of motif <it>m</it>, and <it>P</it><sub><it>b </it></sub>(<it>a</it>) is the background probability of nucleotide <it>a</it>. GeneSplicerESE is freely available under an open source license from <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>.</p>
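            <p>A direct Python transcription of Score(<it>s</it>, <it>m</it>) might look as follows, together with the feature vector described above. The sketch makes several assumptions:</p>
            <list>
               <l>The motif is given as a list of per-position dictionaries mapping each nucleotide to <it>P</it><sub><it>m</it></sub><sup><it>k</it></sup>.</l>
               <l>No entry is zero (for example, after adding pseudocounts), so the logarithm is defined.</l>
               <l>Each window contributes one term per motif position.</l>
               <l>Positions are indexed from 0 rather than 1.</l>
               <l>The function names are hypothetical.</l>
            </list>
```python
import math


def motif_score(s, pwm, bg):
    """Best window of sum_k P_m^k(s_j) * log(P_m^k(s_j) / P_b(s_j))."""
    width = len(pwm)
    best = -math.inf
    for i in range(len(s) - width + 1):  # every window start (0-based)
        total = 0.0
        for k in range(width):  # one term per motif position
            p = pwm[k][s[i + k]]
            total += p * math.log(p / bg[s[i + k]])
        best = max(best, total)
    return best


def feature_vector(genesplicer_score, flank, motifs, bg):
    """[GeneSplicer score, Score(flank, m_1), ..., Score(flank, m_n)]."""
    return [genesplicer_score] + [motif_score(flank, pwm, bg) for pwm in motifs]
```
            <p>The resulting vector is the kind of input the linear SVM described above would classify. The GeneSplicer score itself is assumed to come from GeneSplicer and is not recomputed here.</p>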
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>MP worked on the computational identification of ESEs and designed both the ELPH and GeneSplicerESE systems. SMM led the biological analysis and provided the experimental validation data for the predicted ESEs. SLS suggested the study and supervised the entire project. All authors contributed to the writing of the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We wish to thank Corina M. Antonescu, who assisted with preparing the data sets. This work was supported in part by the National Science Foundation under grant MCB-0114792 and by the National Institutes of Health under grant R01-LM007938.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Modeling dependencies in pre-mRNA splicing signals</p>
            </title>
            <aug>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Computational Methods in Molecular Biology</source>
            <publisher>Elsevier</publisher>
            <editor>Salzberg SL, Searls DB, Kasif S</editor>
            <pubdate>1998</pubdate>
            <volume>32</volume>
            <fpage>129</fpage>
            <lpage>164</lpage>
         </bibl>
         <bibl id="B2">
            <title>
               <p>A computational analysis of sequence features involved in recognition of short introns</p>
            </title>
            <aug>
               <au>
                  <snm>Lim</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <issue>20</issue>
            <fpage>11193</fpage>
            <lpage>11198</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">58706</pubid>
                  <pubid idtype="pmpid" link="fulltext">11572975</pubid>
                  <pubid idtype="doi">10.1073/pnas.201407298</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases</p>
            </title>
            <aug>
               <au>
                  <snm>Blencowe</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <issue>3</issue>
            <fpage>106</fpage>
            <lpage>110</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0968-0004(00)01549-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">10694877</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Listening to silence and understanding nonsense: exonic mutations that affect splicing</p>
            </title>
            <aug>
               <au>
                  <snm>Cartegni</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chew</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Krainer</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <issue>4</issue>
            <fpage>285</fpage>
            <lpage>298</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg775</pubid>
                  <pubid idtype="pmpid" link="fulltext">11967553</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Determinants of SR protein specificity</p>
            </title>
            <aug>
               <au>
                  <snm>Tacke</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Manley</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Curr Opin Cell Biol</source>
            <pubdate>1999</pubdate>
            <volume>11</volume>
            <fpage>358</fpage>
            <lpage>362</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0955-0674(99)80050-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">10395560</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Regulation of alternative RNA splicing by exon definition and exon sequences in viral and mammalian gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Zheng</snm>
                  <fnm>ZM</fnm>
               </au>
            </aug>
            <source>J Biomed Sci</source>
            <pubdate>2004</pubdate>
            <volume>11</volume>
            <issue>3</issue>
            <fpage>278</fpage>
            <lpage>294</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF02254432</pubid>
                  <pubid idtype="pmpid" link="fulltext">15067211</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Multiple distinct splicing enhancers in the protein-coding sequences of a constitutively spliced pre-mRNA</p>
            </title>
            <aug>
               <au>
                  <snm>Schaal</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Maniatis</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>1999</pubdate>
            <volume>19</volume>
            <issue>1</issue>
            <fpage>261</fpage>
            <lpage>273</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">83884</pubid>
                  <pubid idtype="pmpid" link="fulltext">9858550</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Distribution of SR protein exonic splicing enhancer motifs in human protein-coding genes</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Krainer</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>MQ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>16</issue>
            <fpage>5053</fpage>
            <lpage>5062</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1201331</pubid>
                  <pubid idtype="pmpid" link="fulltext">16147989</pubid>
                  <pubid idtype="doi">10.1093/nar/gki810</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Predictive identification of exonic splicing enhancers in human genes</p>
            </title>
            <aug>
               <au>
                  <snm>Fairbrother</snm>
                  <fnm>WG</fnm>
               </au>
               <au>
                  <snm>Yeh</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Sharp</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>297</volume>
            <issue>5583</issue>
            <fpage>1007</fpage>
            <lpage>1013</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1073774</pubid>
                  <pubid idtype="pmpid" link="fulltext">12114529</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Computational definition of sequence motifs governing constitutive exon splicing</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>XH</fnm>
               </au>
               <au>
                  <snm>Chasin</snm>
                  <fnm>LA</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2004</pubdate>
            <volume>18</volume>
            <issue>11</issue>
            <fpage>1241</fpage>
            <lpage>1250</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">420350</pubid>
                  <pubid idtype="pmpid" link="fulltext">15145827</pubid>
                  <pubid idtype="doi">10.1101/gad.1195304</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Broad specificity of SR (serine/arginine) proteins in the regulation of alternative splicing of pre-messenger RNA</p>
            </title>
            <aug>
               <au>
                  <snm>Bourgeois</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Lejeune</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Stevenin</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Prog Nucleic Acid Res Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>78</volume>
            <fpage>37</fpage>
            <lpage>88</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">15210328</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Identification of functional exonic splicing enhancer motifs recognized by individual SR proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>HX</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Krainer</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>1998</pubdate>
            <volume>12</volume>
            <fpage>1998</fpage>
            <lpage>2012</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">316967</pubid>
                  <pubid idtype="pmpid" link="fulltext">9649504</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Comparative analysis identifies exonic splicing regulatory sequences--The complex definition of enhancers and silencers</p>
            </title>
            <aug>
               <au>
                  <snm>Goren</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ram</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Amit</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Keren</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lev-Maor</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Vig</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Pupko</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ast</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Mol Cell</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>6</issue>
            <fpage>769</fpage>
            <lpage>781</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.molcel.2006.05.008</pubid>
                  <pubid idtype="pmpid" link="fulltext">16793546</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Splice site selection in plant pre-mRNA splicing</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>JWS</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>CG</fnm>
               </au>
            </aug>
            <source>Annu Rev Plant Physiol Plant Mol Biol</source>
            <pubdate>1998</pubdate>
            <volume>49</volume>
            <fpage>77</fpage>
            <lpage>95</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.arplant.49.1.77</pubid>
                  <pubid idtype="pmpid" link="fulltext">15012228</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Splicing of precursors to mRNA in higher plants: mechanism, regulation and sub-nuclear organisation of the spliceosomal machinery</p>
            </title>
            <aug>
               <au>
                  <snm>Simpson</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Filipowicz</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>1996</pubdate>
            <volume>32</volume>
            <fpage>1</fpage>
            <lpage>41</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00039375</pubid>
                  <pubid idtype="pmpid">8980472</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Exon recognition in vertebrate splicing</p>
            </title>
            <aug>
               <au>
                  <snm>Berget</snm>
                  <fnm>SM</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <issue>6</issue>
            <fpage>2411</fpage>
            <lpage>2414</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7852296</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Structural analysis of elements contributing to 5′ splice site selection in plant pre-mRNA transcripts</p>
            </title>
            <aug>
               <au>
                  <snm>Egoavil</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Marton</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Baynton</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>McCullough</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Schuler</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Plant J</source>
            <pubdate>1997</pubdate>
            <volume>12</volume>
            <fpage>971</fpage>
            <lpage>980</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-313X.1997.12050971.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">9418039</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Determinants of plant U12-dependent intron splicing efficiency.</p>
            </title>
            <aug>
               <au>
                  <snm>Lewandowska</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>CG</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Jennings</snm>
                  <fnm>NS</fnm>
               </au>
               <au>
                  <snm>Barciszewska-Pacak</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Makalowski</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Jarmolowski</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Plant Cell</source>
            <pubdate>2004</pubdate>
            <volume>16</volume>
            <issue>5</issue>
            <fpage>1340</fpage>
            <lpage>1352</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">423220</pubid>
                  <pubid idtype="pmpid" link="fulltext">15100401</pubid>
                  <pubid idtype="doi">10.1105/tpc.020743</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Interactions between introns via exon definition in plant pre-mRNA splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Simpson</snm>
                  <fnm>CG</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Lyon</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Watters</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>McQuade</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>JWS</fnm>
               </au>
            </aug>
            <source>Plant J</source>
            <pubdate>1999</pubdate>
            <volume>18</volume>
            <fpage>293</fpage>
            <lpage>302</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1046/j.1365-313X.1999.00463.x</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>A plethora of plant serine/arginine-rich proteins: redundancy or evolution of novel gene functions?</p>
            </title>
            <aug>
               <au>
                  <snm>Kalyna</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Barta</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Biochem Soc Trans</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>Pt 4</issue>
            <fpage>561</fpage>
            <lpage>564</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15270675</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Plant serine/arginine-rich proteins and their role in pre-mRNA splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Reddy</snm>
                  <fnm>AS</fnm>
               </au>
            </aug>
            <source>Trends Plant Sci</source>
            <pubdate>2004</pubdate>
            <volume>9</volume>
            <issue>11</issue>
            <fpage>541</fpage>
            <lpage>547</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tplants.2004.09.007</pubid>
                  <pubid idtype="pmpid" link="fulltext">15501179</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Pre-mRNA splicing in plants: characterization of Ser/Arg splicing factors</p>
            </title>
            <aug>
               <au>
                  <snm>Lopato</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mayeda</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Krainer</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Barta</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1996</pubdate>
            <volume>93</volume>
            <fpage>3074</fpage>
            <lpage>3079</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.93.7.3074</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Characterization of a novel arginine/serine-rich splicing factor in Arabidopsis</p>
            </title>
            <aug>
               <au>
                  <snm>Lopato</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Waigmann</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Barta</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Plant Cell</source>
            <pubdate>1996</pubdate>
            <volume>8</volume>
            <fpage>2255</fpage>
            <lpage>2264</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">161350</pubid>
                  <pubid idtype="pmpid" link="fulltext">8989882</pubid>
                  <pubid idtype="doi">10.1105/tpc.8.12.2255</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>A novel family of plant splicing factors with a Zn knuckle motif: examination of RNA binding and splicing activities</p>
            </title>
            <aug>
               <au>
                  <snm>Lopato</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gattoni</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Fabini</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Stevenin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Barta</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>1999</pubdate>
            <volume>39</volume>
            <fpage>761</fpage>
            <lpage>773</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/A:1006129615846</pubid>
                  <pubid idtype="pmpid" link="fulltext">10350090</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Identification of a plant serine-arginine-rich protein similar to the mammalian splicing factor SF2/ASF</p>
            </title>
            <aug>
               <au>
                  <snm>Lazar</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Schaal</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Maniatis</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Goodman</snm>
                  <fnm>HM</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1995</pubdate>
            <volume>92</volume>
            <fpage>7672</fpage>
            <lpage>7676</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">41207</pubid>
                  <pubid idtype="pmpid" link="fulltext">7644475</pubid>
                  <pubid idtype="doi">10.1073/pnas.92.17.7672</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The Arabidopsis splicing factor SR1 is regulated by alternative splicing</p>
            </title>
            <aug>
               <au>
                  <snm>Lazar</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Goodman</snm>
                  <fnm>HM</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>42</volume>
            <fpage>571</fpage>
            <lpage>581</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/A:1006394207479</pubid>
                  <pubid idtype="pmpid" link="fulltext">10809003</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>atSRp30, one of two SF2/ASF-like proteins from Arabidopsis thaliana, regulates splicing of specific plant genes</p>
            </title>
            <aug>
               <au>
                  <snm>Lopato</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kalyna</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dorner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kobayashi</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Krainer</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Barta</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>1999</pubdate>
            <volume>13</volume>
            <fpage>987</fpage>
            <lpage>1001</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">316644</pubid>
                  <pubid idtype="pmpid" link="fulltext">10215626</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>GeneSplicer: a new computational method for splice site prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Pertea</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>5</issue>
            <fpage>1185</fpage>
            <lpage>1190</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29713</pubid>
                  <pubid idtype="pmpid" link="fulltext">11222768</pubid>
                  <pubid idtype="doi">10.1093/nar/29.5.1185</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>SpliceMachine: predicting splice sites from high-dimensional local context representations</p>
            </title>
            <aug>
               <au>
                  <snm>Degroeve</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Saeys</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>De Baets</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Rouze</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Van de Peer</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>8</issue>
            <fpage>1332</fpage>
            <lpage>1338</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti166</pubid>
                  <pubid idtype="pmpid" link="fulltext">15564294</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Full-length messenger RNA sequences greatly improve genome annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Haas</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Volfovsky</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Town</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Troukhan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Alexandrov</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Feldmann</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Flavell</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <issue>6</issue>
            <fpage>RESEARCH0029</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">116726</pubid>
                  <pubid idtype="pmpid" link="fulltext">12093376</pubid>
                  <pubid idtype="doi">10.1186/gb-2002-3-6-research0029</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Ceres cDNA data</p>
            </title>
            <url>ftp://ftp.tigr.org/pub/data/a_thaliana/ceres/</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <issue>17</issue>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Arabidopsis thaliana data</p>
            </title>
            <url>ftp://ftp.tigr.org/pub/data/a_thaliana</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Pre-mRNA Splicing Signals in Arabidopsis - ESE data</p>
            </title>
            <url>http://www.life.umd.edu/labs/mount/2010-splicing/ESEs.html</url>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Identification of positive and negative splicing regulatory elements within the terminal tat-rev exon of human immunodeficiency virus type 1</p>
            </title>
            <aug>
               <au>
                  <snm>Staffa</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cochrane</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>1995</pubdate>
            <volume>15</volume>
            <issue>8</issue>
            <fpage>4597</fpage>
            <lpage>4605</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">230700</pubid>
                  <pubid idtype="pmpid" link="fulltext">7623851</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Systematic identification and analysis of exonic splicing silencers</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Rolish</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Yeo</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Tung</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Mawson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2004</pubdate>
            <volume>119</volume>
            <issue>6</issue>
            <fpage>831</fpage>
            <lpage>845</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2004.11.010</pubid>
                  <pubid idtype="pmpid" link="fulltext">15607979</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>SpliceMachine</p>
            </title>
            <url>http://bioinformatics.psb.ugent.be/webtools/splicemachine/</url>
         </bibl>
         <bibl id="B38">
            <title>
               <p>A systematic analysis of the factors that determine the strength of pre-mRNA splicing enhancers</p>
            </title>
            <aug>
               <au>
                  <snm>Graveley</snm>
                  <fnm>BR</fnm>
               </au>
               <au>
                  <snm>Hertel</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Maniatis</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1998</pubdate>
            <volume>17</volume>
            <issue>22</issue>
            <fpage>6747</fpage>
            <lpage>6756</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1171020</pubid>
                  <pubid idtype="pmpid" link="fulltext">9822617</pubid>
                  <pubid idtype="doi">10.1093/emboj/17.22.6747</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length</p>
            </title>
            <aug>
               <au>
                  <snm>Favorov</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Gelfand</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Gerasimova</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Ravcheev</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Mironov</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Makeev</snm>
                  <fnm>VJ</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>10</issue>
            <fpage>2240</fpage>
            <lpage>2245</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti336</pubid>
                  <pubid idtype="pmpid" link="fulltext">15728117</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>A suite of web-based programs to search for transcriptional regulatory motifs</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brutlag</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>XS</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>W204</fpage>
            <lpage>W207</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">441599</pubid>
                  <pubid idtype="pmpid" link="fulltext">15215381</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh461</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes</p>
            </title>
            <aug>
               <au>
                  <snm>Thijs</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Marchal</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lescot</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rombauts</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>De Moor</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Rouze</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Moreau</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2002</pubdate>
            <volume>9</volume>
            <issue>2</issue>
            <fpage>447</fpage>
            <lpage>464</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/10665270252935566</pubid>
                  <pubid idtype="pmpid" link="fulltext">12015892</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Gibbs Recursive Sampler: finding transcription factor binding sites</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Rouchka</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3580</fpage>
            <lpage>3585</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">169014</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824370</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg608</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>The ELPH Home Page</p>
            </title>
            <url>http://www.cbcb.umd.edu/software/elph</url>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Gibbs motif sampling: detection of bacterial outer membrane protein repeats</p>
            </title>
            <aug>
               <au>
                  <snm>Neuwald</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1995</pubdate>
            <volume>4</volume>
            <issue>8</issue>
            <fpage>1618</fpage>
            <lpage>1632</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8520488</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>WebLogo: A sequence logo generator</p>
            </title>
            <aug>
               <au>
                  <snm>Crooks</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Hon</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chandonia</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>1188</fpage>
            <lpage>1190</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">419797</pubid>
                  <pubid idtype="pmpid" link="fulltext">15173120</pubid>
                  <pubid idtype="doi">10.1101/gr.849004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>The Nature of Statistical Learning Theory</p>
            </title>
            <aug>
               <au>
                  <snm>Vapnik</snm>
                  <fnm>VN</fnm>
               </au>
            </aug>
            <publisher>New York, Springer</publisher>
            <edition>2nd</edition>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Accurate identification of alternatively spliced exons using support vector machine</p>
            </title>
            <aug>
               <au>
                  <snm>Dror</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sorek</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shamir</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>7</issue>
            <fpage>897</fpage>
            <lpage>901</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti132</pubid>
                  <pubid idtype="pmpid" link="fulltext">15531599</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>A Tutorial on Support Vector Machines for Pattern Recognition</p>
            </title>
            <aug>
               <au>
                  <snm>Burges</snm>
        