<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2008-9-2-r29</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Finding exonic islands in a sea of non-coding sequence: splicing related constraints on protein composition and evolution are common in intron-rich genomes</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Warnecke</snm>
               <fnm>Tobias</fnm>
               <insr iid="I1"/>
               <email>tw233@bath.ac.uk</email>
            </au>
            <au id="A2">
               <snm>Parmley</snm>
               <mi>L</mi>
               <fnm>Joanna</fnm>
               <insr iid="I1"/>
               <email>j.parmley@cmbi.ru.nl</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Hurst</snm>
               <mi>D</mi>
               <fnm>Laurence</fnm>
               <insr iid="I1"/>
               <email>l.d.hurst@bath.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>2</issue>
         <fpage>R29</fpage>
         <url>http://genomebiology.com/2008/9/2/R29</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18257921</pubid>
               <pubid idtype="doi">10.1186/gb-2008-9-2-r29</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>5</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>23</day>
               <month>11</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>7</day>
               <month>2</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>07</day>
               <month>02</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Warnecke et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Splicing constraints on sequence</p>
      </shorttitle>
      <shortabs>
         <p>Biased usage of amino acids near exon-intron boundaries is phylogenetically widespread and characteristic of species for which there are expected to be problems defining exons.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>In mammals, splice-regulatory domains impose marked trends on the relative abundance of certain amino acids near exon-intron boundaries. Is this a mammalian particularity or symptomatic of exonic splicing regulation across taxa? Are such trends more common in species that <it>a priori </it>have a harder time identifying exon ends, that is, those with pre-mRNA rich in intronic sequence? We address these questions surveying exon composition in a sample of phylogenetically diverse genomes.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Biased amino acid usage near exon-intron boundaries is common throughout the metazoa but not restricted to the metazoa. There is extensive cross-species concordance as to which amino acids are affected, and reduced/elevated abundances are well predicted by knowledge of splice enhancers. Species expected to rely on exon definition for splicing, that is, those with a higher ratio of intronic to coding sequence, more introns per gene and longer introns, exhibit more amino acid skews. Notably, this includes the intron-rich basidiomycete <it>Cryptococcus neoformans</it>, which, unlike intron-poor ascomycetes (<it>Schizosaccharomyces pombe</it>, <it>Saccharomyces cerevisiae</it>), exhibits compositional biases reminiscent of the metazoa. Strikingly, 5 prime ends of nematode exons deviate radically from normality: amino acids strongly preferred near boundaries are strongly avoided in other species, and vice versa. This we suggest is a measure to avoid attracting <it>trans</it>-splicing machinery.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Constraints on amino acid composition near exon-intron boundaries are phylogenetically widespread and characteristic of species where exon localization should be problematic. That compositional biases accord with sequence preferences of splice-regulatory proteins and are absent in ascomycetes is consistent with selection on exonic splicing regulation.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010016">Molecular biology</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The maxim that 'form follows function', dogmatically adhered to in some early 20th century design and architecture, refers to the idea that the final function of a product should be the only determinant of its design. Phenotypic products of evolutionary processes have also frequently been analyzed in this seductively simple framework.</p>
         <p>However, costs of production, the availability of raw materials, and other factors regularly lead to marketable goods being suboptimally designed as far as their immediate function is concerned. Likewise, in, for example, mammals, amino acid content of a protein reflects localized GC content <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
         <p>The need to encode, in exonic sequence, information relevant for correct splicing is another factor with the potential to influence protein composition <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Located in the exonic parts of primary mRNA transcripts, exonic splicing enhancers (ESEs) are short (6-8 nucleotides) nucleotide motifs that have been established as a core component of the pre-mRNA splicing mechanism in metazoans <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Playing a critical role in constitutive as well as alternative splicing <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, they function at multiple stages of spliceosome assembly by interacting with corresponding RNA recognition motifs in a number of different <it>trans</it>-factors <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. We will primarily focus on SR (serine-arginine) proteins because their binding specificities and functions in splicing regulation have been most extensively characterized. SR proteins appear critical for establishing, in conjunction with other proteins, cross-exon complexes that enable faithful communication between splice sites <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
         <p>Recognition of exonic alongside intronic sequence motifs has been proposed to be pivotal in organisms where a majority of exons are flanked by much larger introns, allowing exons to be efficiently identified and not lost in a sea of intronic sequence <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Furthermore, whereas in <it>Saccharomyces cerevisiae </it>splice sites and branch point sequences show a high degree of conservation to ensure the intron is correctly targeted by the splicing machinery, these recognition motifs tend to be less well conserved in multicellular organisms <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> and intron-rich fungal genomes <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
         <p>Experimentally raising the number of natural exonic enhancer sites leads to an additive increase in splicing activity <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Importantly, ESEs function in a position-dependent manner, their efficiency in catalyzing splicing decreasing with increasing distance from the splice site <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. The significant enrichment near exon-intron boundaries for GAA (a codon known to be overrepresented in ESEs) compared with the synonymous GAG is consistent with this finding <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. More generally, in mammals codons enriched in ESEs are more common near intron-exon boundaries <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
         <p>A recent study by Parmley <it>et al</it>. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> suggests ESEs have also left an imprint on the amino acid composition of proteins. Exploring exonic sequences adjacent to exon-intron boundaries in human and mouse, the authors reported marked trends in the relative abundance of certain amino acids when one moves away from the boundary. Some amino acids, such as lysine (K) and isoleucine (I), are strongly preferred near boundaries whereas others, such as proline (P) and alanine (A), are significantly avoided (for a full list see Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr>). This is the case for both 5' and 3' ends of exons. Considering separately the two-fold and four-fold blocks of the six-fold degenerate amino acids, the authors also showed that these trends are owing to avoidances/preferences at the nucleotide level and that there is a high degree of correspondence between the codons preferred and their involvement in computationally predicted and experimentally verified ESEs.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Amino acids significantly preferred (-) or avoided (+) at 3' ends of exons across species</p>
            </caption>
            <tblbdy cols="24">
               <r>
                  <c cspan="23" ca="left">
                     <p>Amino acids<sup>*&#8224;</sup></p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c cspan="23">
                     <hr/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>A</p>
                  </c>
                  <c ca="left">
                     <p>C</p>
                  </c>
                  <c ca="left">
                     <p>D</p>
                  </c>
                  <c ca="left">
                     <p>E</p>
                  </c>
                  <c ca="left">
                     <p>F</p>
                  </c>
                  <c ca="left">
                     <p>G</p>
                  </c>
                  <c ca="left">
                     <p>H</p>
                  </c>
                  <c ca="left">
                     <p>I</p>
                  </c>
                  <c ca="left">
                     <p>K</p>
                  </c>
                  <c ca="left">
                     <p>L4</p>
                  </c>
                  <c ca="left">
                     <p>L2</p>
                  </c>
                  <c ca="left">
                     <p>M</p>
                  </c>
                  <c ca="left">
                     <p>N</p>
                  </c>
                  <c ca="left">
                     <p>P</p>
                  </c>
                  <c ca="left">
                     <p>Q</p>
                  </c>
                  <c ca="left">
                     <p>R4</p>
                  </c>
                  <c ca="left">
                     <p>R2</p>
                  </c>
                  <c ca="left">
                     <p>S4</p>
                  </c>
                  <c ca="left">
                     <p>S2</p>
                  </c>
                  <c ca="left">
                     <p>T</p>
                  </c>
                  <c ca="left">
                     <p>V</p>
                  </c>
                  <c ca="left">
                     <p>W</p>
                  </c>
                  <c ca="left">
                     <p>Y</p>
                  </c>
                  <c ca="left">
                     <p>Species (number of exons)<sup>&#8225;</sup></p>
                  </c>
               </r>
               <r>
                  <c cspan="24">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>7</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>5</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>6</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Human (178,438)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>6</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>5</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>4</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>7</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>5</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Mouse (126,268)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>5</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>6</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>D. rerio</it> (41,264)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>4</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>6</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>5</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>4</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>5</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>C. elegans </it>(79,958)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>6</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>4</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>8</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>5</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>5</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>6</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>7</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>C. briggsae </it>(74,178)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>A. gambiae </it>(7,930)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>D. melanogaster </it>(48,933)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>5</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>4</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>5</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>6</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c ca="left">
                     <p><it>A. mellifera </it>(45,426)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>A. thaliana </it>(109,900)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>S. pombe </it>(2,403)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>S. cerevisiae </it>(417)</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>*Indices signify rank order of slope coefficients, separately for negative and positive trends. <sup>&#8224;</sup>L2, R2, S2 and L4, R4, S4 signify the two-fold and four-fold degenerate blocks of leucine, arginine, and serine, respectively. <sup>&#8225;</sup><it>S. cerevisiae </it>terminal exons were retained given the small number of genes with more than one intron (eight).</p>
            </tblfn>
         </tbl>
         <tbl id="T2">
            <title>
               <p>Table 2</p>
            </title>
            <caption>
               <p>Amino acids significantly preferred (-) or avoided (+) at 5' ends of exons across species</p>
            </caption>
            <tblbdy cols="24">
               <r>
                  <c cspan="23" ca="left">
                     <p>Amino acids<sup>*&#8224;</sup></p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c cspan="23">
                     <hr/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>A</p>
                  </c>
                  <c ca="left">
                     <p>C</p>
                  </c>
                  <c ca="left">
                     <p>D</p>
                  </c>
                  <c ca="left">
                     <p>E</p>
                  </c>
                  <c ca="left">
                     <p>F</p>
                  </c>
                  <c ca="left">
                     <p>G</p>
                  </c>
                  <c ca="left">
                     <p>H</p>
                  </c>
                  <c ca="left">
                     <p>I</p>
                  </c>
                  <c ca="left">
                     <p>K</p>
                  </c>
                  <c ca="left">
                     <p>L4</p>
                  </c>
                  <c ca="left">
                     <p>L2</p>
                  </c>
                  <c ca="left">
                     <p>M</p>
                  </c>
                  <c ca="left">
                     <p>N</p>
                  </c>
                  <c ca="left">
                     <p>P</p>
                  </c>
                  <c ca="left">
                     <p>Q</p>
                  </c>
                  <c ca="left">
                     <p>R4</p>
                  </c>
                  <c ca="left">
                     <p>R2</p>
                  </c>
                  <c ca="left">
                     <p>S4</p>
                  </c>
                  <c ca="left">
                     <p>S2</p>
                  </c>
                  <c ca="left">
                     <p>T</p>
                  </c>
                  <c ca="left">
                     <p>V</p>
                  </c>
                  <c ca="left">
                     <p>W</p>
                  </c>
                  <c ca="left">
                     <p>Y</p>
                  </c>
                  <c ca="left">
                     <p>Species (number of exons)<sup>&#8225;</sup></p>
                  </c>
               </r>
               <r>
                  <c cspan="24">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>4</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>5</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>7</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>8</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>6</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>4</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>7</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>5</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>6</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Human (178,438)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>4</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>5</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>7</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>7</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>6</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>4</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>5</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>6</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Mouse (126,268)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>5</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>4</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>D. rerio</it> (41,264)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>5</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>5</sub></p>
                  </c>
                  <c ca="left">
                     <p><it>C. elegans </it>(79,958)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>5</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>5</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>6</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>C. briggsae </it>(74,178)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>A. gambiae </it>(7,930)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>D. melanogaster </it>(48,933)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>3</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>1</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>4</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>-<sub>5</sub></p>
                  </c>
                  <c ca="left">
                     <p>-<sub>6</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>A. mellifera </it>(45,426)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>+<sub>1</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>3</sub></p>
                  </c>
                  <c ca="left">
                     <p>+<sub>2</sub></p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>A. thaliana </it>(109,900)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>S. pombe </it>(2,403)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p><it>S. cerevisiae </it>(417)</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>*Indices signify rank order of slope coefficients, separately for negative and positive trends. <sup>&#8224;</sup>L2, R2, S2 and L4, R4, S4 signify the two-fold and four-fold degenerate blocks of leucine, arginine, and serine, respectively. <sup>&#8225;</sup><it>S. cerevisiae </it>terminal exons were retained given the small number of genes with more than one intron (eight).</p>
            </tblfn>
         </tbl>
         <p>But are these trends a peculiarity of mammals or common in other taxa? Does the presence or absence of trends correspond to what is known about the significance of exonic splicing regulation in each species? For example, a recent survey of several eukaryote genomes showed the SR protein family to be greatly expanded in metazoans but scarcely represented in unicellular genomes <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. A failure to find preference trends in <it>S. cerevisiae</it>, an organism lacking SR proteins <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, might corroborate the hypothesis that preference patterns are indeed caused by ESEs. Moreover, if there are discernible trends in other species, do we repeatedly see the same amino acids avoided or preferred or are trends largely unique to each species? Also, are mammals unusual in showing a tight correlation between 5' and 3' trends, and may divergent results bear implications for the workings of the splicing machinery? Finally do we find more skews in species that <it>a priori </it>are expected to have a harder time identifying exons, that is, those in which exons are relatively small islands in a sea of intronic sequence? Here we examine these issues with exon data from a diverse set of species.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Preference trends are widespread in multicellular species</p>
            </st>
            <p>Exons from eight metazoan species (Human (Hs), mouse (Mm), <it>Danio rerio </it>(Dr), <it>Caenorrhabditis elegans </it>(Ce), <it>Caenorrhabditis briggsae </it>(Cb), <it>Anopheles gambiae </it>(Ag), <it>Drosophila melanogaster </it>(Dm), <it>Apis mellifera </it>(Am)), one plant (<it>Arabidopsis thaliana</it> (At)) and two ascomycetous fungi (<it>S. cerevisiae </it>(Sc), <it>Schizosaccharomyces pombe </it>(Sp)), were examined for trends in amino acid composition as one approaches the exon-intron boundary. Species were chosen from among a relatively small set of organisms for which high quality comparative data on splice-regulatory proteins have recently become available <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. As splice site signals can extend into exons and our focus is on exonic splicing regulation, we removed the first full codon at the exon-intron boundary (see Materials and methods). Thereafter, rank correlations (rho) between distance from the boundary (34 codons into the exon; see Materials and methods) and proportional usage of the amino acid were computed independently for 5' and 3' regions of exons. Further, for all amino acids independently we fitted a linear regression extracting the slope of the line to be used as a crude diagnostic for the strength of amino acid preference/avoidance. Figure <figr fid="F1">1</figr> illustrates the different types of relationship observed.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Nature and diversity of amino acid abundance trends near exon-intron boundaries</p>
               </caption>
               <text>
                  <p>Nature and diversity of amino acid abundance trends near exon-intron boundaries. Relative abundance of glutamine (Q), methionine (M), and lysine (K) as a function of distance from the boundary across 5' ends of <it>D. melanogaster </it>exons is shown. Glutamine is significantly avoided near the boundary (rho = 0.86, <it>P</it> &lt; 1.84E-7), lysine is preferred (rho = -0.65, <it>P </it>&lt; 6.2E-5), whilst no significant trend is evident for methionine (rho = 0.096, <it>P</it> = 0.59). Note that a negative slope/rho value indicates a preference near the exon-intron boundary. Typically, where patterns of preference/avoidance are evident, we observe quasi-monotonic decreases/increases in relative abundance across the sequence range analyzed.</p>
               </text>
               <graphic file="gb-2008-9-2-r29-1"/>
            </fig>
            <p>Two-fold and four-fold blocks of the six-fold degenerate amino acids were considered as distinct groupings so that a total of 46 tests (23 amino acid groups 5' and 3') were carried out for each species. Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr> give a comprehensive by-species overview of amino acid preferences/avoidances, significant after Bonferroni correction (N = 46 comparisons, <it>P </it>&lt; 0.0011). Additional data file 1 contains the complete set of rank correlations for all 11 species.</p>
            <p>The most conspicuous feature of Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr> is arguably the commonality of trends in the metazoa and the scarcity of trends in the ascomycetous yeast species. The two-fold block of leucine (L2) in <it>S. cerevisiae </it>is the only amino acid grouping exhibiting a significant preference trend (rho = -0.4482, <it>P </it>&lt; 0.0003). This is in stark contrast to the suite of multicellular eukaryotes where an extensive range of avoidance and preference trends is observed. Only three multicellular species display fewer than 13 significant trends (Dm, Ag, At) whereas five (Hs, Mm, Ce, Cb, Am) display more than 20. For <it>D. melanogaster </it>and <it>C. elegans</it>, we tested whether the results might be biased as a result of exon homology, but in either case found amino acid abundance patterns at exon ends to be virtually identical in a set of homology-reduced genes (Dm, N = 8,840; Ce, N = 11,790; Additional data files 2 and 3).</p>
            <p>The role of exonic guidance in splicing organization has been linked to multiple aspects of genome composition and pre-mRNA structure, including intron/exon length <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>, intron number <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> and density <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> and splice site information content <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. The number of significant amino acid trends per species tightly covaries with some of these factors, notably the mean number of introns per gene (rho = 0.95, <it>P </it>&lt; 0.0001), median coding sequence (median CDS) per gene (rho = -0.97, <it>P </it>&lt; 0.0007), genomic number of introns (rho = 0.86, <it>P </it>&lt; 0.003), and intron length (log10(mean length): rho = 0.83, <it>P </it>&lt; 0.006) as expected under a model where complex transcripts with multiple long introns elicit increasing reliance on exon definition <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. On the other hand, neither SR protein family size (rho = 0.59, <it>P </it>= 0.09) nor splice site information content (5', rho = -0.26, <it>P </it>= 0.50; 3', rho = 0.43, <it>P </it>= 0. 25) show any relationship with the number of amino acid skews near intron-exon boundaries. The latter observation is perhaps the more interesting as it suggests that there is no straightforward compensatory relationship between splice site information content and the need for exonic regulation across species.</p>
            <p>Finally, the number of exons from which amino acid trends were derived, although correlated with the number of trends (rho = 0.86, <it>P </it>&lt; 0.003), does not feature among the top predictors when multicollinearity is controlled for (Additional data files 4-6). Together with the observation that we find relatively few trends in <it>Arabidopsis</it>, despite the substantial number of exons sampled, this suggests that sample size is not the critical factor in detecting different numbers of trends across species. We must stress, however, that the above results should be regarded as strictly exploratory given the small number of observations (Additional data file 4). A greater number of species with more comprehensive phylogenetic sampling will be required to validate the results in the future.</p>
            <p>The preeminence of exon-intron structure in predicting the number of amino acid trends suggests that the intron-poor ascomycetous fungi analyzed here might not be representative of their kingdom. We therefore analyzed the composition of exon ends in <it>Cryptococcus neoformans </it>(Cn), an intron-rich basidiomycete. Strikingly, we find a large number (26) of preference and avoidance trends in this species (Table <tblr tid="T3">3</tblr> and Additional data file 1), with some marked similarities in comparison to metazoan trends, particularly 5'. Furthermore, the inclusion of <it>C. neoformans </it>data in the analysis of potential predictor variables does not substantially change previous results: the mean number of introns per gene (rho = 0.91, <it>P </it>&lt; 0.0002), median CDS per gene (rho = -0.68, <it>P </it>&lt; 0.032) and the genomic number of introns (rho = 0.72, <it>P </it>&lt; 0.02) remain strong predictors (Additional data file 6).</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Amino acids significantly preferred (-) or avoided (+) at 3' (top rows) and 5' (bottom rows) exon ends of <it>C. neoformans </it>compared to human</p>
               </caption>
               <tblbdy cols="24">
                  <r>
                     <c cspan="23" ca="left">
                        <p>Amino acids<sup>*&#8224;</sup></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="23">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>C</p>
                     </c>
                     <c ca="left">
                        <p>D</p>
                     </c>
                     <c ca="left">
                        <p>E</p>
                     </c>
                     <c ca="left">
                        <p>F</p>
                     </c>
                     <c ca="left">
                        <p>G</p>
                     </c>
                     <c ca="left">
                        <p>H</p>
                     </c>
                     <c ca="left">
                        <p>I</p>
                     </c>
                     <c ca="left">
                        <p>K</p>
                     </c>
                     <c ca="left">
                        <p>L4</p>
                     </c>
                     <c ca="left">
                        <p>L2</p>
                     </c>
                     <c ca="left">
                        <p>M</p>
                     </c>
                     <c ca="left">
                        <p>N</p>
                     </c>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="left">
                        <p>Q</p>
                     </c>
                     <c ca="left">
                        <p>R4</p>
                     </c>
                     <c ca="left">
                        <p>R2</p>
                     </c>
                     <c ca="left">
                        <p>S4</p>
                     </c>
                     <c ca="left">
                        <p>S2</p>
                     </c>
                     <c ca="left">
                        <p>T</p>
                     </c>
                     <c ca="left">
                        <p>V</p>
                     </c>
                     <c ca="left">
                        <p>W</p>
                     </c>
                     <c ca="left">
                        <p>Y</p>
                     </c>
                     <c ca="left">
                        <p>Species (number of exons)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="24">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>+<sub>3</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>7</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>3</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>2</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>1</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>5</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>6</sub></p>
                     </c>
                     <c ca="left">
                        <p>+<sub>2</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+<sub>1</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>4</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+<sub>4</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Human (178,438): 3'</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>6</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+<sub>1</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>2</sub></p>
                     </c>
                     <c ca="left">
                        <p>+<sub>4</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>7</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>1</sub></p>
                     </c>
                     <c ca="left">
                        <p>+<sub>3</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>3</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+<sub>6</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>5</sub></p>
                     </c>
                     <c ca="left">
                        <p>+<sub>2</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+<sub>5</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>4</sub></p>
                     </c>
                     <c ca="left">
                        <p><it>C. neoformans </it>(28,446): 3'</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>+<sub>2</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>4</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>5</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+<sub>7</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>3</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>1</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>2</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>8</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>6</sub></p>
                     </c>
                     <c ca="left">
                        <p>+<sub>1</sub></p>
                     </c>
                     <c ca="left">
                        <p>+<sub>4</sub></p>
                     </c>
                     <c ca="left">
                        <p>+<sub>3</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>7</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+<sub>5</sub></p>
                     </c>
                     <c ca="left">
                        <p>+<sub>6</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Human (178438): 5'</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>9</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>5</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>7</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>3</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>1</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>6</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+<sub>2</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+<sub>3</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>4</sub></p>
                     </c>
                     <c ca="left">
                        <p>+<sub>1</sub></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-<sub>8</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>10</sub></p>
                     </c>
                     <c ca="left">
                        <p>-<sub>2</sub></p>
                     </c>
                     <c ca="left">
                        <p><it>C. neoformans </it>(28,446): 5'</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*Indices signify rank order of slope coefficients, separately for negative and positive trends. <sup>&#8224;</sup>L2, R2, S2 and L4, R4, S4 signify the two-fold and four-fold degenerate blocks of leucine, arginine, and serine, respectively.</p>
               </tblfn>
            </tbl>
            <p>Virtually nothing is known about the splicing mechanism in <it>C. neoformans </it>but the demonstration of alternative splicing pathways in this species <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> as well as low splice site information content (Additional data file 5) <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> make the presence of exonic splicing regulation a credible possibility. Consistent with this, the predicted <it>C. neoformans </it>proteome contains multiple proteins resembling known eukaryotic SR proteins, particularly in that they harbor RNA recognition domains (Additional data file 7). This is suggestive of involvement in splicing, albeit evidently insufficient to reach conclusions about specific functional roles of these proteins.</p>
         </sec>
         <sec>
            <st>
               <p>Cross-species patterns</p>
            </st>
            <p>Whilst the spectra of amino acids preferred/avoided by individual species are ultimately unique in breadth (how many trends) and composition (which amino acids are affected), there is considerable cross-specific overlap in terms of whether a particular trend is present at all, its direction, and relative strength (as measured by the slope of the line of best fit). Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr> illustrate that this particular agreement is virtually perfect between human and mouse <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, with marginal differences in the relative strength of individual trends, and that directionality is conserved throughout. Considering zebrafish (Dr) as the only other vertebrate in our sample alongside these species, we notice that its spectrum is slightly diminished in breadth and contains a few trends not seen in the two mammals (G (3'), V (5',3')). However, overall concordance in composition and strength is still remarkably good, and the 'mammalian pattern of directionality' perfectly adhered to. The nematode pair almost matches the human-mouse dyad in terms of overall concordance of preference patterns, with directionality perfectly conserved.</p>
            <p>For the most part, the patterns of preference/avoidance are repeatable across species. Table <tblr tid="T4">4</tblr> shows pairwise comparisons between species giving rank correlations (rho) for the slopes derived from all 23 amino acid groupings. For the vertebrate group both 5' and 3' correlations are very high (all rho > 0.9, all <it>P </it>&lt; 1.81E-06; 90 tests, significance threshold, <it>P </it>&lt; 5.56E-04), with human and mouse in almost perfect agreement. More remarkably, however, some strong correlations also exist 3' between the vertebrates and, for example, <it>Anopheles </it>(all rho > 0.87, all <it>P </it>&lt; 2.94E-06) and <it>Drosophila </it>(all rho > 0.75, all <it>P </it>&lt; 2.9E-05). The 3' correlations are less impressive for the remaining species (Am, At, Cn) but <it>Apis </it>(all rho > 0.75, all <it>P </it>&lt; 4.11E-05) and even <it>Cryptococcus </it>(all rho > 0.69, all <it>P </it>&lt; 5.56E-04) boast remarkably strong 5' correlations with the vertebrates. Focusing on specific amino acid trends, isoleucine (I) stands out in that it is strongly preferred near 3' boundaries across all species; others are well represented, albeit not universal, through the entire phylogeny - for example, 5' avoidance of glutamine (Q), and 3' preference for phenylalanine (F).</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Cross-species correlations of preference slope coefficients considering all 23 amino acid groupings, 5' (bottom-left) and 3' (top-right)<sup>*&#8224;</sup></p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Hs</p>
                     </c>
                     <c ca="left">
                        <p>Mm</p>
                     </c>
                     <c ca="left">
                        <p>Dr</p>
                     </c>
                     <c ca="left">
                        <p>Ce</p>
                     </c>
                     <c ca="left">
                        <p>Cb</p>
                     </c>
                     <c ca="left">
                        <p>Ag</p>
                     </c>
                     <c ca="left">
                        <p>Dm</p>
                     </c>
                     <c ca="left">
                        <p>Am</p>
                     </c>
                     <c ca="left">
                        <p>At</p>
                     </c>
                     <c ca="left">
                        <p>Cn</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hs</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0.99<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.93<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.71<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.67<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.88<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.84<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.53<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.11</p>
                     </c>
                     <c ca="left">
                        <p>0.08</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mm</p>
                     </c>
                     <c ca="left">
                        <p>0.99<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0.92<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.69<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.67<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.88<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.85<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.60<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.20</p>
                     </c>
                     <c ca="left">
                        <p>0.15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Dr</p>
                     </c>
                     <c ca="left">
                        <p>0.92<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.90<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0.74<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.71<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.87<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.77<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.48<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.16</p>
                     </c>
                     <c ca="left">
                        <p>0.14</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ce</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.43<sup>+</sup></b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.39</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.40</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0.98<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.84<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.72<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.37</p>
                     </c>
                     <c ca="left">
                        <p>0.24</p>
                     </c>
                     <c ca="left">
                        <p>0.16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cb</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.60<sup>+</sup></b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.56</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.65<sup>+</sup></b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.78<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0.82<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.71<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.34</p>
                     </c>
                     <c ca="left">
                        <p>0.21</p>
                     </c>
                     <c ca="left">
                        <p>0.17</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ag</p>
                     </c>
                     <c ca="left">
                        <p>0.62<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.60<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.61<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.26</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0.89<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.50<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.18</p>
                     </c>
                     <c ca="left">
                        <p>0.18</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Dm</p>
                     </c>
                     <c ca="left">
                        <p>0.64<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.61<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.51<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.04</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.14</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.64<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0.57<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.21</p>
                     </c>
                     <c ca="left">
                        <p>0.15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Am</p>
                     </c>
                     <c ca="left">
                        <p>0.76<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.79<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.77<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.32</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.41</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.48<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.46<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0.66<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.55<sup>+</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>At</p>
                     </c>
                     <c ca="left">
                        <p>0.44<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.44<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.50<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.36</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.36</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.06</p>
                     </c>
                     <c ca="left">
                        <p>0.19</p>
                     </c>
                     <c ca="left">
                        <p>0.40</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0.75<sup>++</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cn</p>
                     </c>
                     <c ca="left">
                        <p>0.72<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.69<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.75<sup>++</sup></p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.31</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>-0.53</b>
                           <sup>+</sup>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0.39</p>
                     </c>
                     <c ca="left">
                        <p>0.21</p>
                     </c>
                     <c ca="left">
                        <p>0.54<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.52<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*<it>S. pombe </it>and <it>S. cerevisiae </it>omitted for clarity given the absence of significant correlations. <sup>&#8224;</sup>Negative correlations in bold. +, significant at <it>P </it>= 0.05; + +, significant at <it>P </it>= 0.05/90 = 5.56E-04 (N = 90 tests).</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Deviant nematodes</p>
            </st>
            <p>The strong cross-species concordance in preference patterns makes one observation all the more striking. The nematode 5' spectra behave in a highly counterintuitive manner in that the 'mammalian pattern of directionality' is violated on several occasions: where we do find significant trends in nematodes and other species (E, K, L2, Q, R4, R2, T), all but glutamine (Q) show discrepant directionality (Table <tblr tid="T2">2</tblr>). For example, whereas lysine (K) is strongly preferred near boundaries in vertebrates and some insects (Dm, Am), it appears to be strongly avoided in the 5' region of nematode exons (Figure <figr fid="F2">2</figr>). Table <tblr tid="T4">4</tblr> also underlines the exceptional position of nematodes: 5' correlations between nematodes and any other species are pervasively negative. No single correlation across all amino acids is significantly different from zero applying the adjusted significance threshold (<it>P </it>&lt; 5.56E-04), owing to several trends collapsing into insignificance rather than fully reversing sign. However, the pervasiveness of this pattern is nonetheless noteworthy, especially considering that the same is not the case for the 3' spectra where we find a coherent agreement between nematodes and vertebrates (minimum rho > 0.65, all significant at <it>P </it>&lt; 5.92E-04) and only the two-fold block of serine (S2) shows a reverse pattern of directionality among the significant trends for individual amino acids.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Relative amino acid abundance of lysine (K) at 5' ends of exons in six species</p>
               </caption>
               <text>
                  <p>Relative amino acid abundance of lysine (K) at 5' ends of exons in six species. Proportional usage of lysine <it>vis-&#224;-vis </it>all other amino acids is plotted against distance from the exon-intron boundary measured in amino acids. Variable degrees of preference for lysine near the boundary are evident for non-nematode species (Am, rho = -0.67, <it>P </it>= 2.71E-05, &#946;(slope) = -0.017; Dr, rho = -0.79, <it>P </it>= 6.51E-07, &#946; = -0.035; Dm, rho = -0.65, <it>P </it>= 6.11E-05, &#946; = -0.020; Hs, rho = -0.90, <it>P </it>= 3.67E-09, &#946; = -0.041) whereas nematodes show strong avoidance trends (Ce, rho = 0.89, <it>P </it>= 5.26E-08, &#946; = 0.030; Cb, rho = 0.92, <it>P </it>= 0, &#946; = 0.033).</p>
               </text>
               <graphic file="gb-2008-9-2-r29-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Many species obey an approximately symmetric pattern of preference trends 5' and 3'</p>
            </st>
            <p>This curious discrepancy between 5' and 3' spectra of amino acid trends in nematodes led us to investigate further the relationship of 5' and 3' patterns across species. Considering all amino acid trends simultaneously, rank correlations between slope coefficients (5'~3') were computed. Furthermore, we wanted to explicitly test the hypothesis that preference trends show a 'symmetric' behavior, that is, that individual amino acids exhibit preference trends of similar strength and direction at 5' and 3' ends. To this end, we carried out standardized major axis regressions (SMA; see Materials and methods) <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp> for 5' versus 3' trends in each species and compared the resulting regression line with one expected under perfect symmetry (y = x). The results are given in Table <tblr tid="T5">5</tblr> and graphically represented in Figure <figr fid="F3">3</figr>. Human and mouse show very substantial positive correlations between 5' and 3' preference trends (Hs, rho = 0.8528, <it>P </it>= 1.96E-06; Mm, rho = 0.8626, <it>P </it>= 2.28E-06). Although diminished in strength, we also see significant correlations for <it>Drosophila </it>and <it>Danio</it>. As expected from the previous analysis, correlations for nematodes are negative, albeit not significantly so (Ce, rho = -0.1413, <it>P </it>= 0.5185; Cb, rho = -0.4358, <it>P </it>= 0.0388). However, the SMA results allow us to reject any notion of <it>C. elegans </it>or <it>C. briggsae </it>adhering to a symmetric pattern of amino acid usage, the respective confidence intervals (CIs) ruling out a symmetry slope of &#946; = 1 (CI (Ce), [-1.118; -0.7309]; CI (Cb), [-0.7474; -0.5139]). No other species for which an SMA could be carried out (Table <tblr tid="T5">5</tblr>; Materials and methods) deviate significantly from a symmetric model, although symmetry of amino acid trends varies greatly and can only really be called a defining characteristic of exon ends in vertebrates.</p>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Intraspecific 5'~3' correlations of preference slopes for all 23 amino acid groupings</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>SMA</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Rho</p>
                     </c>
                     <c ca="center">
                        <p><it>P</it>-value*</p>
                     </c>
                     <c ca="center">
                        <p>Slope (&#946;)</p>
                     </c>
                     <c ca="center">
                        <p>Lower Cl<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>Upper Cl<sup>&#8224;</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Human</p>
                     </c>
                     <c ca="center">
                        <p>0.85</p>
                     </c>
                     <c ca="center">
                        <p>1.96E-06</p>
                     </c>
                     <c ca="center">
                        <p>1.04</p>
                     </c>
                     <c ca="center">
                        <p>0.83</p>
                     </c>
                     <c ca="center">
                        <p>1.29</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mouse</p>
                     </c>
                     <c ca="center">
                        <p>0.86</p>
                     </c>
                     <c ca="center">
                        <p>2.28E-06</p>
                     </c>
                     <c ca="center">
                        <p>0.99</p>
                     </c>
                     <c ca="center">
                        <p>0.80</p>
                     </c>
                     <c ca="center">
                        <p>1.23</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>D. rerio</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.66</p>
                     </c>
                     <c ca="center">
                        <p>8.3E-04</p>
                     </c>
                     <c ca="center">
                        <p>1.04</p>
                     </c>
                     <c ca="center">
                        <p>0.78</p>
                     </c>
                     <c ca="center">
                        <p>1.40</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>C. elegans</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>-0.14</p>
                     </c>
                     <c ca="center">
                        <p>0.52</p>
                     </c>
                     <c ca="center">
                        <p>-1.11</p>
                     </c>
                     <c ca="center">
                        <p>-0.73</p>
                     </c>
                     <c ca="center">
                        <p>-1.69</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>C. briggsae</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>-0.44</p>
                     </c>
                     <c ca="center">
                        <p>0.04</p>
                     </c>
                     <c ca="center">
                        <p>-0.75</p>
                     </c>
                     <c ca="center">
                        <p>-0.51</p>
                     </c>
                     <c ca="center">
                        <p>-1.09</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>A. gambiae</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.57</p>
                     </c>
                     <c ca="center">
                        <p>5.16E-03</p>
                     </c>
                     <c ca="center">
                        <p>1.08</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>1.48</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>D. melanogaster</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.61</p>
                     </c>
                     <c ca="center">
                        <p>2.49E-03</p>
                     </c>
                     <c ca="center">
                        <p>1.15</p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                     <c ca="center">
                        <p>1.62</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>A. mellifera</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.39</p>
                     </c>
                     <c ca="center">
                        <p>0.06</p>
                     </c>
                     <c ca="center">
                        <p>1.32</p>
                     </c>
                     <c ca="center">
                        <p>0.88</p>
                     </c>
                     <c ca="center">
                        <p>1.96</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>A. thaliana</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>-0.22</p>
                     </c>
                     <c ca="center">
                        <p>0.30</p>
                     </c>
                     <c ca="center">
                        <p>NA<sup>&#8225;</sup></p>
                     </c>
                     <c ca="center">
                        <p>NA<sup>&#8225;</sup></p>
                     </c>
                     <c ca="center">
                        <p>NA<sup>&#8225;</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pombe</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                     <c ca="center">
                        <p>0.31</p>
                     </c>
                     <c ca="center">
                        <p>0.77</p>
                     </c>
                     <c ca="center">
                        <p>0.50</p>
                     </c>
                     <c ca="center">
                        <p>1.17</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. cerevisiae</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.16</p>
                     </c>
                     <c ca="center">
                        <p>0.46</p>
                     </c>
                     <c ca="center">
                        <p>2.42<sup>&#167;</sup></p>
                     </c>
                     <c ca="center">
                        <p>1.58</p>
                     </c>
                     <c ca="center">
                        <p>3.70</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>C. neoformans</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.02</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                     <c ca="center">
                        <p>NA<sup>&#8225;</sup></p>
                     </c>
                     <c ca="center">
                        <p>NA<sup>&#8225;</sup></p>
                     </c>
                     <c ca="center">
                        <p>NA<sup>&#8225;</sup></p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*With 12 species significance is indicated by <it>P </it>= 0.05/12 = 4.17E-03. <sup>&#8224;</sup>CI = 0.95, the regression line was forced through the origin. <sup>&#8225;</sup>See Materials and methods. <sup>&#167;</sup>Adequacy of SMA regression analysis is seriously in doubt for <it>S. cerevisiae </it>because normal distribution of residuals is strongly violated. NA, not available.</p>
               </tblfn>
            </tbl>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Variable symmetry in amino acid abundance trends comparing 5' and 3' exon ends within species</p>
               </caption>
               <text>
                  <p>Variable symmetry in amino acid abundance trends comparing 5' and 3' exon ends within species. Intraspecific correlations between the 5' (x-axis) and 3' (y-axis) slopes as extracted from individually fitted linear models considering all 23 amino acid groupings are shown. Approximately symmetric arrangements are particularly evident for some species (notably vertebrates) whereas nematode arrangements (Ce, Cb) are not symmetric. Further notable is the higher variability of slope coefficients in some species (vertebrates and nematodes) <it>vis-&#224;-vis </it>others (Am, At). Amino acids are represented by their one letter code (two-fold blocks are denoted by '2'). The regression lines are from SMA regressions. Lines were not fitted for <it>Arabidopsis</it>, <it>Cryptococcus </it>and <it>S. cerevisiae </it>given concerns about the adequacy of this technique for these datasets (see Materials and methods). For associated statistics consult Table 5.</p>
               </text>
               <graphic file="gb-2008-9-2-r29-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Amino acid trends are largely consistent with participation in ESE motifs</p>
            </st>
            <p>Intriguingly, asymmetries in the amino acid composition of nematode exon ends appear to be mirrored by a corresponding asymmetry of regulatory motifs. Robinson <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, using a computational approach to characterize candidate ESEs in <it>C. elegans</it>, found that 5' and 3' ends were distinguished by different classes of consensus motifs. Crucially, he found purine-rich human-like candidate motifs to be associated with 3' ends but not 5' ends of nematode exons, which is broadly consistent with our observation that amino acids encoded by purine-rich codons tend to be, in contrast to other animals, disfavored at 5' ends (Table <tblr tid="T2">2</tblr> and Figure <figr fid="F3">3</figr>).</p>
            <p>For mammals, the prediction that amino acids preferred near boundaries should correspond to those favored in ESEs was tested by Parmley <it>et al</it>. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. The authors defined a metric that quantifies the involvement of amino acids in splice enhancer hexamers relative to the null expectation that every codon is represented in ESEs around its genomic frequency. As predicted, these hexamer preference indices (HPIs), computed for each amino acid grouping, were found to correlate with preference trends, strongly preferred amino acids on average associated with higher HPI values.</p>
            <p>This relationship holds true for human as well as murine ESE sets and amino acid trends, considering either rank correlation coefficients (rho<sub>x</sub>; Hs HPI~rho<sub>x</sub>, rho = -0.54, <it>P </it>&lt; 0.00001, N = 46; Mm HPI~rho<sub>x</sub>, rho = -0.49, <it>P </it>= 0.0005, N = 46) or the slope (&#946;) of the fitted linear model (Hs HPI~&#946;, rho = -0.57, <it>P </it>&lt; 0.0001, N = 46; Mm HPI~&#946;, rho = -0.52, <it>P </it>= 0.0002, N = 46).</p>
            <p>As expected from the demonstration that ESEs can act at varying distances from the splice site <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, human ESEs do not exhibit a reading frame bias beyond what is expected from the genomic frequencies of the underlying codons (Additional data file 8). They can also, in principle, incorporate most codons (Additional data file 8). In consequence, the defined set of amino acids we find avoided or preferred are likely not due to ultimate exclusion of certain codons but because different efficacy and specificity across ESEs mean that often only a well-defined subset of codons can be used to specify the desired ESE.</p>
            <p>Unexpectedly, when we derived HPIs for zebrafish amino acids, using a set of ESEs obtained from the same source <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, we found a significant correlation of reverse sign (Dr HPI~rho<sub>x </sub>(5'), rho = 0.6, <it>P </it>&lt; 0.003, N = 46; HPI~rho<sub>x </sub>(3'), rho = 0.59, <it>P </it>&lt; 0.0033, N = 46). Many experimentally verified ESEs have been characterized as A-rich and C-poor relative to the background frequency of these nucleotides in coding sequence. Whilst we found this to be the case for putative human ESE motifs not shared with zebrafish (A, 47.38% (ESE) versus 25.57% (exonic); C, 15.28% versus 25.99%, N(ESE) = 204), and for ESEs present in both species (A, 50% versus 25.57%; C, 6.37% versus 25.99%, N = 34), unique zebrafish ESEs (that is, ESEs not present in human) from this dataset were unusually enriched in C (39.47% versus 25.99%, N = 288) and relatively poor in A (18.40% versus 25.57%). Although one would expect ESE motifs to vary across taxa, the discrepancies are so pronounced as to sit awkwardly next to the substantial similarities in amino acid trends (Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr>). One criterion used by the Burge group <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> to identify candidate ESE motifs was for such motifs to be more common near weak versus strong splice sites. Therefore, one possible explanation is that C-richness is a characteristic of zebrafish ESEs near weak splice sites but not generally, so that the predicted ESEs are not representative of ESEs across the zebrafish genome. Alternatively, comparatively lower quality of the, then recent, zebrafish genome build might be responsible for the divergent results. A re-examination of these putative zebrafish ESEs with an updated genome build may be worthwhile.</p>
         </sec>
         <sec>
            <st>
               <p>Reduced rates of evolution near the exon-intron boundary in species where ESEs are essential components of the splicing machinery</p>
            </st>
            <p>To further advance the hypothesis that gradients in amino acid abundance near exon-intron boundaries are a critical feature of exon ends in metazoans, we examined the degree of amino acid conservation as a function of distance from the boundary. For three pairs of species (<it>S. cerevisiae</it>-<it>Saccharomyces castellii</it>, <it>D. melanogaster</it>-<it>Drosophila pseudoobscura </it>(Dps); <it>C. elegans</it>-<it>C. briggsae</it>) sets of orthologous internal exons were derived from various sources and aligned at the amino acid level (see Materials and methods). Mirroring results from a comparison of human-mouse orthologues <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, we found strong and highly significant positive correlations of strikingly linear character (Figure <figr fid="F4">4</figr>) between distance from the boundary and amino acid substitution rate for the <it>Drosophila </it>and <it>Caenorhabditis </it>pairs, whilst proximity to the boundary did not appear to confer a higher level of amino acid conservation in the <it>Saccharomyces </it>comparison. Restricting the analysis to exons of at least 70 codons in length, we obtained qualitatively equivalent results (Drosophilae 5', rho = 0.53, <it>P </it>&lt; 0.002, N = 3,690; Drosophilae 3', rho = 0.77, <it>P </it>= 9.70E-07, N = 3,690; Caenorhabdites 5', rho = 0.74, <it>P </it>= 2.33E-06, N = 6,273; Caenorhabdites 3', rho = 0.58, <it>P </it>= 4.5E-04, N = 6,273). This restriction ensures that all exons contribute an approximately equal share of information to each codon position from the boundary and eliminates the potential confounder that short exons might, for reasons unrelated to splicing, feature more frequently in highly conserved genes and create misleading trends by virtue of their disproportionate contribution to substitution rate information closer to the boundary.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Frequency of nonsynonymous change as a function of distance from the exon-intron boundary</p>
               </caption>
               <text>
                  <p>Frequency of nonsynonymous change as a function of distance from the exon-intron boundary. Amino acids are significantly more likely to be conserved near the exon-intron boundary comparing <b>(a) </b><it>C. elegans</it>-<it>C. briggsae </it>(5', rho = 0.957, <it>P </it>= 0; 3', rho = 0.96. <it>P </it>= 0; N = 19,347 exons) and <b>(b) </b><it>D. melanogaster</it>-<it>D. pseudoobscura </it>(5', rho = 0.87, <it>P </it>= 1.02E-07; 3', rho = 0.95, <it>P </it>= 0; N = 7,545 exons). The trends appear approximately monotonous and linear. Location-dependent conservation levels also appear slightly higher near the boundary comparing <b>(c) </b><it>S. cerevisiae</it>-<it>S. castellii </it>but this is not significant (5', rho = 0.11, <it>P </it>= 0.55, N = 51; 3', rho = 0.11, <it>P </it>= 0.55, N = 39; pooled 3'/5', rho = 0.12, <it>P </it>= 0.51, N = 90) or of comparable monotony (but see Additional data file 9).</p>
               </text>
               <graphic file="gb-2008-9-2-r29-4"/>
            </fig>
            <p>Given that the set of aligned <it>Saccharomyces </it>exons consisted entirely of terminal exons (see Materials and methods), we repeated the analysis for a set of 5,352 orthologous pairs of terminal exons from our <it>Drosophila </it>dataset in order to rule out that differences are caused by any special characteristics of terminal exons. Correlations observed for terminal exons closely resemble those for internal exons (5', rho = 0.83, <it>P </it>= 3.8E-07; 3', rho = 0.75, <it>P </it>= 1.95E-06), alleviating any such concerns.</p>
            <p>The above results appear consistent with greater functional significance of boundary-proximal amino acid composition in metazoans, proposed to be at least in part owing to their more extensive utilization of exonic splice regulatory sequences. However, after repeated (k = 10,000) random sampling of 90 aligned terminal exons from the <it>Drosophila </it>dataset and subsequent statistical analysis, we cannot reject the possibility that the <it>Saccharomyces </it>statistics were sampled from the same underlying distribution (Additional data file 9), implying that differences in conservation near exon-intron boundaries cannot be ultimately established from the data at hand.</p>
            <p>Having detected higher levels of amino acid conservation near exon-intron boundaries, we expect genes with a high proportion of sequences near boundaries ('flank-heavy') to evolve more slowly. This is indeed what we found when we considered <it>K</it><sub><it>A </it></sub>as a function of the proportion of sequence within 70 bp of the boundary (Drosophilae, rho = -0.26, <it>P </it>= 2.2E-16, N = 4,132; Caenorhabdites, rho = -0.08, <it>P </it>= 6.18E-09, N = 5,248; Figure <figr fid="F5">5</figr>). We report <it>K</it><sub><it>A </it></sub>rather than <it>K</it><sub><it>A</it></sub>/<it>K</it><sub><it>S</it></sub>, more commonly used as a measure of selection on protein sequence, because the underlying premise of <it>K</it><sub><it>A</it></sub>/<it>K</it><sub><it>S</it></sub>, namely that <it>K</it><sub><it>S </it></sub>reflects neutral rates of evolution, is violated for sequence encoding ESEs <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>The rate of nonsynonymous evolution correlates negatively with the proportion of boundary-proximal sequence</p>
               </caption>
               <text>
                  <p>The rate of nonsynonymous evolution correlates negatively with the proportion of boundary-proximal sequence. <it>K</it><sub><it>A </it></sub>is plotted as a function of the proportion of coding sequence located within 70 bp of an exon-intron boundary for <b>(a) </b><it>D. melanogaster-D. pseudoobscura </it>orthologous genes (rho = -0.26, <it>P </it>= 2.2E-16, N = 4,132) and <b>(b) </b><it>C. elegans</it>-<it>C. briggsae </it>orthologous genes (rho = -0.08, <it>P </it>= 6.18E-09, N = 5,248). The data have been divided into bins along regular decimal intervals (0.1, 0.2, and so on) and the mean <it>K</it><sub><it>A </it></sub>within each bin plotted against the mean proportion of sequence near the boundary. The last (a) and first (b) three bins, respectively, have been pooled to obtain approximately equal bin sizes. Negative trends are present for both sets of aligned genes, but a departure from the general trend is evident for nematode genes with a low proportion of boundary-proximal sequence.</p>
               </text>
               <graphic file="gb-2008-9-2-r29-5"/>
            </fig>
            <p>The results are not qualitatively affected by contracting (50 bp) or expanding (100 bp) the region considered to constitute the boundary flank (Additional data file 10). Focusing on the terminal bins in Figure <figr fid="F5">5a</figr>, it appears that between <it>D. melanogaster </it>and <it>D. pseudoobscura </it>a gene with less than 10% of coding sequence near an exon-intron boundary evolves, on average, almost twice as fast (mean <it>K</it><sub><it>A </it></sub>= 0.195) as a gene with more than 70% of boundary-proximal sequence (mean <it>K</it><sub><it>A </it></sub>= 0.099). Discrepancies in evolutionary rate between 'flank-heavy' and 'core-heavy' bins appear less marked for the nematode pair (mean <it>K</it><sub><it>A </it></sub>(%CDS near boundary >0.9) = 0.12; mean <it>K</it><sub><it>A </it></sub>(%CDS near boundary &lt;0.3) = 0.18). However, Figure <figr fid="F5">5b</figr> suggests that this is principally owing to curiously elevated levels of conservation for genes with a small proportion of sequence near the boundary, that is, genes with very large exons, a feature we did not encounter in the analysis of either insect (Dm-Dps) or mammalian (Hs-Mm) orthologues <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
            <p>Importantly, this anomaly highlights a more general reservation, namely that any measure capturing the proportion of sequence near the boundary will strongly covary with exon length, which in turn might covary with underlying functional determinants of evolutionary rate entirely unrelated to splicing control. Thus, in order to control for any putatively distorting effects of functional class on <it>K</it><sub><it>A</it></sub>, we employed the following strategy: For each aligned gene, we concatenated the flanking regions of all exons, 5' and 3', defined as the first 72 bp bordering the exon-intron junction of trimmed exons. By implication, genes with no exon larger than 144 bp had to be excluded from this analysis. Concurrently, we concatenated the core sections of all exons of sufficient length in the respective gene, defined as the sequence block enclosed by the two flanking regions. As accurate estimation of <it>K</it><sub><it>A </it></sub>probably requires a minimum of 100 codons, we further restricted analysis to those genes with at least 300 bp in the concatenated flanks and in the concatenated cores of exons. For each gene meeting the above criteria we then determined the rates of amino acid evolution in the concatenated core sections (<it>K</it><sub><it>Ac</it></sub>) and flanking sections (<it>K</it><sub><it>Af</it></sub>). We find that more <it>Drosophila </it>orthologous genes than expected by chance have faster evolving core regions (median (<it>K</it><sub><it>Ac </it></sub>- <it>K</it><sub><it>Af</it></sub>)/<it>K</it><sub><it>Af</it></sub>) = 0.14, Wilcoxon signed rank test <it>P </it>&lt; 0.0001, N = 1,237; Figure <figr fid="F6">6</figr>), consistent with the evidence, presented above, for additional sequence constraint operating on flanking regions. A significant tendency towards more rapid evolution in core sections is also evident when we confine the sample to genes with at least 600 bp in flanking as well as core regions (median (<it>K</it><sub><it>Ac </it></sub>- <it>K</it><sub><it>Af</it></sub>)/<it>K</it><sub><it>Af</it></sub>) = 0.14, Wilcoxon signed rank test <it>P </it>&lt; 0.0001, N = 785). Despite exhibiting the expected shift towards average higher <it>K</it><sub><it>A </it></sub>in the core of exons, this trend is much less pronounced than in a previously reported comparison of human-mouse orthologues (median (<it>K</it><sub><it>Ac </it></sub>- <it>K</it><sub><it>Af</it></sub>)/<it>K</it><sub><it>Af</it></sub>) = 0.68, Wilcoxon signed rank test <it>P </it>&lt; 0.0001, N = 360; Figure <figr fid="F6">6c</figr>, and see Parmley <it>et al</it>. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> for details). Curiously, for the nematode pair, we find significant evidence for a reverse correlation (300 bp, median (<it>K</it><sub><it>Ac </it></sub>- <it>K</it><sub><it>Af</it></sub>)/<it>K</it><sub><it>Af</it></sub>) = -0.07, Wilcoxon signed rank test <it>P </it>&lt; 0.0001, N = 1,102; 600 bp, median (<it>K</it><sub><it>Ac </it></sub>- <it>K</it><sub><it>Af</it></sub>)/<it>K</it><sub><it>Af</it></sub>) = -0.014, <it>P </it>&lt; 0.038, N = 496), that is, in the majority of genes, flanking regions evolve at a marginally higher rate than core regions.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Exon cores and flanks evolve at different rates</p>
               </caption>
               <text>
                  <p>Exon cores and flanks evolve at different rates. Histograms of logged <it>Kr </it>ratios (<it>K</it><sub><it>Ac</it></sub>/<it>K</it><sub><it>Af</it></sub>), using 100 bins, for <b>(a) </b><it>D. melanogaster-D. pseudoobscura </it>orthologous genes (N = 1,237), <b>(b) </b><it>C. elegans-C. briggsae </it>orthologous genes (N = 1,102), and <b>(c) </b>human-mouse orthologous genes (N = 360) with a minimum of 300 bp of concatenated middle and flanking sequence of exons are plotted. The dashed line in each graph indicates ln(<it>Kr</it>) = 0, the point at which middle and flanking sections evolve at the same average rate. The arrows indicate the median logged <it>Kr </it>ratios of (a) 0.128, (b) -0.065, and (c) 0.559, respectively. All three are significantly different from the null expectation of ln(<it>Kr</it>) = 0 (<it>P </it>&lt; 0.0001). Note the much more marked departure from the null expectation in the mammalian dataset.</p>
               </text>
               <graphic file="gb-2008-9-2-r29-6"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>General trends</p>
            </st>
            <p>Parmley <it>et al</it>. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> recently presented evidence that, in mammals, amino acid usage in the vicinity of exon-intron boundaries is affected by factors unrelated to protein function but to sequence-based information required for correct splicing. The objective of the present study was to elucidate whether such requirements have left an evolutionary imprint on exonic sequence composition across a phylogenetically diverse set of species. To this end, we systematically compared trends in relative amino acid abundance near exon-intron boundaries in 12 eukaryotic species. Our analysis revealed that preference for or avoidance of certain amino acids near boundaries is a common phenomenon among metazoan species but is not unique to metazoans. More amino acids show skewed usage in species where a greater problem identifying intron-exon boundaries is to be expected, that is, those with large and numerous introns. Notably, this includes the basidiomycete <it>C. neoformans</it>, suggesting that exonic splicing regulation might be a generic characteristic of species with complex pre-mRNA structures rather than absent from the fungal kingdom by virtue of phylogeny. Preference patterns show unmistakable signs of conservation along several dimensions: composition, relative strength, and directionality. The concordance in directionality (whether an amino acid is preferred or avoided) is particularly impressive in that we observe many commonalities with the mammalian pattern even in only distantly related species.</p>
            <p>We do not claim that the systematic patterns we observe are solely caused by a selected preference for codons involved in ESEs. In fact, composite trends are almost certain to be the result of multiple functional constraints, including the need to avoid intron-specific enhancer motifs (for example GGG in mammals <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>) as well as motifs that would disrupt exon recognition. Furthermore, abundance trends could partially be the result of cryptic splice site avoidance as suggested by Eskesen and colleagues <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. However, many of the trends observed - for example, cytosine avoidance near boundaries - are not predicted by this model <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B11">11</abbr></abbrgrp>.</p>
            <p>Introns associate non-randomly with the codon in direct proximity to the splice site in a phase-specific manner, an observation often described as insertional preference <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Trimming and elimination of the first full codon should guard against picking up such insertional preferences or an extended splice site consensus. We cannot rule out that some boundary-proximal codons have slipped into our dataset owing to poor splice site annotation. However, it must be pointed out that this reservation applies only to the subset of amino acid trends that show biased usage directly adjacent to introns and might be more relevant to the interpretation of local discontinuities (Additional data file 11). Also, if the above-mentioned explanations were of major relevance, we would expect cryptic splice site avoidance, insertional preference, and (to a lesser extent) poor splice site annotation to cause similar patterns in ascomycetous yeasts, in particular <it>S. pombe</it>, for which a dataset of reasonable size is available. This is not the case.</p>
            <p>Establishing to what extent these trends are caused by preference for ESEs will ultimately depend on characterizing species-specific catalogues of ESE/Exonic splicing silencer (ESS) motifs together with their corresponding <it>trans-</it>factors and relating these to the observed spectra of preferred/avoided amino acids. This work, in particular relating to tissue- and stage-specific splicing patterns, is still in its infancy <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, the catalogues currently available restricted to a small number of vertebrates and yet to be fully verified experimentally <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>.</p>
            <p>However, the dearth of significant trends in <it>S. cerevisiae </it>and <it>S. pombe </it>strengthens the proposition that preference trends principally reflect requirements to accommodate exonic splicing regulators. Although the <it>S. cerevisiae </it>genome codes for an SR protein kinase (Sky1p) with the capacity to phosphorylate mammalian arginine-serine rich (RS) domains, the likely endogenous substrate (the SR protein-like Npl3p) does not appear to be involved in pre-mRNA splicing <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B32">32</abbr></abbrgrp>. Importantly, no splicing factors homologous to metazoan SR proteins have been discovered in <it>S. cerevisiae </it><abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, consistent with the classical view that splicing in budding yeast is regulated intronically. This is further consistent with the observation that splice site consensus is generally highly conserved, especially 5', much more so than in other species, including <it>C. neoformans </it>(Additional data file 5). The fact that our analysis revealed a significant 3' trend for the two-fold block of leucine (L2) might hint at the presence of recognition motifs in yeast exonic sequence. However, at present there is no evidence supporting the regular involvement of an ESE-like binding motif in <it>S. cerevisiae </it>splicing and alternative explanations should be considered.</p>
            <p>Splicing in <it>S. cerevisiae </it>is moderately common in quantitative terms because many highly expressed genes, notably encoding ribosomal proteins, contain introns, so that over 25% of the mRNA population are spliced <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. However, in over 6,000 <it>S. cerevisiae </it>genes we find less than 300 introns in total, so that splicing can hardly be considered a processing stage representative on a genome-wide scale. In contrast, splicing is much more prevalent in <it>S. pombe </it>where approximately 40% of genes contain introns <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Basal splicing proteins show an enhanced similarity to their mammalian homologues and two SR protein homologues (Srp1p, Srp2p) have been identified <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>. Unlike in budding yeast, there is recent evidence that Srp2p binds to specific exonic elements and interacts with the fission yeast orthologue of human splice factor U2AF <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Why then, given that SR protein-ESE-like interactions seem to exist in <it>S. pombe</it>, do we not find any trends for amino acid or codon preference in this species? We suggest that trends may be lacking for two reasons. Firstly, given the comparatively low level of splice site consensus degeneracy, a minimal number of ESEs might be sufficient to ensure correct splicing. On a genomic level, we might then fail to register biased abundance patterns on the spatial scale investigated in this study. Secondly, for clear-cut preference trends to evolve, a minimum level of splice-regulatory complexity might be required. This fits with our observation that more amino acid trends are observed in species with complex, intron-rich gene structures, including the yeast <it>C. neoformans </it>(Additional data file 6). Further, alternative splicing contexts, where regulatory elements frequently compete for precedence if arranged close to each other, could be envisaged as an evolutionary pressure initially driving the diversification of ESEs and corresponding <it>trans-</it>factors, thereby creating an environment in which strong trends might be required to attract or repel the correct set of <it>trans-</it>factors, both for constitutively and alternatively spliced genes. Consistent with this hypothesis, reports of alternative splicing in <it>S. cerevisiae </it><abbrgrp><abbr bid="B39">39</abbr></abbrgrp> and <it>S. pombe </it><abbrgrp><abbr bid="B40">40</abbr></abbrgrp> are restricted to singular cases, for which functionality of the recovered alternative splice products remains to be shown <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. However, attempts to link diversity and density of ESEs to alternative splicing have so far yielded ambiguous results <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>.</p>
            <p>The absence of preference patterns in ascomycetous yeasts has an important practical implication. Finding amino acid trends to be abundant near exon-intron boundaries can be regarded as evidence for exon-based splicing regulation, without prior knowledge of specific binding motifs or <it>trans</it>-factors, although failure to detect such trends is insufficient to rule out interaction between exons and auxiliary proteins in the splicing process (compare <it>S. pombe</it>).</p>
         </sec>
         <sec>
            <st>
               <p>Nematode exceptionalism in an ESE framework: is <it>trans</it>-splicing to blame?</p>
            </st>
            <p>The fundamental deviation from the 'mammalian pattern of directionality' shown by the 5' amino acid trends in nematode exons (Table <tblr tid="T1">1</tblr>) might, at first sight, be unexpected. There are extensive homologies between vertebrate and nematode basal splicing machineries on the protein level <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Furthermore, splicing in SR protein-depleted cells of the <it>Caenorrhabditis </it>relative <it>Ascaris lumbricoides </it>can be rescued by adding SR proteins derived from non-nematode (HeLa) whole cell extracts, supporting at least a minimum degree of functional overlap <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Thirdly, the high level of conservation between SR and SR-like proteins identified in each species explicitly includes the RNA recognition motifs, tentatively suggesting similar binding specificities <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
            <p>There is, however, one feature of the nematode splicing process that sets it apart from the other species in our sample: a substantial proportion (approximately 70%) of <it>C. elegans </it>(and <it>C. briggsae</it>) genes are <it>trans-</it>spliced <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. In this process a short (22 nucleotide) 5' small nuclear RNA (snRNA) fragment, the spliced leader, which is transcribed from a different genomic locale, is added at the 5' end of the pre-mRNA <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. It would, we suggest, be highly disadvantageous for this <it>trans</it>-splicing machinery to act at the 5' end of exons where <it>cis</it>-splicing should occur. Indeed, were <it>trans</it>-splicing to occur where intron removal should take place, a gene would, in effect, be broken in two. Thus, we suggest that 5' ends of internal exons have evolved to ensure that they do not attract the <it>trans</it>-splicing machinery. Given that this machinery is ubiquitous in a cell, all 5' ends of internal exons, be they from <it>trans</it>-spliced genes or not, should be equally under pressure to avoid <it>trans</it>-splicing where <it>cis</it>-splicing should happen. Consistent with this expectation, the trends seen at 5' and 3' ends in internal exons are the same in genes from operons and those not in operons (data not shown). Interestingly, information content at 3' splice sites in nematodes is strikingly higher than in other species (Additional data file 5), as previously observed <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>, further supporting the idea that splicing regulation in nematodes is unusual in its asymmetry.</p>
            <p>What might be the proteins involved in <it>trans</it>-splicing? There is good evidence that several stages of the <it>trans</it>-splicing process are, like <it>cis</it>-splicing, critically supported by SR proteins <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B48">48</abbr></abbrgrp>. Furthermore, whilst mammalian and <it>Ascaris </it>SR protein extracts are equally efficient in catalyzing <it>cis</it>-splicing <it>in vitro</it>, <it>Ascaris </it>SR protein extracts engender an approximately five-fold higher <it>trans-</it>splicing activity <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Although the use of whole cell extracts in these experiments precludes an analysis of the differential contribution of individual SR proteins, these observations are consistent with the hypothesis that a subset of splice-regulatory proteins in these species is dedicated to <it>trans-</it>splicing.</p>
            <p>Given the above, we envisage <it>trans</it>-splicing specific SR and other proteins to interact primarily with intergenic sequence upstream of the first exon of the pre-mRNA to provide further guidance for the <it>trans</it>-splicing apparatus or mediate other functions crucial to <it>trans</it>-splicing, such as protecting downstream RNA from degradation <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B49">49</abbr></abbrgrp>. A prediction derived from this model is that we should find in nematodes proteins participating in <it>trans</it>-splicing that bind to nucleotide motifs depleted of codons from amino acids avoided near the 5' end of exons.</p>
         </sec>
         <sec>
            <st>
               <p>Symmetric exons?</p>
            </st>
            <p>Owing to their deviant 5' trends, nematodes stand out in another aspect of systematic amino acid biases. Parmley <it>et al</it>. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> observed no significant differences in preference trends between 5' and 3' ends of exons in mammals. Similarly, approximate symmetry has been reported for ESE distribution in human exons <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Conversely, standardized major axis regressions <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp> strongly suggest that nematodes do not conform to a symmetric pattern of preference trends.</p>
            <p>An assessment of this situation very much depends on how we expect ESE-guided splicing regulation to work on a mechanistic level. If SR proteins are assumed to interact directly with specific components of the basal splicing machinery, as is probably the case for U2AF <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, we would not automatically expect the same ESEs (and by implication amino acid trends) to be represented at similar frequencies 5' and 3' where different spliceosomal proteins are present. Predictions of whether symmetry might be of functional relevance, however, especially for scenarios of indirect interaction, cannot be derived from the data at hand.</p>
            <p>Confidence intervals in our exploration of symmetry are large so that we cannot ascertain that symmetry is a dominant pattern throughout our species sample. However, some best estimates of SMA slopes (&#946;) are tantalizingly close to perfect symmetry (Mm, &#946; = 0.9907; Hs, &#946; = 1.0362; Dr, &#946; = 1.0439; Ag, &#946; = 1.0788; Table <tblr tid="T5">5</tblr>), warranting more detailed examination of this potentially functional signature in the future.</p>
         </sec>
         <sec>
            <st>
               <p>Patterns of amino acid evolution</p>
            </st>
            <p>Consistent with the proposition that trends in relative amino acid abundance are functionally important, we observe lower rates of nonsynonymous evolution near exon-intron boundaries in insects (Dm-Dps), nematodes (Ce-Cb) and mammals (Hs-Mm), indicative of higher selective constraint in this region. Furthermore, the proportion of coding sequence that is located near boundaries is a partial predictor of <it>K</it><sub><it>A </it></sub>(Figure <figr fid="F5">5</figr>). Genes with a higher share of sequence partaking in exon flanks tend to show reduced rates of evolution. Nematode genes, again, stand out in that they do not conform to the negative linear relationship between <it>K</it><sub><it>A </it></sub>and flank-heaviness found in other species pairs (Hs-Mm and Dm-Dps), but show unexpectedly high levels of conservation for genes with very large exons. The causes for this currently remain elusive. Similarly, we would not have predicted that in worms gene-specific differences between evolutionary rate in the flanking and core sections of exons are biased (if only slightly) towards more rapid evolution of flanking regions. However, the distribution of core-flank evolutionary rate differentials in worms appears comparable to the one for flies, a higher median evolutionary rate of core regions in the latter notwithstanding (Figure <figr fid="F6">6</figr>). Human-mouse orthologous genes on the other hand show a much more dramatic distributional shift towards faster evolution in exon cores (see distributions in Figure <figr fid="F6">6</figr>). Between-taxa differences in gene composition, especially relating to the presence of more and longer introns in mammals, might account for these differences: on a speculative note, information necessary to distinguish an exon from surrounding non-coding sequence might require a unique degree of conservation under these circumstances, perhaps severely restricting the leeway for nonsynonymous changes to occur in flanking regions. Alternatively, restrictions imposed by our experimental set-up, especially relating to minimum sequence length requirements, might have resulted in the selection of gene sets with divergent splicing characteristics in the different species pairs. We leave a closer dissection of these questions to further analysis.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Biased usage of amino acids in the vicinity of exon-intron boundaries is a common feature in metazoan genes, with the direction of biases largely consistent between taxa. That the biases accord with sequence preferences of SR proteins and that such biases are not seen in intron-poor yeasts support the view that dual coding of DNA in exons, to specify both which amino acids to employ and where introns are to be removed, is a common feature of metazoan species and more generally in genomes in which exons are relatively small islands in a sea of intronic sequences in the immature mRNA. Interestingly, similar skews in amino acid composition can be observed for the intron-rich fungus <it>C. neoformans</it>, suggesting that exonic splicing regulation might occur in this species. In nematodes, the possible relationship between <it>trans</it>-splicing and the exceptional departure from the mammalian pattern of amino acid trends at the 5' end of exons deserves further scrutiny. The results presented here suggest a simple sequence-based, species-independent diagnostic for the relative importance of exonic splicing regulation in a particular species given nothing more than a well-annotated genome.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Relative amino acid abundance near exon-intron boundaries</p>
            </st>
            <p>For 12 species (human, mouse, zebrafish, <it>C. elegans</it>, <it>C. briggsae</it>, <it>A. gambiae</it>, <it>D. melanogaster</it>, <it>A. mellifera</it>, <it>A. thaliana</it>, <it>S. cerevisiae</it>, <it>S. pombe</it>, <it>C. neoformans</it>) we established individual exon datasets derived from a small number of databases (Additional data file 12). Pre-established CDS tracks were followed in all but three cases (At, Sp, Cn), for which annotated chromosome/scaffold sequences were downloaded from the relevant database and exons extracted subsequently. Exons with identical locus IDs were then sorted into individual files, only retaining files with at least one internal exon. All locus files were subsequently checked to ensure coding sequence started with ATG, finished with a stop codon (TAA, TAG, TGA), had no internal stop codons, and was a multiple of three nucleotides. Locus files where one of the above prerequisites was violated were removed from the final dataset. We also eliminated exons containing one or more ambiguous nucleotides ('n'). The remaining exons were trimmed so that the first nucleotide was the first nucleotide of the first complete codon and the last nucleotide the last of the final complete codon. Then, we discarded all terminal exons to obtain the final exon sets. Gene models from which exons were derived are provided in Additional data file 13.</p>
            <p>After splitting individual exons in half to ensure that no codon featured in both 5' and 3' analyses, we considered the trend in usage of each amino acid as a function of the distance from the boundary up to a maximum distance of 34 codons. Importantly, the codon in direct proximity to the boundary was also eliminated.</p>
            <p>We then calculated Spearman rank correlations (rho) between the distance from the boundary (5' or 3') and proportional usage of the amino acid (that is, in proportion to the number of residues at that given distance) for the remaining 33 data points for each species. The three six-fold degenerate amino acids we split into blocks of four and two (that is, 'S4' signifies, TCA, TCC, TCG and TCT, while 'S2' signifies AGC and AGT). In relevant circumstances, the two-fold and four-fold blocks were treated as separate amino acids, yielding a total of 23 amino acid groupings.</p>
            <p>For each amino acid grouping independently we fitted unweighted linear models and extracted the slope of the regression line to be used as a basic measure of the strength of individual preference trends. Note that a negative rho/slope implies an amino acid that is preferred near boundaries and a positive rho/slope implies a tendency to be avoided. Unless otherwise stated, results are reported as significant only if they remain significant after correction for multiple testing (see Results for adjusted <it>P</it>-values).</p>
            <p>For the most part, trends are approximately monotonic and linear and hence adequately captured by simple linear models. For certain amino acids, departures from linearity, some recurrent across species and typically highly localized, do exist however. Unusual U-shaped 5' trends for proline, originally noted for human and mouse by Parmley <it>et al</it>. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, are also present in other species (Ce, Dr). Further, some amino acids, notably isoleucine and the two-fold block of leucine, are disproportionately preferred in direct proximity to the boundary (after trimming) at 3' exon ends in several species. 'Popping out' from otherwise linear trends (Additional data file 14), these patterns are perhaps caused by participation of the relevant codons in an extended splice site consensus relevant for U5 snRNA-mediated exon joining (see Additional data file 11 for a more detailed discussion of recurrent, locally confined preference/avoidance patterns and potential functional explanations). As a corollary of discontinuities more generally, comparative interpretation of slope coefficients as an index of relative strength ought to be done with care. In particular, our rank ordering of slopes derives its value from providing another dimension through which congruence in preference spectra can be asserted, rather than being easily translated into differential functional impact on a mechanistic level.</p>
         </sec>
         <sec>
            <st>
               <p>Modifications in the analysis of <it>S. cerevisiae </it>exons</p>
            </st>
            <p>Given the small number of internal exons in <it>S. cerevisiae </it>(only eight genes have more than one intron), we decided to include terminal exons in the final dataset (417 exons) for this species. The one end of each terminal exon that did not border the intron was excluded. Otherwise, the removal of irregularities (internal stop codons and so on) proceeded as described above. Restricted sample size also indirectly prompted a re-examination of the results obtained from Spearman's rank correlations because the presence of multiple tied ranks led to concerns about the adequacy of this statistic. However, using the more appropriate Kendall's tau statistic did not return any qualitatively different results.</p>
         </sec>
         <sec>
            <st>
               <p>Cross-species patterns in preference across all amino acid groupings</p>
            </st>
            <p>For 5' and 3' datasets independently, Spearman's correlations were computed between the previously derived slope coefficients of all 23 amino acid groupings for every possible metazoan species pair. Ninety tests (with the number of species N = 10, N^2-N = 90) were carried out and significance threshold adjusted accordingly (<it>P </it>= 0.05/90 = 5.56E-04). We initially included both yeast species in the analysis but, as expected from the absence of significant individual amino acid trends, we found no significant correlations for the global amino acid set (data not shown). No loss of relevant information is incurred whilst clarity of presentation is enhanced when these species are excluded from the analysis and, in particular, the accompanying table (Table <tblr tid="T5">5</tblr>).</p>
         </sec>
         <sec>
            <st>
               <p>Comparison of orthologous exons</p>
            </st>
            <sec>
               <st>
                  <p><it>S. cerevisiae</it>-<it>S. castellii</it></p>
               </st>
               <p>A set of <it>S. cerevisiae-S. castellii </it>orthologous genes, based on a re-annotation of the <it>S. castellii </it>genome by Wolfe and colleagues, were obtained from the Yeast Gene Order Browser <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. For each <it>S. cerevisiae </it>gene that contributed exons to our analysis of amino acid abundance, we checked whether a homologous <it>S. castellii </it>gene was present on the same positional track, the rationale being to compare true orthologues rather than outparalogues. If putatively orthologous gene pairs were found on both tracks, implying the retention of two post-genome duplication paralogues in both species, only the pair on track 1 was considered. This procedure yielded 164 orthologue pairs. <it>S. castellii </it>open reading frame structure downloaded from the same source was used to eliminate all <it>S. castellii </it>genes that lacked any introns, did not have a regular start or stop codon, or whose exon sequence was not a multiple of three nucleotides. Further discarding all genes with unequal exon number or unequal intron phase between species, 51 gene pairs (102 exons) remained. We further eliminated all exons shorter than eight amino acids in length as these were considered uninformative. After trimming (see above) codons were translated into amino acids and orthologous exons aligned using MUSCLE (version 3.6) <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. After alignment, the first and last amino acid of each exon were removed. Exons were then split in half so that any one amino acid features exclusively in either 5' or 3' analysis. We then calculated the number of amino acid changes over the total number of informative (amino acid present in both species) sites for each amino acid position from the boundary, including only exon ends that bordered an intron (that is, only the 3' end for the first exon and only the 5' end for the last exon).</p>
               <p>Spearman's and Kendall's rank correlations between distance from the boundary and the proportion of amino acids changed were computed for 5' and 3' ends separately. Given the small sample sizes for end-specific analyses (N(5') = 51, N(3') = 39), we also computed rank correlations for 5' and 3' ends pooled. Linear models were fitted for each analysis, weighting by the number of informative sites at distance x from the boundary.</p>
            </sec>
            <sec>
               <st>
                  <p><it>D. melanogaster</it>-<it>D. pseudoobscura</it></p>
               </st>
               <p>A list of <it>D. melanogaster-D. pseudoobscura </it>orthologous genes was obtained from the Inparanoid database <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. <it>D. pseudoobscura </it>exons were downloaded from the flybaseGene track on the UCSC genome browser <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> and sorted into files by gene locus, eliminating genes with irregularities as described above. Using the orthologue list we established a set of 4,165 orthologue pairs for which genes were present in the cleaned datasets of both species; 2,677 gene pairs (comprising 7,545 orthologous internal exon pairs, and 5,352 orthologous terminal exon pairs) remain after checking for equal exon number and intron phase. Trimming of exons, alignment and statistical analysis were carried out as described for <it>S. cerevisiae-S. castellii</it>. The 3' and 5' ends were considered for each internal exon, whereas only exon ends bordering an intron were included in the analysis of terminal exons.</p>
            </sec>
            <sec>
               <st>
                  <p><it>C. elegans</it>-<it>C. briggsae</it></p>
               </st>
               <p>Each <it>C. elegans </it>locus file was translated into protein and queried against a database of all translated <it>C. briggsae </it>locus files using BLAST (blastp), and vice versa. Only reciprocal best hits with an expectation E &#8804; 1 were retained. After checking for equal exon number and intron phase, 5,358 orthologous gene pairs (19,347 orthologous internal exon pairs) remained. Trimming and alignment were carried out as described above for <it>Drosophila</it>. Orthologues for all comparative species are given in Additional data file 13.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Intraspecific 5'~3' correlations and symmetry analysis</p>
            </st>
            <p>Covering all 23 amino acid groupings Spearman's rank correlations were computed between 5' and 3' trends within each species (N = 12, <it>P </it>= 0.05/12 = 4.17E-03).</p>
            <p>SMA regressions were computed in R using the SMATR package <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp> applying standard confidence limits (95% CI). As symmetry of the type x = y was to be tested, the regression line was forced through the origin. SMA regression requires estimates of the slope of the regression line to have a consistently positive or negative sign so that the major and minor axes can be identified unambiguously. This is not the case for either <it>A. thaliana </it>or <it>C. neoformans</it>, which are hence not amenable to this type of analysis and were not included. Further, residual distribution for <it>S. cerevisiae </it>shows significant deviation from normality so that results for this species should be interpreted with care.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>Ag, <it>Anopheles gambiae</it>; Am, <it>Apis mellifera</it>; At, <it>Arabidopsis thaliana</it>; Cb, <it>Caenorrhabditis briggsae</it>; CDS, coding sequence; Ce, <it>Caenorrhabditis elegans</it>; CI, confidence interval; Cn, <it>Cryptococcus neoformans</it>; Dm, <it>Drosophila melanogaster</it>; Dps, <it>Drosophila pseudoobscura</it>; Dr, <it>Danio rerio</it>; ESE, exonic splicing enhancer; ESS, exonic splicing silencer; HPI, hexamer preference index; Hs, human; Mm, mouse; Sc, <it>Saccharomyces cerevisiae</it>; SMA, standard major axis; Sp, <it>Schizosaccharomyces pombe</it>; SR protein, serine-arginine protein.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>TW compiled, processed, and analyzed the data. JLP participated in the HPI analysis and provided scripts. LDH conceived of and coordinated the study. TW and LDH wrote the paper. All authors read and approved the final manuscript.</p>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available. Additional data file <supplr sid="S1">1</supplr> is a table giving the amino acid trends for all species and associated statistics. Additional data file <supplr sid="S2">2</supplr> is a table giving amino acid trends and associated statistics for homology-reduced gene sets of <it>D. melanogaster </it>and <it>C. elegans</it>. Additional data file <supplr sid="S3">3</supplr> contains the protocol for homology reduction of <it>C. elegans </it>and <it>D. melanogaster </it>orthologues. Additional data file <supplr sid="S4">4</supplr> contains the protocol for covariate analysis of abundance trends. Additional data file <supplr sid="S5">5</supplr> is a table listing covariates of amino acid trends by species. Additional data file <supplr sid="S6">6</supplr> is a table giving by-species cross-correlations for covariates of amino acid trends. Additional data file <supplr sid="S7">7</supplr> is a table listing best blast hits of SR proteins against <it>C. neoformans </it>genes and Pfam domain scores in those genes. Additional data file <supplr sid="S8">8</supplr> contains an analysis of ESE positioning in relation to the reading frame. Additional data file <supplr sid="S9">9</supplr> is a figure showing re-sampling distributions of evolutionary rates. Additional data file <supplr sid="S10">10</supplr> is a table giving rank correlations between <it>K</it><sub><it>A </it></sub>and the proportion of sequence near the exon-intron boundary. Additional data file <supplr sid="S11">11</supplr> contains a detailed characterization of specific local discontinuities. Additional data file <supplr sid="S12">12</supplr> is a table giving the sources of exon datasets. Additional data file <supplr sid="S13">13</supplr> is a table giving the gene model IDs from which exons were derived. Additional data file <supplr sid="S14">14</supplr> is a figure giving examples of locally discontinuous preference trends. Additional data file <supplr sid="S15">15</supplr> is a table detailing local discontinuities across selected species.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Amino acid trends for all species and associated statistics</p>
            </caption>
            <text>
               <p>Amino acid trends for all species and associated statistics.</p>
            </text>
            <file name="gb-2008-9-2-r29-S1.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>Amino acid trends and associated statistics for homology-reduced gene sets of <it>D. melanogaster </it>and <it>C. elegans</it></p>
            </caption>
            <text>
               <p>Amino acid trends and associated statistics for homology-reduced gene sets of <it>D. melanogaster </it>and <it>C. elegans</it>.</p>
            </text>
            <file name="gb-2008-9-2-r29-S2.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Protocol for homology reduction of <it>C. elegans </it>and <it>D. melanogaster </it>orthologues</p>
            </caption>
            <text>
               <p>Protocol for homology reduction of <it>C. elegans </it>and <it>D. melanogaster </it>orthologues.</p>
            </text>
            <file name="gb-2008-9-2-r29-S3.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>Protocol for covariate analysis of abundance trends</p>
            </caption>
            <text>
               <p>Protocol for covariate analysis of abundance trends.</p>
            </text>
            <file name="gb-2008-9-2-r29-S4.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>Covariates of amino acid trends by species</p>
            </caption>
            <text>
               <p>Covariates of amino acid trends by species.</p>
            </text>
            <file name="gb-2008-9-2-r29-S5.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S6">
            <title>
               <p>Additional data file 6</p>
            </title>
            <caption>
               <p>By-species cross-correlations for covariates of amino acid trends</p>
            </caption>
            <text>
               <p>By-species cross-correlations for covariates of amino acid trends.</p>
            </text>
            <file name="gb-2008-9-2-r29-S6.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S7">
            <title>
               <p>Additional data file 7</p>
            </title>
            <caption>
               <p>Best blast hits of SR proteins against <it>C. neoformans </it>genes and Pfam domain scores in those genes</p>
            </caption>
            <text>
               <p>Best blast hits of SR proteins against <it>C. neoformans </it>genes and Pfam domain scores in those genes.</p>
            </text>
            <file name="gb-2008-9-2-r29-S7.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S8">
            <title>
               <p>Additional data file 8</p>
            </title>
            <caption>
               <p>Analysis of ESE positioning in relation to the reading frame</p>
            </caption>
            <text>
               <p>Analysis of ESE positioning in relation to the reading frame.</p>
            </text>
            <file name="gb-2008-9-2-r29-S8.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S9">
            <title>
               <p>Additional data file 9</p>
            </title>
            <caption>
               <p>Re-sampling distributions of evolutionary rates</p>
            </caption>
            <text>
               <p>Re-sampling distributions of evolutionary rates.</p>
            </text>
            <file name="gb-2008-9-2-r29-S9.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S10">
            <title>
               <p>Additional data file 10</p>
            </title>
            <caption>
               <p>Rank correlations between <it>K</it><sub><it>A </it></sub>and the proportion of sequence near the exon-intron boundary</p>
            </caption>
            <text>
               <p>Rank correlations between <it>K</it><sub><it>A </it></sub>and the proportion of sequence near the exon-intron boundary.</p>
            </text>
            <file name="gb-2008-9-2-r29-S10.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S11">
            <title>
               <p>Additional data file 11</p>
            </title>
            <caption>
               <p>Detailed characterization of specific local discontinuities</p>
            </caption>
            <text>
               <p>Detailed characterization of specific local discontinuities.</p>
            </text>
            <file name="gb-2008-9-2-r29-S11.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S12">
            <title>
               <p>Additional data file 12</p>
            </title>
            <caption>
               <p>Sources of exon datasets</p>
            </caption>
            <text>
               <p>Sources of exon datasets.</p>
            </text>
            <file name="gb-2008-9-2-r29-S12.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S13">
            <title>
               <p>Additional data file 13</p>
            </title>
            <caption>
               <p>Gene model IDs from which exons were derived</p>
            </caption>
            <text>
               <p>Gene model IDs from which exons were derived.</p>
            </text>
            <file name="gb-2008-9-2-r29-S13.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S14">
            <title>
               <p>Additional data file 14</p>
            </title>
            <caption>
               <p>Examples of locally discontinuous preference trends</p>
            </caption>
            <text>
               <p>Examples of locally discontinuous preference trends.</p>
            </text>
            <file name="gb-2008-9-2-r29-S14.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S15">
            <title>
               <p>Additional data file 15</p>
            </title>
            <caption>
               <p>Local discontinuities across selected species</p>
            </caption>
            <text>
               <p>Local discontinuities across selected species.</p>
            </text>
            <file name="gb-2008-9-2-r29-S15.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We would like to thank Max Robinson (University of Washington) for kindly providing us with results from his PhD thesis. We thank several reviewers for comments that greatly improved the manuscript. This work was funded by the Wellcome Trust (LDH), the Medical Research Council (TW) and the Biotechnology and Biological Sciences Research Council (JLP).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Human coding and noncoding DNA: compositional correlations.</p>
            </title>
            <aug>
               <au>
                  <snm>Clay</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Cacci&#242;</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zoubak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mouchiroud</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bernardi</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Mol Phylogenet Evol</source>
            <pubdate>1996</pubdate>
            <volume>5</volume>
            <fpage>2</fpage>
            <lpage>12</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/mpev.1996.0002</pubid>
                  <pubid idtype="pmpid" link="fulltext">8673288</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Splicing and the evolution of proteins in mammals.</p>
            </title>
            <aug>
               <au>
                  <snm>Parmley</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Urrutia</snm>
                  <fnm>AO</fnm>
               </au>
               <au>
                  <snm>Potrzebowski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kaessmann</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <fpage>e14</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1790955</pubid>
                  <pubid idtype="pmpid" link="fulltext">17298171</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0050014</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases.</p>
            </title>
            <aug>
               <au>
                  <snm>Blencowe</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <fpage>106</fpage>
            <lpage>110</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0968-0004(00)01549-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">10694877</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Regulation of alternative RNA splicing by exon definition and exon sequences in viral and mammalian gene expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Zheng</snm>
                  <fnm>ZM</fnm>
               </au>
            </aug>
            <source>J Biomed Sci</source>
            <pubdate>2004</pubdate>
            <volume>11</volume>
            <fpage>278</fpage>
            <lpage>294</lpage>
            <note>A published erratum appears in <it>J Biomed Sci </it>2004, <b>11</b>:538.</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF02254432</pubid>
                  <pubid idtype="pmpid">15067211</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>SR proteins: a foot on the exon before the transition from intron to exon definition.</p>
            </title>
            <aug>
               <au>
                  <snm>Ram</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Ast</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>5</fpage>
            <lpage>7</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2006.10.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">17070958</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Exon recognition in vertebrate splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Berget</snm>
                  <fnm>SM</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>2411</fpage>
            <lpage>2414</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7852296</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Coevolution of genomic intron number and splice sites.</p>
            </title>
            <aug>
               <au>
                  <snm>Irimia</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Penny</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Roy</snm>
                  <fnm>SW</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>321</fpage>
            <lpage>325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2007.04.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">17442445</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>The function of multisite splicing enhancers.</p>
            </title>
            <aug>
               <au>
                  <snm>Hertel</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Maniatis</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Mol Cell</source>
            <pubdate>1998</pubdate>
            <volume>1</volume>
            <fpage>449</fpage>
            <lpage>455</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1097-2765(00)80045-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">9660929</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>A systematic analysis of the factors that determine the strength of pre-mRNA splicing enhancers.</p>
            </title>
            <aug>
               <au>
                  <snm>Graveley</snm>
                  <fnm>BR</fnm>
               </au>
               <au>
                  <snm>Hertel</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Maniatis</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1998</pubdate>
            <volume>17</volume>
            <fpage>6747</fpage>
            <lpage>6756</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1171020</pubid>
                  <pubid idtype="pmpid" link="fulltext">9822617</pubid>
                  <pubid idtype="doi">10.1093/emboj/17.22.6747</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Evidence for codon bias selection at the pre-mRNA level in eukaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Willie</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Majewski</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>534</fpage>
            <lpage>538</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2004.08.014</pubid>
                  <pubid idtype="pmpid" link="fulltext">15475111</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Biased codon usage near intron-exon junctions: selection on splicing enhancers, splice-site recognition or something else?</p>
            </title>
            <aug>
               <au>
                  <snm>Chamary</snm>
                  <fnm>JV</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>256</fpage>
            <lpage>259</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.03.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">15851058</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Exonic splicing regulatory elements skew synonymous codon usage near intron-exon boundaries in mammals.</p>
            </title>
            <aug>
               <au>
                  <snm>Parmley</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2007</pubdate>
            <volume>24</volume>
            <fpage>1600</fpage>
            <lpage>1603</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msm104</pubid>
                  <pubid idtype="pmpid" link="fulltext">17525472</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Systematic genome-wide annotation of spliceosomal proteins reveals differential gene family expansion.</p>
            </title>
            <aug>
               <au>
                  <snm>Barbosa-Morais</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Carmo-Fonseca</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Aparicio</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>66</fpage>
            <lpage>77</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1356130</pubid>
                  <pubid idtype="pmpid" link="fulltext">16344558</pubid>
                  <pubid idtype="doi">10.1101/gr.3936206</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Sorting out the complexity of SR protein functions.</p>
            </title>
            <aug>
               <au>
                  <snm>Graveley</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2000</pubdate>
            <volume>6</volume>
            <fpage>1197</fpage>
            <lpage>1211</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1369994</pubid>
                  <pubid idtype="pmpid" link="fulltext">10999598</pubid>
                  <pubid idtype="doi">10.1017/S1355838200000960</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>The architecture of pre-mRNAs affects mechanisms of splice-site pairing.</p>
            </title>
            <aug>
               <au>
                  <snm>Fox-Walsh</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Dou</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Lam</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Hung</snm>
                  <fnm>SP</fnm>
               </au>
               <au>
                  <snm>Baldi</snm>
                  <fnm>PF</fnm>
               </au>
               <au>
                  <snm>Hertel</snm>
                  <fnm>KJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>16176</fpage>
            <lpage>16181</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1283478</pubid>
                  <pubid idtype="pmpid" link="fulltext">16260721</pubid>
                  <pubid idtype="doi">10.1073/pnas.0508489102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Proceedings of the SMBE Tri-National Young Investigators' Workshop 2005. Investigating the intron recognition mechanism in eukaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Collins</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Penny</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>901</fpage>
            <lpage>910</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msj084</pubid>
                  <pubid idtype="pmpid" link="fulltext">16371412</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Gene expression, intron density, and splice site strength in <it>Drosophila </it>and <it>Caenorhabditis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Fahey</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2007</pubdate>
            <volume>65</volume>
            <fpage>349</fpage>
            <lpage>357</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00239-007-9015-y</pubid>
                  <pubid idtype="pmpid" link="fulltext">17763878</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Splicing signals in <it>Drosophila</it>: intron size, information content, and consensus sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Mount</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Burks</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hertz</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Stormo</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Fields</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1992</pubdate>
            <volume>20</volume>
            <fpage>4255</fpage>
            <lpage>4262</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">334133</pubid>
                  <pubid idtype="pmpid" link="fulltext">1508718</pubid>
                  <pubid idtype="doi">10.1093/nar/20.16.4255</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Information content of <it>Caenorhabditis elegans </it>splice site sequences varies with intron length.</p>
            </title>
            <aug>
               <au>
                  <snm>Fields</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1990</pubdate>
            <volume>18</volume>
            <fpage>1509</fpage>
            <lpage>1512</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">330518</pubid>
                  <pubid idtype="pmpid" link="fulltext">2326191</pubid>
                  <pubid idtype="doi">10.1093/nar/18.6.1509</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The genome of the basidiomycetous yeast and human pathogen <it>Cryptococcus neoformans</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Loftus</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Fung</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Roncaglia</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rowley</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Amedeo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bruno</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Vamathevan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Miranda</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Anderson</snm>
                  <fnm>IJ</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Bosdet</snm>
                  <fnm>IE</fnm>
               </au>
               <au>
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Chiu</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Doering</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Donlin</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>D'Souza</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Fox</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Grinberg</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fukushima</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Haas</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Janbon</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Koo</snm>
                  <fnm>HL</fnm>
               </au>
               <au>
                  <snm>Krzywinski</snm>
                  <fnm>MI</fnm>
               </au>
               <au>
                  <snm>Kwon-Chung</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Lengeler</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Maiti</snm>
                  <fnm>R</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>307</volume>
            <fpage>1321</fpage>
            <lpage>1324</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1103773</pubid>
                  <pubid idtype="pmpid" link="fulltext">15653466</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Common slope tests for bivariate errors-in-variables models.</p>
            </title>
            <aug>
               <au>
                  <snm>Warton</snm>
                  <fnm>DI</fnm>
               </au>
               <au>
                  <snm>Weber</snm>
                  <fnm>NC</fnm>
               </au>
            </aug>
            <source>Biom J</source>
            <pubdate>2002</pubdate>
            <volume>44</volume>
            <fpage>161</fpage>
            <lpage>174</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/1521-4036(200203)44:2&lt;161::AID-BIMJ161>3.0.CO;2-N</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Bivariate line-fitting methods for allometry.</p>
            </title>
            <aug>
               <au>
                  <snm>Warton</snm>
                  <fnm>DI</fnm>
               </au>
               <au>
                  <snm>Wright</snm>
                  <fnm>IJ</fnm>
               </au>
               <au>
                  <snm>Falster</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Westoby</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Biol Rev Camb Philos Soc</source>
            <pubdate>2006</pubdate>
            <volume>81</volume>
            <fpage>259</fpage>
            <lpage>291</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1017/S1464793106007007</pubid>
                  <pubid idtype="pmpid" link="fulltext">16573844</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Splicing signals in <it>Caenorhabditis elegans</it>: candidate exonic splicing enhancer motifs.</p>
            </title>
            <aug>
               <au>
                  <snm>Robinson</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>PhD thesis</source>
            <publisher>University of Washington</publisher>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B24">
            <title>
               <p>RESCUE-ESE Web Server</p>
            </title>
            <url>http://genes.mit.edu/burgelab/rescue-ese/</url>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Variation in sequence and organization of splicing regulatory elements in vertebrate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Yeo</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hoon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Venkatesh</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>15700</fpage>
            <lpage>15705</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">524216</pubid>
                  <pubid idtype="pmpid" link="fulltext">15505203</pubid>
                  <pubid idtype="doi">10.1073/pnas.0404901101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Evidence for purifying selection against synonymous mutations in mammalian exonic splicing enhancers.</p>
            </title>
            <aug>
               <au>
                  <snm>Parmley</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Chamary</snm>
                  <fnm>JV</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>301</fpage>
            <lpage>309</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msj035</pubid>
                  <pubid idtype="pmpid" link="fulltext">16221894</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Natural selection affects frequencies of AG and GT dinucleotides at the 5' and 3' ends of exons.</p>
            </title>
            <aug>
               <au>
                  <snm>Eskesen</snm>
                  <fnm>ST</fnm>
               </au>
               <au>
                  <snm>Eskesen</snm>
                  <fnm>FN</fnm>
               </au>
               <au>
                  <snm>Ruvinsky</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2004</pubdate>
            <volume>167</volume>
            <fpage>543</fpage>
            <lpage>550</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1470862</pubid>
                  <pubid idtype="pmpid" link="fulltext">15166176</pubid>
                  <pubid idtype="doi">10.1534/genetics.167.1.543</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>An analysis of intron positions in relation to nucleotides, amino acids, and protein secondary structure.</p>
            </title>
            <aug>
               <au>
                  <snm>Whamond</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2006</pubdate>
            <volume>359</volume>
            <fpage>238</fpage>
            <lpage>247</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2006.03.029</pubid>
                  <pubid idtype="pmpid" link="fulltext">16616935</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Alternative splicing: New insights from global analyses.</p>
            </title>
            <aug>
               <au>
                  <snm>Blencowe</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2006</pubdate>
            <volume>126</volume>
            <fpage>37</fpage>
            <lpage>47</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2006.06.023</pubid>
                  <pubid idtype="pmpid" link="fulltext">16839875</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Predictive identification of exonic splicing enhancers in human genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Fairbrother</snm>
                  <fnm>WG</fnm>
               </au>
               <au>
                  <snm>Yeh</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Sharp</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>297</volume>
            <fpage>1007</fpage>
            <lpage>1013</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1073774</pubid>
                  <pubid idtype="pmpid" link="fulltext">12114529</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>General and specific functions of exonic splicing silencers in splicing control.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>ZF</fnm>
               </au>
               <au>
                  <snm>Xiao</snm>
                  <fnm>XS</fnm>
               </au>
               <au>
                  <snm>Van Nostrand</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Mol Cell</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>61</fpage>
            <lpage>70</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1839040</pubid>
                  <pubid idtype="pmpid" link="fulltext">16797197</pubid>
                  <pubid idtype="doi">10.1016/j.molcel.2006.05.018</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Conservation in budding yeast of a kinase specific for SR splicing factors.</p>
            </title>
            <aug>
               <au>
                  <snm>Siebel</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Feng</snm>
                  <fnm>LN</fnm>
               </au>
               <au>
                  <snm>Guthrie</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>XD</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>5440</fpage>
            <lpage>5445</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">21878</pubid>
                  <pubid idtype="pmpid" link="fulltext">10318902</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.10.5440</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>A handful of intron-containing genes produces the lion's share of yeast mRNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Ares</snm>
                  <fnm>M</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Grate</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Pauling</snm>
                  <fnm>MH</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>1999</pubdate>
            <volume>5</volume>
            <fpage>1138</fpage>
            <lpage>1139</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1369836</pubid>
                  <pubid idtype="pmpid" link="fulltext">10496214</pubid>
                  <pubid idtype="doi">10.1017/S1355838299991379</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>The genome sequence of <it>Schizosaccharomyces pombe</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Wood</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Gwilliam</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rajandream</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Lyne</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lyne</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sgouros</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Peat</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Hayles</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Basham</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bowman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brooks</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chillingworth</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Churcher</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Connor</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Cronin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Feltwell</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gentles</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Goble</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hamlin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hidalgo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hodgson</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Holroyd</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <fpage>871</fpage>
            <lpage>880</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature724</pubid>
                  <pubid idtype="pmpid" link="fulltext">11859360</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Identification and characterization of srp1, a gene of fission yeast encoding a RNA binding domain and a RS domain typical of SR splicing factors.</p>
            </title>
            <aug>
               <au>
                  <snm>Gross</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Richert</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mierke</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lutzelberger</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kaufer</snm>
                  <fnm>NF</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1998</pubdate>
            <volume>26</volume>
            <fpage>505</fpage>
            <lpage>511</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147300</pubid>
                  <pubid idtype="pmpid" link="fulltext">9421507</pubid>
                  <pubid idtype="doi">10.1093/nar/26.2.505</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Srp2, an SR protein family member of fission yeast: <it>in vivo </it>characterization of its modular domains.</p>
            </title>
            <aug>
               <au>
                  <snm>L&#252;tzelberger</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gross</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>K&#228;ufer</snm>
                  <fnm>NF</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>2618</fpage>
            <lpage>2626</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148469</pubid>
                  <pubid idtype="pmpid" link="fulltext">10373577</pubid>
                  <pubid idtype="doi">10.1093/nar/27.13.2618</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Pre-mRNA splicing in <it>Schizosaccharomyces pombe</it>: regulatory role of a kinase conserved from fission yeast to mammals.</p>
            </title>
            <aug>
               <au>
                  <snm>Kuhn</snm>
                  <fnm>AN</fnm>
               </au>
               <au>
                  <snm>K&#228;ufer</snm>
                  <fnm>NF</fnm>
               </au>
            </aug>
            <source>Curr Genet</source>
            <pubdate>2003</pubdate>
            <volume>42</volume>
            <fpage>241</fpage>
            <lpage>251</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12589463</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Exonic splicing enhancers in fission yeast: functional conservation demonstrates an early evolutionary origin.</p>
            </title>
            <aug>
               <au>
                  <snm>Webb</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Romfo</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>van Heeckeren</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Wise</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2005</pubdate>
            <volume>19</volume>
            <fpage>242</fpage>
            <lpage>254</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">545887</pubid>
                  <pubid idtype="pmpid" link="fulltext">15625190</pubid>
                  <pubid idtype="doi">10.1101/gad.1265905</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Test of intron predictions reveals novel splice sites, alternatively spliced mRNAs and new introns in meiotically regulated genes of yeast.</p>
            </title>
            <aug>
               <au>
                  <snm>Davis</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Grate</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Spingola</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ares</snm>
                  <fnm>M</fnm>
                  <suf>Jr</suf>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>1700</fpage>
            <lpage>1706</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102823</pubid>
                  <pubid idtype="pmpid" link="fulltext">10734188</pubid>
                  <pubid idtype="doi">10.1093/nar/28.8.1700</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>mRNAs encoding zinc finger protein isoforms are expressed by alternative splicing of an in-frame intron in fission yeast.</p>
            </title>
            <aug>
               <au>
                  <snm>Okazaki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Niwa</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>DNA Res</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <fpage>27</fpage>
            <lpage>30</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/dnares/7.1.27</pubid>
                  <pubid idtype="pmpid" link="fulltext">10718196</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>How did alternative splicing evolve?</p>
            </title>
            <aug>
               <au>
                  <snm>Ast</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>773</fpage>
            <lpage>782</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1451</pubid>
                  <pubid idtype="pmpid" link="fulltext">15510168</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Alternative splicing and RNA selection pressure - evolutionary consequences for eukaryotic genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Xing</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>499</fpage>
            <lpage>509</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1896</pubid>
                  <pubid idtype="pmpid" link="fulltext">16770337</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>SR proteins are required for nematode trans-splicing <it>in vitro</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Sanford</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Bruzik</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>1999</pubdate>
            <volume>5</volume>
            <fpage>918</fpage>
            <lpage>928</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1369816</pubid>
                  <pubid idtype="pmpid" link="fulltext">10411135</pubid>
                  <pubid idtype="doi">10.1017/S1355838299990234</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Functional characterization of SR and SR-related genes in <it>Caenorhabditis elegans</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Longman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Johnstone</snm>
                  <fnm>IL</fnm>
               </au>
               <au>
                  <snm>C&#225;ceres</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>2000</pubdate>
            <volume>19</volume>
            <fpage>1625</fpage>
            <lpage>1637</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310231</pubid>
                  <pubid idtype="pmpid" link="fulltext">10747030</pubid>
                  <pubid idtype="doi">10.1093/emboj/19.7.1625</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>WormBook: Trans-splicing and operons</p>
            </title>
            <aug>
               <au>
                  <snm>Blumenthal</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <url>http://www.wormbook.org/chapters/www_transsplicingoperons/transsplicingoperons.html</url>
         </bibl>
         <bibl id="B46">
            <title>
               <p>SL trans-splicing: easy come or easy go?</p>
            </title>
            <aug>
               <au>
                  <snm>Hastings</snm>
                  <fnm>KE</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>240</fpage>
            <lpage>247</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.02.005</pubid>
                  <pubid idtype="pmpid" link="fulltext">15797620</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Comprehensive splice-site analysis using comparative genomics.</p>
            </title>
            <aug>
               <au>
                  <snm>Sheth</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Roca</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Hastings</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Roeder</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Krainer</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Sachidanandam</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>3955</fpage>
            <lpage>3967</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1557818</pubid>
                  <pubid idtype="pmpid" link="fulltext">16914448</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl556</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Multiple roles for SR proteins in trans splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Furuyama</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bruzik</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>2002</pubdate>
            <volume>22</volume>
            <fpage>5337</fpage>
            <lpage>5346</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">133944</pubid>
                  <pubid idtype="pmpid" link="fulltext">12101229</pubid>
                  <pubid idtype="doi">10.1128/MCB.22.15.5337-5346.2002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Intercistronic region required for polycistronic pre-mRNA processing in <it>Caenorhabditis elegans</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kuersten</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Deshpande</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Spieth</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>MacMorris</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Blumenthal</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>2001</pubdate>
            <volume>21</volume>
            <fpage>1111</fpage>
            <lpage>1120</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99565</pubid>
                  <pubid idtype="pmpid" link="fulltext">11158298</pubid>
                  <pubid idtype="doi">10.1128/MCB.21.4.1111-1120.2001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Yeast Gene Order Browser</p>
            </title>
            <url>http://wolfe.gen.tcd.ie/ygob/</url>
         </bibl>
         <bibl id="B51">
            <title>
               <p>MUSCLE: multiple sequence alignment with high accuracy and high throughput.</p>
            </title>
            <aug>
               <au>
                  <snm>Edgar</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>1792</fpage>
            <lpage>1797</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">390337</pubid>
                  <pubid idtype="pmpid" link="fulltext">15034147</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh340</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Inparanoid Dm-Dps Orthologues</p>
            </title>
            <url>http://inparanoid.sbc.su.se/download/current/sqltables/sqltable.flyDROPS.fa-modDROME.fa</url>
         </bibl>
         <bibl id="B53">
            <title>
               <p>UCSC Genome Browser: Table Browser</p>
            </title>
            <url>http://genome.ucsc.edu/cgi-bin/hgTables</url>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Three distinct modes of intron dynamics in the evolution of eukaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Carmel</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <fpage>1034</fpage>
            <lpage>1044</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1899114</pubid>
                  <pubid idtype="pmpid" link="fulltext">17495008</pubid>
                  <pubid idtype="doi">10.1101/gr.6438607</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>The role of U5 snRNP in pre-mRNA splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Newman</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1997</pubdate>
            <volume>16</volume>
            <fpage>5797</fpage>
            <lpage>5800</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1170210</pubid>
                  <pubid idtype="pmpid" link="fulltext">9312037</pubid>
                  <pubid idtype="doi">10.1093/emboj/16.19.5797</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Functional analysis of the U5 snRNA loop 1 in the second catalytic step of yeast pre-mRNA splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>O'Keefe</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Newman</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1998</pubdate>
            <volume>17</volume>
            <fpage>565</fpage>
            <lpage>574</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1170406</pubid>
                  <pubid idtype="pmpid" link="fulltext">9430647</pubid>
                  <pubid idtype="doi">10.1093/emboj/17.2.565</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>U5 snRNA interacts with exon sequences at 5' and 3' splice sites.</p>
            </title>
            <aug>
               <au>
                  <snm>Newman</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Norman</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1992</pubdate>
            <volume>68</volume>
            <fpage>743</fpage>
            <lpage>754</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(92)90149-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">1739979</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Rfam: Seed Alignment for U5</p>
            </title>
            <url>http://www.sanger.ac.uk/cgi-bin/Rfam/getalignment.pl?acc=RF00020&amp;type=seed&amp;format=link</url>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Pfam: clans, web tools and services.</p>
            </title>
            <aug>
               <au>
                  <snm>Finn</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Mistry</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schuster-B&#246;ckler</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hollich</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lassmann</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Moxon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Khanna</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>Database issue</issue>
            <fpage>D247</fpage>
            <lpage>D251</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347511</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381856</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj149</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
