<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-7-55</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>A multispecies comparison of the metazoan 3'-processing downstream elements and the CstF-64 RNA recognition motif</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Salisbury</snm>
               <fnm>Jesse</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>jesse.salisbury@umit.maine.edu</email>
            </au>
            <au id="A2">
               <snm>Hutchison</snm>
               <mi>W</mi>
               <fnm>Keith</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>keithh@maine.edu</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Graber</snm>
               <mi>H</mi>
               <fnm>Joel</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>joel.graber@jax.org</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Functional Genomics Program, The University of Maine, Orono, Maine 04469, USA</p>
            </ins>
            <ins id="I2">
               <p>The Jackson Laboratory, 600 Main Street, Bar Harbor, Maine 04609, USA</p>
            </ins>
            <ins id="I3">
               <p>Department of Biochemistry, Microbiology and Molecular Biology, The University of Maine, Orono, ME 04469, USA</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>55</fpage>
         <url>http://www.biomedcentral.com/1471-2164/7/55</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16542450</pubid>
               <pubid idtype="doi">10.1186/1471-2164-7-55</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>10</day>
               <month>1</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>16</day>
               <month>3</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>16</day>
               <month>3</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Salisbury et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The Cleavage Stimulation Factor (CstF) is a required protein complex for eukaryotic mRNA 3'-processing. CstF interacts with 3'-processing downstream elements (DSEs) through its 64-kDa subunit, CstF-64; however, the exact nature of this interaction has remained unclear. We used EST-to-genome alignments to identify and extract large sets of putative 3'-processing sites for mRNA from ten metazoan species, including <it>Homo sapiens, Canis familiaris, Rattus norvegicus, Mus musculus, Gallus gallus, Danio rerio, Takifugu rubripes, Drosophila melanogaster, Anopheles gambiae</it>, and <it>Caenorhabditis elegans</it>. In order to further delineate the details of the mRNA-protein interaction, we obtained and multiply aligned CstF-64 protein sequences from the same species.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We characterized the sequence content and specific positioning of putative DSEs across the range of organisms studied. Our analysis resolved the DSE into two distinct parts &#8211; a proximal UG-rich element and a distal U-rich element. We find that while the U-rich element is largely conserved in all of the organisms studied, the UG-rich element is not. Multiple alignment of the CstF-64 RNA recognition motif showed that, although the motif is highly conserved throughout metazoans, specific amino acid changes correlate with the observed variation in the sequence content and positioning of the DSEs.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our analysis confirms the early reports of separate U- and UG-rich DSEs. The correlated variations in protein sequence and mRNA binding sequences provide novel insights into the interactions between the precursor mRNA and the 3'-processing machinery.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Cleavage and polyadenylation (3'-processing) are essential steps in eukaryotic mRNA formation that can affect transcript stability and function <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Processing of the 3'-end occurs on the nascent pre-mRNA as it is transcribed by RNA polymerase II <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Selection of the 3'-processing site is directed by interactions between the polyadenylation machinery and <it>cis</it>-acting elements found both upstream and downstream of the 3'-processing site. The principal upstream <it>cis</it>-acting element is the highly conserved AAUAAA hexamer, which interacts with Cleavage and Polyadenylation Specificity Factor (CPSF) and is found in the majority of metazoan transcripts <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B3">3</abbr></abbrgrp>. Putative downstream elements (DSEs) include the functional binding site(s) of the 64-kDa subunit of Cleavage Stimulation Factor (CstF) <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Interactions among CPSF, CstF, poly(A) polymerase (PAP), and Cleavage Factors I and II (CFI and CFII, respectively) are the minimal requirements for <it>in vitro </it>polyadenylation <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
         <sec>
            <st>
               <p>The DSE &#8211; one or two parts?</p>
            </st>
            <p>Unlike the upstream AAUAAA signal, whose description has remained largely unchanged since its discovery in 1976 <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, the DSE has been described in several ways. The DSE was initially characterized by conserved sequence patterns downstream of the 3'-processing site, yielding estimated consensus sequences of UUUUCACUGC <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, GUGUUG <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, and CAYUG <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Two early studies manipulated downstream sequences in test plasmids to produce a bipartite model of the DSE <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, consisting of a proximal UG-rich sequence and a distal U-rich element that act synergistically <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Further characterization of the DSE by deletion or substitution assays identified UGUGUUGGAA <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, YGUGUUYY <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, AGGUUUUUU <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and UUUUU <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp> as elements involved in directing polyadenylation in specific transcripts and/or test systems. RNA binding assays indicated that CstF-64 interacts with UUUU located 15&#8211;30 nucleotides downstream of the 3'-processing site <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>.</p>
            <p>The U-rich description was later challenged by SELEX binding assays performed on CstF-64 by two independent groups. Beyer <it>et al </it>used complete CstF complexes in cell extracts and reported three distinct patterns: AUGCGUUCCUCGUCC, YGUGUYN<sub>0&#8211;4</sub>UUYAYUGYGU, and UUGYUN<sub>0&#8211;4</sub>AUUUACU(U/G)N<sub>0&#8211;2</sub>YCU <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Takagaki and Manley used a recombinant form of CstF-64 containing only the RNA recognition motif (RRM) and found preferential binding to a sequence that included both U-rich (G(U)<sub>2&#8211;4</sub>G) and GU-rich ((GU)<sub>2&#8211;4</sub>) components <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
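            <p>Degenerate consensus patterns such as YGUGUUYY or YGUGUYN<sub>0&#8211;4</sub>UUYAYUGYGU are easiest to test against sequence data once translated into regular expressions. The Python sketch below shows one way to do this. The helper name iupac_to_regex and the ASCII spacer notation N0-4 (standing for N<sub>0&#8211;4</sub>) are illustrative assumptions, not part of the cited studies; mixed positions written as (U/G) should first be rewritten with the equivalent IUPAC code K.</p>
```python
import re

# Standard IUPAC nucleotide codes, written for RNA (U instead of T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "W": "[AU]", "S": "[CG]",
    "K": "[GU]", "M": "[AC]", "N": "[ACGU]",
}

# A token is either a variable-length spacer such as N0-4 or one IUPAC code.
TOKEN = r"N(\d+)-(\d+)|([ACGURYWSKMN])"


def iupac_to_regex(pattern: str) -> str:
    """Translate a degenerate RNA consensus (hypothetical helper) into a regex string."""
    if not re.fullmatch(r"(?:N\d+-\d+|[ACGURYWSKMN])+", pattern):
        raise ValueError(f"unsupported pattern: {pattern!r}")
    parts = []
    for m in re.finditer(TOKEN, pattern):
        if m.group(1) is not None:
            # Spacer of a bounded number of arbitrary bases, e.g. N0-4 -> [ACGU]{0,4}
            parts.append(f"[ACGU]{{{m.group(1)},{m.group(2)}}}")
        else:
            parts.append(IUPAC[m.group(3)])
    return "".join(parts)


selex_motif = re.compile(iupac_to_regex("YGUGUYN0-4UUYAYUGYGU"))
print(bool(selex_motif.search("AACGUGUCAAUUCAUUGCGUAA")))  # prints True
```
            <p>Compiling the expression once and reusing it keeps scans over thousands of downstream regions cheap; the spacer bounds are carried over exactly, so a 5-nt spacer does not match an N0-4 pattern.</p>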
            <p>Statistical analysis of genomic alignments of <it>D. melanogaster </it>ESTs implicated the hexamers UGUUUU, UGUGUU and UUUUUU as DSEs <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Other studies based on genomic alignments of mammalian 3'-UTRs or ESTs reported only U-rich elements with no apparent consensus <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, a pentamer with at least four Us or 2GU/U <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, or the heptamer UGUGUGU <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. The NMR solution structure of the vertebrate CstF-64 RRM was used to demonstrate binding to either (GU)<sub>4</sub> or (GU)<sub>4</sub>UG, with a preference for the latter <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. Despite the wide variety of studies published to date, no clear consensus for the DSE has emerged; indeed, the authors of the computational studies cited above argued against the existence of a single consensus. Review articles typically refer to a single UG-/U-rich DSE, in spite of the early evidence for two independent elements <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>.</p>
            <p>The present study was initiated to expand our understanding of the 3'-processing regulatory DSE sequences through a statistical survey of large sequence sets spanning a broad phylogenetic range of metazoans. We also obtained and aligned CstF-64 protein sequences from these same organisms, with the goal of identifying correlated changes in the protein and in its probable nucleic acid binding sequences.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Description of the datasets</p>
            </st>
            <p>We constructed a 3'-processing site sequence database (PACdb <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>) from 13,006,921 ESTs representing 10 metazoan species: <it>Anopheles gambiae </it>(mosquito), <it>Caenorhabditis elegans </it>(nematode), <it>Canis familiaris </it>(dog), <it>Danio rerio </it>(zebrafish), <it>Drosophila melanogaster </it>(fruit fly), <it>Gallus gallus </it>(chicken), <it>Homo sapiens </it>(human), <it>Mus musculus </it>(mouse), <it>Rattus norvegicus </it>(rat), and <it>Takifugu rubripes </it>(fugu). The numbers of non-redundant, high-quality polyadenylation sites included in our analysis ranged from 902 in <it>D. melanogaster </it>to 10,060 in <it>H. sapiens </it>(Table <tblr tid="T1">1</tblr>). We inferred the quality of these data from the presence of the well-documented canonical CPSF-binding hexamer AAUAAA within 40 bases upstream of the processing site. The Positional Word Count (PWC, described in Methods) distribution for AAUAAA peaks at position -21 relative to the 3'-processing site in vertebrates and shifts to -23 in <it>A. gambiae</it>, -22 in <it>D. melanogaster</it>, and -19 in <it>C. elegans </it>(Figure <figr fid="F1">1</figr>). Invertebrates showed reduced use of the canonical hexamer compared with vertebrates: the percentage of sequences containing AAUAAA ranged from 69.7% to 77.9% in the vertebrates but only from 51.0% to 61.8% in the invertebrates (Table <tblr tid="T1">1</tblr>), consistent with previous reports <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B25">25</abbr></abbrgrp>. We also tested for the presence of the most common variant, AUUAAA, and for any single-base substitution variant of the canonical hexamer (Table <tblr tid="T1">1</tblr>). Because the measured frequencies agree well with previous results, we believe that the bulk of our sequences represent <it>bona fide </it>3'-processing sites.</p>
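            <p>The hexamer usage summarized in Table 1 reduces to a simple scan of each upstream region. The Python sketch below assumes that each site is supplied as the RNA sequence immediately 5' of the 3'-processing site and scores a site as positive when a hexamer from the set lies entirely within the last 40 nt. The function and variable names are hypothetical, and the exact window handling used to build PACdb may differ.</p>
```python
BASES = "ACGU"
CANONICAL = "AAUAAA"


def single_substitutions(word: str) -> set:
    """Return `word` plus every variant that differs from it at exactly one position."""
    variants = {word}
    for i, base in enumerate(word):
        for alt in BASES:
            if alt != base:
                variants.add(word[:i] + alt + word[i + 1:])
    return variants


# Hexamer sets corresponding to the three groups in Table 1.
HEXAMER_SETS = {
    "AAUAAA": {CANONICAL},
    "AWUAAA": {"AAUAAA", "AUUAAA"},              # W = A or U at the second position
    "Delta 1": single_substitutions(CANONICAL),  # 1 + 6 * 3 = 19 hexamers
}


def upstream_usage(upstream_seqs: list, window: int = 40) -> dict:
    """Percentage and count of sites with a hexamer of each set in the upstream window."""
    results = {}
    for name, words in HEXAMER_SETS.items():
        hits = 0
        for seq in upstream_seqs:
            region = seq[-window:]  # the `window` nt just 5' of the processing site
            if any(region[i:i + 6] in words for i in range(len(region) - 5)):
                hits += 1
        results[name] = (100.0 * hits / len(upstream_seqs), hits)
    return results


print(upstream_usage(["GGGAAUAAAGGG", "CCAUUAAACC", "CCCCCCCCCC"]))
```
            <p>Because each set contains the canonical hexamer, the counts can only increase from the AAUAAA column to the Delta 1 column for any given data set.</p>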
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
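                  <p>Upstream (relative to the 3'-processing site) usage of AAUAAA, AWUAAA (W = A or U), or Delta 1 (AAUAAA plus all single-base substitution variants). Min EST, minimum number of supporting ESTs per site; Sites, number of non-redundant 3'-processing sites analyzed; Count, number of sites containing the hexamer(s) given in the preceding column.</p>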
               </caption>
               <tblbdy cols="9">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Organism</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Min EST</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sites</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>% AAUAAA</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Count</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>% AWUAAA</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Count</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>% Delta 1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Count</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>H. sapiens</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>10060</p>
                     </c>
                     <c ca="center">
                        <p>69.9</p>
                     </c>
                     <c ca="center">
                        <p>7032</p>
                     </c>
                     <c ca="center">
                        <p>86.0</p>
                     </c>
                     <c ca="center">
                        <p>8654</p>
                     </c>
                     <c ca="center">
                        <p>97.5</p>
                     </c>
                     <c ca="center">
                        <p>9806</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>C. familiaris</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>980</p>
                     </c>
                     <c ca="center">
                        <p>70.9</p>
                     </c>
                     <c ca="center">
                        <p>694</p>
                     </c>
                     <c ca="center">
                        <p>86.5</p>
                     </c>
                     <c ca="center">
                        <p>847</p>
                     </c>
                     <c ca="center">
                        <p>95.6</p>
                     </c>
                     <c ca="center">
                        <p>936</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>R. norvegicus</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>9329</p>
                     </c>
                     <c ca="center">
                        <p>69.7</p>
                     </c>
                     <c ca="center">
                        <p>6501</p>
                     </c>
                     <c ca="center">
                        <p>85.3</p>
                     </c>
                     <c ca="center">
                        <p>7958</p>
                     </c>
                     <c ca="center">
                        <p>97.3</p>
                     </c>
                     <c ca="center">
                        <p>9074</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>M. musculus</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>8543</p>
                     </c>
                     <c ca="center">
                        <p>70.9</p>
                     </c>
                     <c ca="center">
                        <p>6053</p>
                     </c>
                     <c ca="center">
                        <p>86.0</p>
                     </c>
                     <c ca="center">
                        <p>7342</p>
                     </c>
                     <c ca="center">
                        <p>97.0</p>
                     </c>
                     <c ca="center">
                        <p>8288</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>G. gallus</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>3056</p>
                     </c>
                     <c ca="center">
                        <p>70.0</p>
                     </c>
                     <c ca="center">
                        <p>2140</p>
                     </c>
                     <c ca="center">
                        <p>85.2</p>
                     </c>
                     <c ca="center">
                        <p>2604</p>
                     </c>
                     <c ca="center">
                        <p>96.0</p>
                     </c>
                     <c ca="center">
                        <p>2932</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>T. rubripes</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1427</p>
                     </c>
                     <c ca="center">
                        <p>69.9</p>
                     </c>
                     <c ca="center">
                        <p>997</p>
                     </c>
                     <c ca="center">
                        <p>88.3</p>
                     </c>
                     <c ca="center">
                        <p>1259</p>
                     </c>
                     <c ca="center">
                        <p>96.6</p>
                     </c>
                     <c ca="center">
                        <p>1377</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>D. rerio</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>2585</p>
                     </c>
                     <c ca="center">
                        <p>77.9</p>
                     </c>
                     <c ca="center">
                        <p>2012</p>
                     </c>
                     <c ca="center">
                        <p>91.1</p>
                     </c>
                     <c ca="center">
                        <p>2355</p>
                     </c>
                     <c ca="center">
                        <p>98.1</p>
                     </c>
                     <c ca="center">
                        <p>2535</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>A. gambiae</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1693</p>
                     </c>
                     <c ca="center">
                        <p>61.8</p>
                     </c>
                     <c ca="center">
                        <p>1046</p>
                     </c>
                     <c ca="center">
                        <p>71.7</p>
                     </c>
                     <c ca="center">
                        <p>537</p>
                     </c>
                     <c ca="center">
                        <p>88.9</p>
                     </c>
                     <c ca="center">
                        <p>1505</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>D. melanogaster</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>902</p>
                     </c>
                     <c ca="center">
                        <p>61.7</p>
                     </c>
                     <c ca="center">
                        <p>556</p>
                     </c>
                     <c ca="center">
                        <p>73.8</p>
                     </c>
                     <c ca="center">
                        <p>665</p>
                     </c>
                     <c ca="center">
                        <p>97.0</p>
                     </c>
                     <c ca="center">
                        <p>874</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>C. elegans</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1003</p>
                     </c>
                     <c ca="center">
                        <p>51.0</p>
                     </c>
                     <c ca="center">
                        <p>511</p>
                     </c>
                     <c ca="center">
                        <p>71.6</p>
                     </c>
                     <c ca="center">
                        <p>717</p>
                     </c>
                     <c ca="center">
                        <p>87.7</p>
                     </c>
                     <c ca="center">
                        <p>879</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Positioning of the AAUAAA hexamer in the region upstream of the 3'-processing site for ten metazoan species</p>
               </caption>
               <text>
                  <p>Positioning of the AAUAAA hexamer in the region upstream of the 3'-processing site for ten metazoan species.</p>
               </text>
               <graphic file="1471-2164-7-55-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Positioning patterns of tetramers in the DSE region</p>
            </st>
            <p>Our initial PWC analysis of the DSE region was based on tetramers. Although tetramers cannot unambiguously define the functional elements, they efficiently reveal positioning trends, as shown below. PWC probabilities for groups of tetramers with distinct, non-uniform positioning patterns are displayed in Figure <figr fid="F2">2</figr>. We display a subset of the tetramers, grouping words into separate panels based on similar positioning and sequence content. Because PWC focuses on positioning, it is immediately clear that at least three distinct patterns occur downstream of the 3'-processing site.</p>
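            <p>The PWC statistic itself is defined in Methods, which are not reproduced here. As a working assumption only, the Python sketch below takes the PWC value of a word at position i to be the fraction of downstream sequences in which that word starts at position i (position 1 being the first nucleotide after the 3'-processing site), and combines a group of words by summing their per-position frequencies. Summing is valid under this definition because exactly one k-mer starts at each position of a sequence, so the sum equals the frequency with which any word of the group occurs there.</p>
```python
from collections import Counter, defaultdict


def positional_word_count(seqs: list, k: int = 4) -> dict:
    """Per-word positional frequencies (assumed PWC definition, see the text above).

    Returns {word: {position: fraction of sequences with `word` starting there}},
    with position 1 being the first nucleotide downstream of the processing site.
    """
    counts = defaultdict(Counter)
    for seq in seqs:
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]][start + 1] += 1
    n = len(seqs)
    return {w: {pos: c / n for pos, c in by_pos.items()} for w, by_pos in counts.items()}


def group_profile(pwc: dict, words: set, length: int) -> list:
    """Combined frequency of any word in `words` at positions 1..length."""
    return [sum(pwc.get(w, {}).get(pos, 0.0) for w in words) for pos in range(1, length + 1)]


downstream = ["UGUGAAAA", "AUGUGAAA"]
pwc = positional_word_count(downstream)
print(group_profile(pwc, {"UGUG", "GUGU"}, 5))  # prints [0.5, 0.5, 0.0, 0.0, 0.0]
```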
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Combined positional frequencies of selected groups of tetramers downstream of the 3'-processing site for ten metazoan species</p>
               </caption>
               <text>
                  <p><b>Combined positional frequencies of selected groups of tetramers downstream of the 3'-processing site for ten metazoan species</b>. <b>A: </b>Proximal UG-rich element (including UGUG and GUGU), <b>B: </b>distal U-rich element (including all single-base substitutions of UUUU), <b>C: </b>alternative proximal UG-rich element with a G to C transversion (including UCUG, CUGU, UGUC, and GUCU) and <b>D: </b>G-rich element (including GGGG, GGGA, GGAG, GAGG, and AGGG). In all panels, the vertical axis is the frequency of occurrence of any of the grouped tetramers at the position indicated along the horizontal axis.</p>
               </text>
               <graphic file="1471-2164-7-55-2"/>
            </fig>
            <p>The three patterns apparent in Figure <figr fid="F2">2</figr> can be approximately defined as (A) a UG-rich element positioned 5&#8211;10 nucleotides downstream of the 3'-processing site (Figures <figr fid="F2">2A</figr> and <figr fid="F2">2C</figr>), (B) a U-rich element positioned 15&#8211;25 nucleotides downstream of the 3'-processing site (8&#8211;20 for <it>C. elegans</it>; Figure <figr fid="F2">2B</figr>), and (C) a G-rich element positioned more than 20 nucleotides downstream of the 3'-processing site (Figure <figr fid="F2">2D</figr>). PWC results for all tetramers in the downstream region are available as a supplemental table <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. The most robust pattern, in terms of frequency of occurrence across all sequences and organisms, is the U-rich element (represented in Figure <figr fid="F2">2B</figr> by UUUU and all of its single-base substitution variants), which has a strong positional bias in all species, with maximum frequencies at positions 15 to 25 nt downstream of the processing site in the vertebrates and 8 to 15 nt in the invertebrates (Figure <figr fid="F2">2B</figr>). The U-rich tetramers peak at 15 nt in <it>D. melanogaster </it>and 14 nt in <it>A. gambiae</it>, which correlates with the 5' shift of the AAUAAA hexamer also seen in these species (Figure <figr fid="F1">1</figr>). U-rich tetramers in <it>C. elegans </it>cover a broader range, between 5 and 20 nt downstream of the processing site.</p>
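            <p>Peak positions and ranges like those quoted above can be read directly from a combined positional profile. The text does not state how the ranges were delimited, so the Python sketch below uses one common convention, the contiguous span around the maximum in which the frequency stays at or above half of the peak value. Both the function name peak_summary and the half-maximum rule are assumptions made for illustration.</p>
```python
def peak_summary(profile: list, first_position: int = 1) -> tuple:
    """Return (peak position, (start, end) of the half-maximum span around the peak).

    `profile[i]` is the combined frequency at position first_position + i.
    """
    peak = max(range(len(profile)), key=profile.__getitem__)
    half = profile[peak] / 2
    lo = peak
    while lo - 1 >= 0 and profile[lo - 1] >= half:
        lo -= 1
    hi = peak
    while len(profile) > hi + 1 and profile[hi + 1] >= half:
        hi += 1
    return peak + first_position, (lo + first_position, hi + first_position)


# Toy profile for positions 1..8; it peaks at position 5.
print(peak_summary([0.0, 0.0, 0.1, 0.3, 0.4, 0.3, 0.1, 0.0]))  # prints (5, (4, 6))
```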
            <p>Examination of the data represented in Figures <figr fid="F2">2A</figr>, <figr fid="F2">2C</figr>, and <figr fid="F3">3</figr> reveals substantial variation in the UG-rich element among species. For example, a comparison of Figures <figr fid="F2">2A</figr> and <figr fid="F2">2C</figr> indicates that variants of the UG-rich element with a G to C transversion (<it>e.g</it>., UCUG) share the same positioning in all vertebrates, but not in the invertebrates, suggesting that the transversion is an acceptable functional substitution only in vertebrates. (A recent computational study of human 3'-processing sites also identified the UCUG-like elements <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>.) <it>C. elegans </it>apparently lacks a UG-rich element, as neither Figure <figr fid="F2">2A</figr> nor Figure <figr fid="F2">2C</figr> shows a significant positioning bias. Both the positioning and the relative occurrence of the UG-rich element (without the G to C transversion) also change in the arthropods compared with the vertebrates (Figures <figr fid="F2">2A</figr> and <figr fid="F3">3</figr>). The G-rich tetramers (represented by GGGG and all variants with a single G to A transition in Figures <figr fid="F2">2D</figr> and <figr fid="F3">3</figr>) appear to be a feature only of amniote 3'-processing sites (represented here by mammals and <it>G. gallus</it>).</p>
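            <p>The tetramer groups compared above follow simple substitution rules, which the Python sketch below makes explicit: single-base variants of UUUU, G to C transversions of the UG-rich words, and G to A transitions of GGGG. It assumes that the proximal UG-rich group consists of just the two tetramers named in the Figure 2 legend (UGUG and GUGU); the helper names are illustrative.</p>
```python
BASES = "ACGU"


def single_base_variants(word: str) -> set:
    """`word` plus all variants carrying any single-base substitution."""
    return {word} | {word[:i] + b + word[i + 1:]
                     for i in range(len(word)) for b in BASES if b != word[i]}


def directed_variants(word: str, src: str, dst: str) -> set:
    """Variants of `word` in which exactly one `src` base is replaced by `dst`."""
    return {word[:i] + dst + word[i + 1:] for i, b in enumerate(word) if b == src}


UG_RICH = {"UGUG", "GUGU"}                                    # Figure 2A
U_RICH = single_base_variants("UUUU")                         # Figure 2B, 13 words
# G to C is a transversion (purine to pyrimidine).             # Figure 2C
UG_G_TO_C = set().union(*(directed_variants(w, "G", "C") for w in UG_RICH))
# G to A is a transition (purine to purine).                   # Figure 2D
G_RICH = {"GGGG"} | directed_variants("GGGG", "G", "A")

print(sorted(UG_G_TO_C))  # prints ['CUGU', 'GUCU', 'UCUG', 'UGUC']
```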
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Tetramer positioning patterns for specific classes of organism</p>
               </caption>
               <text>
                  <p><b>Tetramer positioning patterns for specific classes of organism</b>. The plots from Figure 2 were grouped and averaged together according to the organism groupings listed as plot titles. In all plots, the U-rich element is plotted on the secondary vertical axis to allow greater detail to be observed in the other elements.</p>
               </text>
               <graphic file="1471-2164-7-55-3"/>
            </fig>
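            <p>Figure 3 averages the per-species curves of Figure 2 within organism groups. A minimal Python sketch of that step follows. It assumes an unweighted mean, so each species counts equally regardless of how many sites it contributes, and the profiles and group memberships shown are hypothetical examples rather than the data or groupings used for the figure.</p>
```python
def average_profiles(profiles: dict, groups: dict) -> dict:
    """Unweighted mean of equal-length per-species profiles within each group."""
    averaged = {}
    for group, species in groups.items():
        rows = [profiles[name] for name in species]
        averaged[group] = [sum(column) / len(rows) for column in zip(*rows)]
    return averaged


# Hypothetical three-position profiles and groupings, for illustration only.
profiles = {
    "H. sapiens": [0.25, 0.5, 0.0],
    "M. musculus": [0.75, 0.0, 0.5],
    "D. melanogaster": [0.5, 0.25, 0.0],
}
groups = {"mammals": ["H. sapiens", "M. musculus"], "insects": ["D. melanogaster"]}
print(average_profiles(profiles, groups))
```
            <p>Weighting each species by its number of sites would instead let the large mammalian data sets dominate a mixed group; the unweighted choice here is only one option.</p>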
         </sec>
         <sec>
            <st>
               <p>Delineating the DSE sequence content</p>
            </st>
            <p>While a tetramer-based PWC analysis clearly shows the positioning dependency of various sequence words, as noted above, it does not unambiguously describe the functional elements that produce the observed word distributions. Detailed delineation of RNA regulatory motifs is non-trivial, however: although RNA elements are often defined by both sequence content and positioning, the standard computational pattern detection tools <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> typically consider only sequence content, giving little or no weight to positioning. Pattern recognition algorithms (<it>e.g</it>., MEME or the Gibbs Sampler) typically identify motifs as the patterns that stand out most significantly from the background implied by the surrounding sequence. One exception is the Improbizer <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, which models positioning as a normal distribution.</p>
            <p>Although this is an improvement, examination of Figures <figr fid="F2">2</figr> and <figr fid="F3">3</figr> indicates that a normal distribution can only roughly approximate the observed positioning. We analyzed the 80 nucleotides downstream of our putative 3'-processing sites with several tools, including the Gibbs Recursive Sampler <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, MEME <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, the Improbizer <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, and a hexamer-based PWC analysis. Where necessary, we post-processed the results to extract the positioning distribution of each motif. We present the results of the Gibbs Sampler analysis here; the other results are available in the online supplement <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
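            <p>How poorly a single normal distribution can describe a positioning curve can be quantified with a short sketch: fit a normal distribution to a positional histogram by its mean and standard deviation, discretize it onto integer positions, and report the total variation distance to the empirical distribution (0 means identical, 1 means disjoint). This is an illustrative check only, not the Improbizer's internal procedure, and the histograms in the usage lines are hypothetical.</p>

```python
import math


def fit_normal(counts):
    """Moment estimates (mean, sd) of position, weighting each index by its count."""
    total = sum(counts)
    mu = sum(i * c for i, c in enumerate(counts)) / total
    var = sum(c * (i - mu) ** 2 for i, c in enumerate(counts)) / total
    return mu, math.sqrt(var)


def discretized_normal(mu, sd, n):
    """Normal probability mass on integer positions 0..n-1, renormalized to sum to 1."""
    if sd == 0:
        raise ValueError("degenerate histogram: all counts at one position")

    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

    mass = [cdf(i + 0.5) - cdf(i - 0.5) for i in range(n)]
    total = sum(mass)
    return [m / total for m in mass]


def normal_misfit(counts):
    """Total variation distance between a positional histogram and its fitted normal."""
    total = sum(counts)
    empirical = [c / total for c in counts]
    model = discretized_normal(*fit_normal(counts), len(counts))
    return 0.5 * sum(abs(p - q) for p, q in zip(empirical, model))


# Hypothetical histograms over 40 downstream positions
unimodal = [round(1000 * p) for p in discretized_normal(20, 4, 40)]
peak_a = [round(500 * p) for p in discretized_normal(8, 2, 40)]
peak_b = [round(500 * p) for p in discretized_normal(25, 2, 40)]
bimodal = [a + b for a, b in zip(peak_a, peak_b)]
print(round(normal_misfit(unimodal), 3), round(normal_misfit(bimodal), 3))
```

            <p>The misfit should be close to zero for the unimodal histogram and substantially larger for the two-peaked one, which is the situation in which a single normal positioning model is least adequate.</p>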
            <p>The Gibbs Sampler operates probabilistically and can produce different results upon repeated restarts. In addition, the large size of several of our data sets (<it>e.g</it>., human, rat, and mouse) required selecting a random subset of the training sequences so that the program would run in a reasonable time. A representative sampling of the Gibbs Sampler results is shown in Figure <figr fid="F4">4</figr>, using Sequence Logos <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> to represent the sequence content and line plots to represent the positioning distribution. These results are presented with the caveat that we specifically selected the results that most closely reproduce the positioning patterns observed in the PWC analysis. Results from at least ten independent runs of each data set (with parameters as described in Methods) are available in our online supplement <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
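            <p>For readers unfamiliar with Sequence Logos, the stack height at each motif position is the information content of that alignment column, commonly computed as log2(4) minus the Shannon entropy of the observed base frequencies, and each letter's height is its frequency times that value. The minimal Python sketch below implements this for RNA. It omits the small-sample correction used in the original Sequence Logo method, and the motif instances are hypothetical.</p>

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGU"):
    """Information content (bits) of one alignment column, without small-sample correction."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((counts[b] / n) * math.log2(counts[b] / n) for b in alphabet if counts[b])
    return math.log2(len(alphabet)) - entropy


def logo_heights(sites, alphabet="ACGU"):
    """Per-position letter heights (frequency x information) for equal-length motif instances."""
    heights = []
    for column in zip(*sites):
        counts = Counter(column)
        info = column_information(column, alphabet)
        heights.append({b: counts[b] / len(column) * info for b in alphabet if counts[b]})
    return heights


# Hypothetical instances of a UG-rich motif
sites = ["UGUGU", "UGUGU", "UCUGU", "UGUGA"]
for pos, h in enumerate(logo_heights(sites)):
    print(pos, {b: round(v, 2) for b, v in h.items()})
```

            <p>A fully conserved column reaches the maximum of 2 bits, whereas a column with equal frequencies of all four bases contributes nothing to the logo.</p>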
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Gibbs Recursive Sampler analysis of the DSE regions</p>
               </caption>
               <text>
                  <p><b>Gibbs Recursive Sampler analysis of the DSE regions</b>. Typical results are shown from Gibbs Recursive Sampler [30] analysis of the organism-specific downstream sequence sets. The sequence content and positioning distribution of the motifs are represented by Sequence Logos [32] and line plots, respectively. The analysis allowed patterns to grow up to 10 nt in length; however, the resulting patterns were consistently hexamers or heptamers, as shown. Positioning distributions were extracted from the output file and displayed graphically with a custom Perl script. Elements 1, 2, and 3 are represented in the line plot as red, green, and blue lines, respectively. Full parameter lists for the analysis are described in Methods.</p>
               </text>
               <graphic file="1471-2164-7-55-4"/>
            </fig>
            <p>The Gibbs Sampler routinely identified the UG-rich element in all species except <it>C. elegans </it>(Figure <figr fid="F4">4</figr>), with positioning distributions often consistent with the patterns identified for UG-rich sequences in the PWC tetramer analysis (Figure <figr fid="F2">2</figr>). However, in several cases (<it>e.g</it>., elements 1 and 2 for <it>D. melanogaster </it>in Figure <figr fid="F4">4</figr>) the positioning distribution appeared to be a mixture of the UG- and U-rich elements, likely indicating an overly "greedy" pattern description that encompassed both elements. Characterization of the U-rich element proved more elusive. Because our sequences are, in general, very U-rich, we could identify U-rich motifs only by specifying a prior or by reducing the weight of the input sequence set in determining the background model. With these adjustments, we were able to characterize U-rich elements, such as motif 2 for nearly all organisms in Figure <figr fid="F4">4</figr>.</p>
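            <p>Why U-rich motifs are hard to recover from U-rich input can be illustrated with a log-odds score, which compares a motif's base probabilities with a background model. In the hypothetical Python sketch below, the background is a mixture of the input sequences' base composition and a uniform composition, with a weight parameter controlling the contribution of the input set. This mixing scheme is an assumption made for illustration and is not the Gibbs Recursive Sampler's actual background option. For U-rich input, lowering the weight reduces the background probability of U and therefore raises the score of a U-rich word.</p>

```python
import math
from collections import Counter


def background(seqs, weight=1.0, alphabet="ACGU"):
    """Mix the input base composition (weight) with a uniform composition (1 - weight)."""
    counts = Counter(b for s in seqs for b in s if b in alphabet)
    total = sum(counts.values())
    uniform = 1.0 / len(alphabet)
    return {b: weight * counts[b] / total + (1.0 - weight) * uniform for b in alphabet}


def log_odds(word, motif, bg):
    """Log2-odds of `word` under a position-specific motif versus the background."""
    return sum(math.log2(motif[i][b] / bg[b]) for i, b in enumerate(word))


# Hypothetical U-rich input sequences and a hypothetical 5-nt U-rich motif
seqs = ["UUUAUUGUUUCUUUUAUU", "GUUUUUCUUAUUUUGUUU"]
motif = [{"A": 0.05, "C": 0.05, "G": 0.05, "U": 0.85}] * 5

for w in (1.0, 0.5, 0.0):
    bg = background(seqs, weight=w)
    print(w, round(bg["U"], 3), round(log_odds("UUUUU", motif, bg), 2))
```

            <p>With the input composition fully weighted, much of the U-rich motif's apparent enrichment is absorbed by the background, which is consistent with the difficulty described above.</p>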
            <p>Consistent with the PWC tetramer analysis (Figure <figr fid="F2">2D</figr>), the Gibbs Sampler frequently identified far downstream G-rich motifs (<it>e.g</it>., motif 3 for <it>C. familiaris </it>in Figure <figr fid="F4">4</figr>), but only for amniotes. In contrast, the far downstream regions of the fish, the arthropods, and <it>C. elegans </it>produced an A-rich element, such as motif 3 for <it>D. rerio, A. gambiae</it>, and <it>C. elegans</it>.</p>
         </sec>
         <sec>
            <st>
               <p>Determining the DSE motif length</p>
            </st>
            <p>The Gibbs Sampler can vary the motif size (as can the Improbizer and MEME), selecting the length that produces the most statistically significant result. The Gibbs Sampler and Improbizer consistently returned motifs between 4 and 7 nucleotides in length, whereas MEME typically identified motifs between 12 and 15 nucleotides long. Examination of the MEME results (available in the supplement <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>) revealed that these extended motifs resembled a concatenation of the UG- and U-rich elements, dominated by either a strong UG-rich component in the first half or a strong U-rich component in the second half. We also tested the fragmentation option of the Gibbs Sampler (data not shown), which allows the detection of non-contiguous patterns under the constraint that the spacing between blocks is fixed (or nearly so). Nearly all runs of all sequence sets resulted in contiguous motifs.</p>
         </sec>
         <sec>
            <st>
               <p>Analysis of the CstF-64 RRM multiple alignment</p>
            </st>
            <p>The CstF-64 RRM (or RNA binding domain) follows the well-conserved fold found in many other RNA-binding proteins (reviewed in <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>). Residues of <it>&#946;</it>-strands one and three make up the canonical RNP2 and RNP1 motifs, respectively, and are part of the larger RRM structure <it>&#946;</it><sub>1</sub><it>&#945;</it><sub>1</sub><it>&#946;</it><sub>2</sub><it>&#946;</it><sub>3</sub><it>&#945;</it><sub>2</sub><it>&#946;</it><sub>4 </sub>(Figure <figr fid="F5">5</figr>). The vertebrate CstF-64 RRM is terminated by an additional <it>&#945;</it>-helix (helix C) that lies across the <it>&#946;</it>-sheet, occluding the projected RNA binding site <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. The residues of the entire region from R<sub>7 </sub>through G<sub>105 </sub>(the end of helix C) are completely conserved in vertebrates, except for a single P<sub>41 </sub>&#8594; L<sub>41 </sub>substitution in fish. This near perfect conservation does not extend to invertebrates, where numerous residue substitutions occur. Across the same span of residues, the percent identities of <it>A. gambiae, D. melanogaster</it>, and <it>C. elegans </it>to the vertebrate sequence are 72.7%, 66.7%, and 56.6%, respectively. (In addition, <it>C. elegans </it>has two insertions, of 1 and 2 amino acids.) The substitutions are not uniformly distributed. If we restrict our analysis to the <it>&#946;</it>-sheet (highlighted in blue in Figure <figr fid="F5">5</figr>), the percent identities increase to 91.7%, 83.3%, and 79.1%, respectively. In contrast, the fifteen amino acids in helix C are much more variable, with percent identities of 33.3%, 33.3%, and 40%, respectively, and <it>C. elegans </it>also has a single amino acid insertion.</p>
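            <p>Percent identity values depend on how the denominator is defined. A minimal Python sketch follows, assuming that identity is the fraction of reference (vertebrate) residues in an alignment region that are matched exactly in the aligned invertebrate sequence, with reference gap columns (that is, invertebrate insertions) excluded from the denominator. This convention and the alignment strings are assumptions for illustration. Under it, 5 and 6 identical residues out of the 15 in helix C give 33.3% and 40%, respectively.</p>

```python
from typing import Optional


def percent_identity(reference, query, start=0, end: Optional[int] = None):
    """Percent of reference residues in alignment columns [start, end) matched exactly by `query`.

    Columns where the reference has a gap ('-') are insertions in the query and are
    excluded from the denominator (an assumed convention).
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be rows of the same alignment")
    end = len(reference) if end is None else end
    ref_cols = [i for i in range(start, end) if reference[i] != "-"]
    matches = sum(1 for i in ref_cols if query[i] == reference[i])
    return 100.0 * matches / len(ref_cols)


# Hypothetical aligned fragments; the query carries a 1-residue insertion
vertebrate = "MKL-VGEF"
invertebrate = "MKIAVGDF"
print(round(percent_identity(vertebrate, invertebrate), 1))  # 71.4
```

            <p>Restricting the start and end columns to a structural element, such as the <it>&#946;</it>-strands or helix C, gives region-specific identities of the kind reported above.</p>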
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Multiple alignment of CstF-64 N-terminal RRM and helix C region</p>
               </caption>
               <text>
                  <p><b>Multiple alignment of CstF-64 N-terminal RRM and helix C region</b>. Colors: blue = <it>&#946;</it>-strands; red = <it>&#945;</it>-helix; green = hinge region (partial); yellow shaded = helix C stabilizing interactions [22]; boxed blue = residues most affected by RNA binding, according to NMR relaxation [23].</p>
               </text>
               <graphic file="1471-2164-7-55-5"/>
            </fig>
            <p>We further restricted our analysis to the amino acids previously identified as contributing to interactions between helix C and the <it>&#946;</it>-sheet (F<sub>19</sub>, F<sub>61</sub>, N<sub>91</sub>, N<sub>97</sub>, E<sub>100</sub>, L<sub>101</sub>, and L<sub>104 </sub><abbrgrp><abbr bid="B22">22</abbr></abbrgrp>) or between the <it>&#946;</it>-sheet and bound RNA (G<sub>21</sub>, S<sub>44</sub>, F<sub>45</sub>, R<sub>46</sub>, L<sub>47</sub>, D<sub>50</sub>, T<sub>53</sub>, K<sub>55</sub>, K<sub>57</sub>, Y<sub>59</sub>, F<sub>61</sub>, C<sub>62</sub>, E<sub>63</sub>, and D<sub>90 </sub><abbrgrp><abbr bid="B23">23</abbr></abbrgrp>). In Table <tblr tid="T2">2</tblr>, we list the subset for which changes are observed in the invertebrates.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Invertebrate amino acid changes for residues previously identified as critical in either helix C to <it>&#946;</it>-sheet [22] or RNA to <it>&#946;</it>-sheet [23] interactions.</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Organism</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>F</b>
                           <sub>45</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>R</b>
                           <sub>46</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>L</b>
                           <sub>47</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>T</b>
                           <sub>53</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Y</b>
                           <sub>59</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>C</b>
                           <sub>62</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>N</b>
                           <sub>91</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>N</b>
                           <sub>97</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>L</b>
                           <sub>101</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>L</b>
                           <sub>104</sub>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>A. gambiae</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>L</p>
                     </c>
                     <c ca="center">
                        <p>K</p>
                     </c>
                     <c ca="center">
                        <p>L</p>
                     </c>
                     <c ca="center">
                        <p>S</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>N</p>
                     </c>
                     <c ca="center">
                        <p>S</p>
                     </c>
                     <c ca="center">
                        <p>M</p>
                     </c>
                     <c ca="center">
                        <p>L</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>D. melanogaster</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>L</p>
                     </c>
                     <c ca="center">
                        <p>K</p>
                     </c>
                     <c ca="center">
                        <p>L</p>
                     </c>
                     <c ca="center">
                        <p>S</p>
                     </c>
                     <c ca="center">
                        <p>Y</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>N</p>
                     </c>
                     <c ca="center">
                        <p>S</p>
                     </c>
                     <c ca="center">
                        <p>M</p>
                     </c>
                     <c ca="center">
                        <p>L</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>
                              <it>C. elegans</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>I</p>
                     </c>
                     <c ca="center">
                        <p>K</p>
                     </c>
                     <c ca="center">
                        <p>M</p>
                     </c>
                     <c ca="center">
                        <p>T</p>
                     </c>
                     <c ca="center">
                        <p>Y</p>
                     </c>
                     <c ca="center">
                        <p>I</p>
                     </c>
                     <c ca="center">
                        <p>S</p>
                     </c>
                     <c ca="center">
                        <p>N</p>
                     </c>
                     <c ca="center">
                        <p>F</p>
                     </c>
                     <c ca="center">
                        <p>S</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>Although the percent identity of helix C is higher in <it>C. elegans </it>than in either <it>A. gambiae </it>or <it>D. melanogaster</it>, the substitutions in <it>C. elegans </it>are arguably more significant. Chou-Fasman <it>&#945;</it>-helix and <it>&#946;</it>-sheet propensities suggest that the S<sub>94 </sub>&#8594; G<sub>94</sub>, E<sub>95 </sub>&#8594; G<sub>95</sub>, and K<sub>102 </sub>&#8594; G<sub>102 </sub>substitutions may prevent a stable helix C from forming in <it>C. elegans</it>. In addition, the vertebrate helix C includes three conserved lysines (K<sub>96</sub>, K<sub>98</sub>, and K<sub>102</sub>), all of which are oriented with their side chains pointing away from the <it>&#946;</it>-sheet in the absence of bound RNA <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. All three lysines are replaced with non-charged residues in <it>C. elegans</it>, whereas <it>A. gambiae </it>and <it>D. melanogaster </it>retain K<sub>96</sub>, carry a conservative K<sub>98 </sub>&#8594; R<sub>98 </sub>substitution, but have a non-charged substitution at residue 102.</p>
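            <p>The direction of these helix C substitutions can be checked against the commonly tabulated Chou-Fasman <it>&#945;</it>-helix propensities, in which values above 1 favor and values below 1 disfavor helix formation, with glycine among the strongest helix breakers. The Python sketch below includes only the four residues involved and reports the propensity change for each substitution named above. It is a per-residue comparison, not a full Chou-Fasman helix nucleation and extension prediction.</p>

```python
# Commonly tabulated Chou-Fasman alpha-helix propensities for the residues involved
P_ALPHA = {"E": 1.51, "K": 1.16, "S": 0.77, "G": 0.57}

# (vertebrate residue, position, C. elegans residue), as listed in the text
SUBSTITUTIONS = [("S", 94, "G"), ("E", 95, "G"), ("K", 102, "G")]


def propensity_changes(subs, table=P_ALPHA):
    """Change in helix propensity (substituted minus original) for each substitution."""
    return {f"{a}{pos}{b}": round(table[b] - table[a], 2) for a, pos, b in subs}


for name, delta in propensity_changes(SUBSTITUTIONS).items():
    print(name, delta)
```

            <p>All three changes lower the helix propensity, with the E<sub>95 </sub>&#8594; G<sub>95 </sub>substitution showing the largest drop.</p>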
         </sec>
         <sec>
            <st>
               <p>Extended multiple alignment</p>
            </st>
            <p>A multiple alignment of the complete sequences for the organisms analyzed is available as a supplement. The complete CstF-64 protein sequence consists of five distinct regions. The N-terminal CstF-64 RRM and helix C (approximately 110 residues in vertebrates) are highly conserved. The "hinge" region of CstF-64 (residues 110&#8211;210), which interacts with both CstF-77 and symplekin <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, is also highly conserved. The hinge region is followed by a low-complexity proline- and glycine-rich region (residues 210&#8211;410). The 12 contiguous MEAR(A/G) repeats (residues 410&#8211;470), including the interspersed RGG motifs, are weakly conserved outside of amniotes, if present at all. The remaining C-terminal residues (514&#8211;577) are highly conserved, reportedly reflecting the interaction between CstF-64 and the transcriptional coactivator PC4 <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>UG- and U-rich signals are distinct DSEs</p>
            </st>
            <p>Unlike the previous computational studies of the metazoan DSE cited above, our analysis explicitly characterizes the positioning biases relative to the 3'-processing site. Although it is possible that the U/UG-rich DSEs constitute a single motif, several aspects of our analysis lead us to conclude that our results are consistent with the presence of distinct UG- and U-rich elements, as proposed by McDevitt <it>et al</it>. <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> and Gil and Proudfoot <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. (This notably excludes <it>C. elegans</it>, which shows no evidence of a UG-rich component.) Although the positioning distributions of the U- and UG-rich sequences overlap considerably, Figures <figr fid="F2">2</figr> and <figr fid="F3">3</figr> clearly show that they are distinct. Previous experimental studies have occasionally produced longer putative elements that included both UG- and U-rich portions (<it>e.g</it>., <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> and <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>). However, if the functional element were a single longer element, we would expect UG- and U-rich positioning distributions with a common shape, offset in position. Instead, we observe distinct distributions that are more consistent with two independent elements separated by variable spacing. (Nothing in our analysis precludes these elements from overlapping in an individual sequence.) Finally, the typical separation that we observe between the vertebrate UG- and U-rich elements (approximately 15 nucleotides between the UG-rich and U-rich positioning peaks in Figures <figr fid="F2">2</figr> and <figr fid="F4">4</figr>) would imply a significantly longer RRM binding site than has previously been observed <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>.</p>
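            <p>The argument that a single longer element would produce UG- and U-rich distributions of common shape, offset in position, can be phrased as a simple test: find the shift that best superimposes one normalized positional distribution on the other and report the residual total variation distance. A residual near zero is consistent with one element at a fixed offset, whereas a large residual indicates different shapes. The Python sketch below is a hypothetical illustration on toy histograms, not an analysis reported in this study.</p>

```python
def normalize(counts):
    """Convert counts to a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def shifted(dist, k):
    """Move values k positions downstream (upstream if k is negative).

    Returns the shifted list and the total mass pushed past either end.
    """
    n = len(dist)
    out = [0.0] * n
    dropped = 0.0
    for i, v in enumerate(dist):
        if i + k in range(n):
            out[i + k] = v
        else:
            dropped += v
    return out, dropped


def best_shift(target_counts, other_counts, max_shift=20):
    """Return (shift, residual) minimizing total variation distance between target and shifted other."""
    p, q = normalize(target_counts), normalize(other_counts)
    best = None
    for k in range(-max_shift, max_shift + 1):
        qs, dropped = shifted(q, k)
        # Mass pushed out of the window counts as mismatch, since the target has none there
        d = 0.5 * (sum(abs(a - b) for a, b in zip(p, qs)) + dropped)
        if best is None or best[1] > d:
            best = (k, d)
    return best


# Hypothetical positional histograms (index = nt downstream of the processing site)
ug_rich = [0, 1, 3, 5, 3, 1, 0, 0, 0, 0, 0, 0]
u_rich_offset = [0, 0, 0, 0, 1, 3, 5, 3, 1, 0, 0, 0]  # same shape, offset by 3
u_rich_broad = [0, 1, 1, 2, 2, 3, 3, 3, 2, 2, 1, 1]   # different shape
print(best_shift(u_rich_offset, ug_rich))  # (3, 0.0)
print(best_shift(u_rich_broad, ug_rich))
```

            <p>Only the first pair can be superimposed by a shift alone; the second leaves a large residual at every offset, which is the signature of two elements with different positioning distributions.</p>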
            <p>Our analysis indicates that the U-rich element is more prevalent than the UG-rich element in all species studied, and that the sequence content and relative positioning of the U-rich element in all vertebrate species are consistent with previous <it>in vitro </it>polyadenylation assays <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> and UV cross-linking studies of CstF-64 binding to pre-mRNA <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, as well as the recent NMR studies of CstF-64 <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. Somewhat paradoxically, the sequence content of the vertebrate UG-rich element is instead consistent with the results of the CstF-64 SELEX binding experiments <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. The implications of these differences are discussed below. We believe that the historical difficulty in clearly delineating these elements is likely due to a convergence of several confounding factors, including the degenerate sequence content of <it>both </it>the UG- and U-rich elements, significant overlap in both the positioning (Figures <figr fid="F2">2</figr> and <figr fid="F4">4</figr>) and the sequence content (Figure <figr fid="F4">4</figr>) of the two elements, and the typical emphasis on sequence content alone in computational investigations.</p>
         </sec>
         <sec>
            <st>
               <p>An interaction model for the CstF-64 RRM with precursor mRNA</p>
            </st>
            <p>Our comparative studies of variation in the primary protein sequence of the CstF-64 RRM have highlighted differences in potentially critical residues that correlate with changes in the apparent binding sites identified by our statistical analysis. These correlations allow us to speculate on the mechanism of interaction between the RRM of CstF-64 and the downstream region of precursor mRNAs. We hypothesize that the data presented here, in conjunction with previously published work, support a model in which the proximal UG-rich element is involved in the necessary displacement of helix C <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> that exposes the <it>&#946;</it>-sheet for binding to the distal U-rich element.</p>
         </sec>
         <sec>
            <st>
               <p>Evidence for an interaction between the <it>&#946;</it>-sheet and the U-rich element</p>
            </st>
            <p>UV cross-linking of CstF-64 <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> revealed sequence content and positioning very similar to those of the U-rich element we describe in Figures <figr fid="F2">2</figr> and <figr fid="F4">4</figr>. The NMR structures of the vertebrate CstF-64 RRM indicate that the binding pocket targets the UU dinucleotide and that the larger UG dinucleotide is discriminated against on the basis of size <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Using specific oligomers, it was also shown that the (GU)<sub>4 </sub>sequence has a two-fold weaker interaction with the CstF than does the (GU)<sub>4</sub>UG sequence <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. U-rich DSEs are present in all organisms studied here, including <it>C. elegans</it>. The <it>&#946;</it>-strands that form the RNA-binding sheet are also highly conserved (79.1% to 91.7% identity between the invertebrate and vertebrate sequences). The observed changes (F<sub>45 </sub>&#8594; L<sub>45 </sub>in <it>D. melanogaster </it>and <it>A. gambiae</it>, F<sub>45 </sub>&#8594; I<sub>45 </sub>in <it>C. elegans</it>, C<sub>62 </sub>&#8594; I<sub>62 </sub>in <it>C. elegans</it>, A<sub>86 </sub>&#8594; T<sub>86 </sub>in <it>D. melanogaster</it>, A<sub>86 </sub>&#8594; I<sub>86 </sub>in <it>C. elegans</it>) are either conservative side-chain substitutions or oriented such that the side chains face away from the binding pocket surface. In contrast, helix C displays considerable variation, as described below.</p>
         </sec>
         <sec>
            <st>
               <p>Evidence for an interaction between helix C and the UG-rich element</p>
            </st>
            <p>The NMR studies showed that, in the absence of RNA, helix C is stably bound to residues in the RRM binding pocket, thereby occluding it <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. The residues responsible for this interaction are absolutely conserved in the vertebrate sequences we examined. The substantial changes found in the invertebrates may disrupt or weaken the <it>&#946;</it>-sheet to helix C interaction, and these differences correlate with changes in the statistical patterns we identified for DSEs. Specifically, in <it>C. elegans</it>, ten of the sixteen helix C residues differ from the vertebrate consensus, and many of these differences are non-conservative (Table <tblr tid="T2">2</tblr>). This correlates with the complete absence of the proximal UG-rich element. In <it>D. melanogaster </it>and <it>A. gambiae</it>, the change in apparent affinity (Figures <figr fid="F2">2</figr>, <figr fid="F3">3</figr>, and <figr fid="F4">4</figr>) is more subtle; for example, the UCUG-like variant of the vertebrate UG-rich element is absent. Several significant changes in helix C residues can be correlated with this change, including an N<sub>97 </sub>&#8594; S<sub>97 </sub>substitution that could disrupt the hydrogen bonding to N<sub>91 </sub>predicted in the vertebrate structure. In addition, both <it>D. melanogaster </it>and <it>A. gambiae </it>lack K<sub>102</sub>, which is conserved in vertebrates. According to the NMR structure <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, the K<sub>102 </sub>side chain is directed away from the <it>&#946;</it>-sheet, an orientation that would permit interactions between its charged amino group and the RNA backbone. The observed correlations between protein sequence and apparent RNA affinity imply that helix C plays an important role in defining the DSE region. The importance of helix C is consistent with a forthcoming study of the <it>in vitro </it>binding affinities of variant forms of recombinant CstF-64 (R. Monarez, C.C. MacDonald, <it>pers. comm</it>.).</p>
         </sec>
         <sec>
            <st>
               <p>What is the state of helix C during transcription?</p>
            </st>
            <p>The conformation of helix C during transcription is currently unknown; however, it must be unwound or displaced before RNA binding <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. If helix C were already displaced during transcription, RNA binding would be a single event. Although a single-step binding process cannot be ruled out at this time, we consider it unlikely because it would not explain the presence of both proximal UG-rich and distal U-rich elements. We speculate that helix C is structurally intact while CstF-64 scans the nascent RNA, and that a preliminary interaction is required for its displacement. As described above, differences in helix C primary sequence correlate with changes in the pattern, or even the existence, of the proximal UG-rich element. In addition, assuming 5'-to-3' processivity, the proximal positioning of the UG-rich element is consistent with a role in displacing helix C, thereby exposing the <it>&#946;</it>-sheet for binding to the more prevalent distal U-rich element. If the model we propose is accurate, and CstF-64 is responsible for interactions with both DSEs, it would explain the discrepancy between the sequence preferences observed in SELEX <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp> and cross-linking <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> studies. It remains an open question <it>why </it>SELEX measurements would favor the initial interaction with the UG-rich element.</p>
         </sec>
         <sec>
            <st>
               <p><it>C. elegans</it>, polycistronic transcripts and the proximal UG-rich element</p>
            </st>
            <p>The distinct changes in both the CstF-64 RRM sequence and the apparent binding affinity in <it>C. elegans </it>are not surprising, given the known differences in its 3'-processing. Approximately 15% of <it>C. elegans </it>genes are expressed in polycistronic transcripts <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, which are processed into monocistronic transcripts in a reaction that includes both 3'-processing and trans-splicing of a leader RNA to the downstream portion of the precursor <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. However, these transcripts alone cannot explain the absence of the proximal UG-rich element: its loss is a transcriptome-wide change and therefore likely reflects the RNA binding properties of the <it>C. elegans </it>CstF-64.</p>
         </sec>
         <sec>
            <st>
               <p>The MEAR(A/G) repeats are not critical for UG- or U-rich DSE interactions</p>
            </st>
            <p>Previous reports speculated on a role for the MEAR(A/G) repeats in the interaction of CstF-64 with pre-mRNA <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Our analyses argue against direct involvement of the MEAR(A/G) repeats in recognition of either the UG- or U-rich DSEs, because the MEAR(A/G) repeats are greatly reduced or absent in fish species (see the supplement), whereas the UG- and U-rich patterns are essentially unchanged from those of other vertebrates (Figures <figr fid="F2">2</figr> and <figr fid="F3">3</figr>). In addition, SELEX studies that included only the N-terminal region of CstF-64 (approximately 130 residues) yielded both UG- and U-rich binding patterns <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Although the function of the MEAR(A/G)-like repeats is currently unknown, their striking reduction in fish and the invertebrates correlates with the loss of the G-rich signal in the far (greater than 20 nt) downstream region (Figure <figr fid="F2">2D</figr>). G-rich elements have been implicated as auxiliary 3'-processing elements that interact with heterogeneous nuclear RNP complexes <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>, act as transcriptional pause sites <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, or form G-quadruplex secondary structures <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In our analysis of 39,578 high-quality 3'-processing sites spanning 10 genomes, we present the U/UG-rich DSE as two parts: a proximal UG-rich element positioned approximately 5 to 10 nt downstream of the processing site and a distal U-rich element 15 to 25 nt from the processing site. Our results indicate that historical difficulties in classifying these elements are likely a consequence of their similarity in both sequence content and positioning. The distinct nature and positioning of the DSEs lead us to consider a model in which the CstF-64 RRM interacts with the UG- and U-rich elements separately and sequentially. Specifically, we hypothesize that the proximal UG-rich element contributes to the displacement of helix C, which exposes the RRM <it>&#946;</it>-sheet for subsequent binding to the distal U-rich element. Although this model is speculative, it is consistent with both our results and previous studies.</p>
         <p>Confirmation of this model through site-directed mutagenesis or other techniques may lead to a better understanding of how the DSE region directs cleavage site choice and, ultimately, of its role in alternative polyadenylation.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Extraction of polyadenylation sites</p>
            </st>
            <p>Previously we constructed a 3'-processing site sequence database (PACdb <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>) from 13,006,921 ESTs representing 10 species (<it>A. gambiae, C. elegans, C. familiaris, D. rerio, D. melanogaster, G. gallus, H. sapiens, M. musculus, R. norvegicus</it>, and <it>T. rubripes</it>). The number of EST sequences available for each species ranged from 25,850 for <it>T. rubripes </it>to 6,002,331 for <it>H. sapiens </it>(see supplement). ESTs that mapped to a genomic location were scored for evidence of polyadenylation by our discriminant function (described below). The number of EST alignments that passed our discriminant thresholds varied widely, from 2,301 in <it>C. elegans </it>to 514,894 in <it>H. sapiens</it>. The polyadenylation sites implied by these ESTs were further grouped into unique genomic locations (&#177; 25 nt) to account for imprecise polyadenylation and/or errors in the sequence data. Condensing the EST data in this manner reduced the number of implied 3'-processing sites to between 1,003 in <it>C. elegans </it>and 55,828 in <it>H. sapiens</it>. The average number of supporting ESTs per unique genomic 3'-processing site varied considerably between organisms, introducing a bias that, if uncorrected, would inflate the statistical weight of rare sites in the more heavily sampled transcriptomes, <it>e.g., M. musculus </it>and <it>H. sapiens</it>. To correct for this bias, we used the mean number of supporting ESTs per unique 3'-processing site in each species as a minimum EST threshold. After these redundancy thresholds were applied, the final number of polyadenylation sites included in our analysis ranged from 902 in <it>D. melanogaster </it>to 10,060 in <it>H. sapiens </it>(Table <tblr tid="T1">1</tblr>).</p>
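            <p>The condensing and redundancy-threshold steps described above can be sketched in Python as follows. This is a hypothetical illustration rather than the PACdb implementation: the input format (one (chromosome, strand, end position) tuple per EST), the greedy single-linkage grouping of ends no more than 25 nt apart, the use of the most frequently observed end as the representative site, and the function names are all assumptions made for illustration.</p>

```python
from collections import defaultdict
from statistics import mean

WINDOW = 25  # nt; ends this close together are treated as one site (assumed rule)


def cluster_sites(est_ends):
    """Group EST-implied 3' ends into unique genomic sites (hypothetical helper).

    est_ends: iterable of (chrom, strand, position) tuples, one per EST.
    Returns (chrom, strand, representative_position, n_ests) tuples.
    A new cluster starts whenever the gap to the previous sorted end
    exceeds WINDOW (greedy single-linkage).
    """
    by_locus = defaultdict(list)
    for chrom, strand, pos in est_ends:
        by_locus[(chrom, strand)].append(pos)

    sites = []
    for (chrom, strand), positions in sorted(by_locus.items()):
        positions.sort()
        groups = [[positions[0]]]
        for pos in positions[1:]:
            if pos - groups[-1][-1] > WINDOW:
                groups.append([pos])   # too far from the previous end: new site
            else:
                groups[-1].append(pos)
        for members in groups:
            # Most frequently observed end; ties resolved to the smallest position.
            rep = max(sorted(set(members)), key=members.count)
            sites.append((chrom, strand, rep, len(members)))
    return sites


def apply_support_threshold(sites):
    """Keep sites supported by at least the species-mean number of ESTs."""
    if not sites:
        return []
    threshold = mean(n_ests for _, _, _, n_ests in sites)
    return [site for site in sites if site[3] >= threshold]
```

            <p>Single-linkage grouping can chain several nearby ends into a cluster wider than 50 nt; a fixed &#177; 25 nt window around a seed end would be an equally plausible reading of the procedure.</p>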
         </sec>
         <sec>
            <st>
               <p>PolyA discriminant function</p>
            </st>
            <p>Characterization of degenerate regulatory sequences depends critically on a high-quality training set; we therefore developed a discriminant function that selects putative 3'-processing sites with both strong evidence of a polyA tail on the EST and no genomic A-rich region that could signal mispriming of the polyT primer used to generate cDNA clones. Internal priming is often addressed by setting a cutoff for the number of adenosines allowed in the genomic sequence flanking the putative 3'-processing site. This is a good estimate of primer stability, but it does not take into account the positional effects of mismatches. Because cDNA generation involves an enzyme binding/initiation step, mismatches at the 3' end of the polyT primer are critical and should be included in the scoring function. We modeled the combined effect of primer thermostability and reverse transcriptase processivity by incorporating an exponential, position-dependent weighting into our function. We extracted the 20 nucleotides downstream of each putative 3'-processing site from both the EST and the genomic sequence. The EST sequence was scored for the likelihood of being a true 3' end, whereas the genomic sequence was scored for the likelihood of false priming. Each sequence position (<it>x</it>) relative to the 3'-processing site was scored independently (position score, eq. 1).</p>
            <p>
               <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-7-55-i1">
                  <m:semantics>
                     <m:mrow>
                        <m:mtext>Position&#160;score&#160;</m:mtext>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>x</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mn>1</m:mn>
                        <m:mo>&#8722;</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>j</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                              <m:mi>x</m:mi>
                           </m:munderover>
                           <m:mrow>
                              <m:mfrac>
                                 <m:mn>1</m:mn>
                                 <m:mi>&#946;</m:mi>
                              </m:mfrac>
                              <m:msup>
                                 <m:mi>e</m:mi>
                                 <m:mrow>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mi>j</m:mi>
                                    <m:mo>/</m:mo>
                                    <m:mi>&#946;</m:mi>
                                 </m:mrow>
                              </m:msup>
                           </m:mrow>
                        </m:mstyle>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqqGqbaucqqGVbWBcqqGZbWCcqqGPbqAcqqG0baDcqqGPbqAcqqGVbWBcqqGUbGBcqqGGaaicqqGZbWCcqqGJbWycqqGVbWBcqqGYbGCcqqGLbqzcqqGGaaicqGGOaakcqWG4baEcqGGPaqkcqGH9aqpcqaIXaqmcqGHsisldaaeWbqaamaalaaabaGaeGymaedabaacciGae8NSdigaaiabdwgaLnaaCaaaleqabaGaeyOeI0IaemiEaGNaei4la8Iae8NSdigaaaqaaiabigdaXaqaaiabdIha4bqdcqGHris5aOGaaCzcaiaaxMaadaqadaqaaiabigdaXaGaayjkaiaawMcaaaaa@577C@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>The scale parameter <it>&#946; </it>was estimated to be 12, which reflects our estimate of the average number of A nucleotides involved in a reverse transcriptase priming event. The total score is the sum of the position scores over all positions (<it>x</it>) at which the base is A:</p>
            <p>
               <m:math name="1471-2164-7-55-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtext>total&#160;score</m:mtext>
                        <m:mo>=</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>x</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                              <m:mn>20</m:mn>
                           </m:munderover>
                           <m:mrow>
                              <m:mo>{</m:mo>
                              <m:mtext>position&#160;score&#160;</m:mtext>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>x</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo stretchy="false">[</m:mo>
                              <m:mi>I</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>S</m:mi>
                                 <m:mi>x</m:mi>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:mi>A</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo stretchy="false">]</m:mo>
                              <m:mo>}</m:mo>
                           </m:mrow>
                        </m:mstyle>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>2</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqqG0baDcqqGVbWBcqqG0baDcqqGHbqycqqGSbaBcqqGGaaicqqGZbWCcqqGJbWycqqGVbWBcqqGYbGCcqqGLbqzcqGH9aqpdaaeWbqaaiabcUha7jabbchaWjabb+gaVjabbohaZjabbMgaPjabbsha0jabbMgaPjabb+gaVjabb6gaUjabbccaGiabbohaZjabbogaJjabb+gaVjabbkhaYjabbwgaLjabbccaGiabcIcaOiabdIha4jabcMcaPiabcUfaBjabdMeajjabcIcaOiabdofatnaaBaaaleaacqWG4baEaeqaaOGaeyypa0JaemyqaeKaeiykaKIaeiyxa0LaeiyFa0haleaacqaIXaqmaeaacqGHKjYOcqaIYaGmcqaIWaama0GaeyyeIuoakiaaxMaacaWLjaWaaeWaaeaacqaIYaGmaiaawIcacaGLPaaaaaa@6AC1@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where <it>I </it>is an indicator function, equal to 1 if its argument is true and 0 otherwise, and <it>S</it><sub><it>x</it></sub> is the base at position <it>x</it>. Total scores from our discriminant function range from 0 (no As) to 9.7718 (all As). Thresholds for high-quality 3'-processing sites (see supplement) required a minimum total score of 5.5 on the EST sequence (evidence of polyadenylation) and a maximum total score of 4.5 on the genomic sequence (to exclude internal priming).</p>
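            <p>Equations 1 and 2 and the two thresholds can be expressed as a short Python sketch. It is an illustrative reimplementation, not the authors' software; the function names are hypothetical, the summation in eq. 1 is read as running over an index j from 1 to x, and the first base of each downstream string is assumed to be position x = 1. Under these assumptions, with &#946; = 12, a run of 20 As reproduces the maximum total score of 9.7718 quoted above.</p>

```python
import math

BETA = 12.0        # scale parameter beta (average number of As in a priming event)
WINDOW = 20        # number of downstream positions scored
EST_MIN = 5.5      # EST must score at least this (evidence of a polyA tail)
GENOMIC_MAX = 4.5  # genome must score at most this (no internal priming)


def position_score(x, beta=BETA):
    """Eq. 1: 1 - sum_{j=1}^{x} (1/beta) * exp(-j/beta)."""
    return 1.0 - sum(math.exp(-j / beta) / beta for j in range(1, x + 1))


def total_score(seq, beta=BETA):
    """Eq. 2: sum position_score(x) over positions x = 1..WINDOW holding an A.

    seq is the sequence immediately downstream of the putative
    3'-processing site; seq[0] is taken to be position x = 1.
    """
    return sum(position_score(x, beta)
               for x, base in enumerate(seq[:WINDOW].upper(), start=1)
               if base == "A")


def passes_discriminant(est_downstream, genomic_downstream):
    """True if the EST looks polyadenylated and the genome is not A-rich."""
    return (total_score(est_downstream) >= EST_MIN
            and GENOMIC_MAX >= total_score(genomic_downstream))
```

            <p>Because the position score decays with distance from the processing site, an A-rich stretch immediately downstream of the site contributes more than the same number of As further away, which is the positional effect the simple A-count cutoff ignores.</p>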
         </sec>
         <sec>
            <st>
               <p>Positional Word Count Analysis</p>
            </st>
            <p>Previous studies have shown that mRNA regulatory sequences can be characterized not only by sequence content but also by their position relative to a functional site, such as the 3'-processing site <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B27">27</abbr></abbrgrp>. Our principal method is positional word counting (PWC), in which all sequence words of a given length (tetramers and hexamers in this work) are counted, recording both their occurrence and their position with respect to the 3'-processing site. When normalized, PWC yields a frequency for each <it>k</it>-mer at each position, which can be interpreted as a probability of occurrence conditional on a 3'-processing site at position 0. Putative functional sequences are identified as <it>k</it>-mers with statistically significant non-uniform positioning with respect to the 3'-processing site. We interpret different <it>k</it>-mers with similar positioning as evidence of acceptable substitutions in the functional element. Positional word frequencies (<it>pwf</it>) were calculated as the fraction of sequences with word (<it>w</it>) at position (<it>i</it>) (eq. 3).</p>
            <p>
               <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-7-55-i3">
                  <m:semantics>
                     <m:mrow>
                        <m:mtable>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>p</m:mi>
                                    <m:mi>w</m:mi>
                                    <m:msub>
                                       <m:mi>f</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo>=</m:mo>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:mstyle displaystyle="true">
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>w</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                             </m:mrow>
                                          </m:mstyle>
                                       </m:mrow>
                                       <m:mi>n</m:mi>
                                    </m:mfrac>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mtext>where&#160;</m:mtext>
                                    <m:msub>
                                       <m:mi>w</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo>=</m:mo>
                                    <m:mtext>word&#160;at&#160;position&#160;</m:mtext>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mtext>and&#160;</m:mtext>
                                    <m:mi>n</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mtext>number&#160;of&#160;sequences</m:mtext>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                        </m:mtable>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>3</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeWabaaabaGaemiCaaNaem4DaCNaemOzay2aaSbaaSqaaiabdMgaPbqabaGccqGH9aqpdaWcaaqaamaaqaeabaGaem4DaC3aaSbaaSqaaiabdMgaPbqabaaabeqab0GaeyyeIuoaaOqaaiabd6gaUbaaaeaacqqG3bWDcqqGObaAcqqGLbqzcqqGYbGCcqqGLbqzcqqGGaaicqWG3bWDdaWgaaWcbaGaemyAaKgabeaakiabg2da9iabbEha3jabb+gaVjabbkhaYjabbsgaKjabbccaGiabbggaHjabbsha0jabbccaGiabbchaWjabb+gaVjabbohaZjabbMgaPjabbsha0jabbMgaPjabb+gaVjabb6gaUjabbccaGiabdMgaPbqaaiabbggaHjabb6gaUjabbsgaKjabbccaGiabd6gaUjabg2da9iabb6gaUjabbwha1jabb2gaTjabbkgaIjabbwgaLjabbkhaYjabbccaGiabb+gaVjabbAgaMjabbccaGiabbohaZjabbwgaLjabbghaXjabbwha1jabbwgaLjabb6gaUjabbogaJjabbwgaLjabbohaZbaacaWLjaGaaCzcamaabmaabaGaeG4mamdacaGLOaGaayzkaaaaaa@80A0@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>Choosing the <it>k</it>-mer length involves a trade-off between the statistical power of the large counts obtainable for short words and the more complete motif description obtainable from longer words. Our data sets range in size from a few hundred to a few thousand sequences, making tetramers a reasonable choice.</p>
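            <p>Equation 3 can be sketched in Python as below. This is a minimal illustration, not the software used in this work; it assumes the input sequences are already aligned so that their first base lies at a known offset from the 3'-processing site (position 0), and both function names are hypothetical. Because each sequence contributes at most one word per position, dividing the counts by <it>n</it> gives the fraction of sequences carrying word <it>w</it> at position <it>i</it>.</p>

```python
from collections import Counter


def positional_word_frequencies(seqs, k=4, offset=1):
    """Eq. 3: pwf_i(w) = (sequences with word w starting at position i) / n.

    seqs   -- sequences aligned on the 3'-processing site
    k      -- word length (tetramers by default)
    offset -- position of each sequence's first base relative to the
              3'-processing site (position 0); 1 means "first base downstream"
    Returns a dict mapping (word, position) to a frequency.
    """
    n = len(seqs)
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            counts[(seq[start:start + k], start + offset)] += 1
    return {key: c / n for key, c in counts.items()}


def positional_profile(pwf, word):
    """Sorted (position, frequency) pairs at which a single word was observed."""
    return sorted((pos, f) for (w, pos), f in pwf.items() if w == word)
```

            <p>Testing whether a word's positional distribution is significantly non-uniform is a separate step applied to these frequencies and is not shown here.</p>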
         </sec>
         <sec>
            <st>
               <p>Motif finding</p>
            </st>
            <p>The DSE region spanning 80 nt downstream of the cleavage site was examined with the Gibbs Recursive Sampler <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, MEME <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> and Improbizer <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. From each species, 500 3'-processing site sequences were randomly selected without replacement. As a position-independent control, 500 sequences were generated from a species-specific trained 0<sup><it>th</it></sup>-order model and run in parallel. Several preliminary runs were performed with each program to define optimal settings, and at least 10 independent production runs were performed for each dataset. The Gibbs Recursive Sampler extracted 3 variable-length motifs using the command line options "-E 3 -W 0 -F -t -n -r -i 200 OS 200 -d 1,5,10,2,5,10,3,5,10". Motifs described in the "optimal" output section were used to maximize the number of example motifs tabulated. The MEME program was run on a Beowulf cluster using the options "-dna -mod oops -nmotifs 3 -text -p52 -maxsize 1000000". Improbizer runs used the options "numMotifs = 3 background = l maxOcc = l", and for additional control runs the "controlRun = on" parameter was set. Motif sequence information was gathered from all three programs with a Perl script and used to make sequence logo <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> images with the WebLogo script <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Custom Perl scripts were written to collect and graph motif positions from the Gibbs Recursive Sampler, Improbizer, and MEME output files.</p>
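            <p>The sampling and control-set construction can be sketched in Python as follows. This is a hypothetical illustration, not the scripts used here: it assumes that the species-specific 0th-order model means independent draws from the species' overall nucleotide composition, and that each control sequence takes the length of a randomly chosen real sequence. Both function names are hypothetical.</p>

```python
import random
from collections import Counter


def sample_sites(seqs, n=500, seed=None):
    """Pick n sequences at random without replacement (all of them if fewer)."""
    rng = random.Random(seed)
    return rng.sample(seqs, min(n, len(seqs)))


def zeroth_order_controls(seqs, n=500, seed=None):
    """Generate n position-independent control sequences.

    Bases are drawn independently with probabilities equal to the base
    composition of seqs (a 0th-order Markov model), so any positional
    signal present in the real data is destroyed.
    """
    rng = random.Random(seed)
    composition = Counter("".join(seqs).upper())
    bases = sorted(composition)
    weights = [composition[b] for b in bases]
    controls = []
    for _ in range(n):
        length = len(rng.choice(seqs))  # mimic the real length distribution
        controls.append("".join(rng.choices(bases, weights=weights, k=length)))
    return controls
```

            <p>Running the motif finders on such controls alongside the real sequences is intended to show which reported motifs could arise from base composition alone rather than from positioning relative to the 3'-processing site.</p>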
         </sec>
         <sec>
            <st>
               <p>Multiple Alignment of CstF-64 proteins</p>
            </st>
            <p>Species-specific CstF-64 protein sequences were downloaded from NCBI, UCSC and Ensembl where available. The CstF-64 GenBank accessions used are as follows: <it>H. sapiens </it>[GenBank:<ext-link ext-link-type="gen" ext-link-id="AAP88780.1">AAP88780.1</ext-link>], <it>M. musculus </it>[GenBank:<ext-link ext-link-type="gen" ext-link-id="NP_573459.1">NP_573459.1</ext-link>], <it>G. gallus </it>[GenBank:<ext-link ext-link-type="gen" ext-link-id="NP_001006433.1">NP_001006433.1</ext-link>], <it>D. rerio </it>[GenBank:<ext-link ext-link-type="gen" ext-link-id="AAH65442.1">AAH65442.1</ext-link>], <it>D. melanogaster </it>[GenBank:<ext-link ext-link-type="gen" ext-link-id="AAO45216.1">AAO45216.1</ext-link>], and <it>A. gambiae </it>[GenBank:<ext-link ext-link-type="gen" ext-link-id="EAA05544.1">EAA05544.1</ext-link>]. Additional CstF-64 sequences for <it>C. familiaris, R. norvegicus, T. rubripes </it>and <it>C. elegans </it>were generated by tblastn <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> searches of genomic or EST sequences, using closely related available sequences as queries (<it>e.g., M. musculus </it>as the query against <it>R. norvegicus </it>sequences), followed by assembly with CAP3 <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. The multiple alignment of CstF-64 was generated with ClustalX version 1.82 <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> and displayed with the UCSF Chimera sequence viewer <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JS collected and pre-processed all data. JS performed the analysis with existing software packages. JS and JHG designed the novel statistical analysis and wrote the necessary software. JS, KWH and JHG analyzed and interpreted data and wrote the manuscript. JS and JHG created the supplemental web site.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors thank Carol Bult, Alexei Evsikov, Michael Brockman, and three anonymous reviewers for critical review of the manuscript. This work was partially supported by the NSF contracts No. DGE-0221625 and DBI-0331497, and NIH contracts NCRR INBRE Maine 2 P20 RR16463-04 and NICHD HD037102-07.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Formation of mRNA 3' Ends in Eukaryotes: Mechanism, Regulation, and Interrelationships with Other Steps in mRNA Synthesis</p>
            </title>
            <aug>
               <au>
                  <snm>Zhao</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hyman</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Microbiol Mol Biol Rev</source>
            <pubdate>1999</pubdate>
            <volume>63</volume>
            <fpage>405</fpage>
            <lpage>445</lpage>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The C-terminal domain of RNA polymerase II couples mRNA processing to transcription</p>
            </title>
            <aug>
               <au>
                  <snm>McCracken</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fong</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Yankulov</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ballantyne</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Greenblatt</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Patterson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wickens</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bentley</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1997</pubdate>
            <volume>385</volume>
            <fpage>357</fpage>
            <lpage>361</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/385357a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">9002523</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Mechanism and regulation of mRNA polyadenylation</p>
            </title>
            <aug>
               <au>
                  <snm>Colgan</snm>
                  <fnm>DF</fnm>
               </au>
               <au>
                  <snm>Manley</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>1997</pubdate>
            <volume>11</volume>
            <fpage>2755</fpage>
            <lpage>2766</lpage>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The 64-Kilodalton Subunit of the CstF Polyadenylation Factor Binds to Pre-mRNAs Downstream of the Cleavage Site and Influences Cleavage Site Location</p>
            </title>
            <aug>
               <au>
                  <snm>MacDonald</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Wilusz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Shenk</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>1994</pubdate>
            <volume>14</volume>
            <fpage>6647</fpage>
            <lpage>6654</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">359194</pubid>
                  <pubid idtype="pmpid">7935383</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>3' Non-coding region sequences in eukaryotic messenger RNA</p>
            </title>
            <aug>
               <au>
                  <snm>Proudfoot</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Brownlee</snm>
                  <fnm>GG</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1976</pubdate>
            <volume>263</volume>
            <fpage>211</fpage>
            <lpage>214</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/263211a0</pubid>
                  <pubid idtype="pmpid">822353</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>The ovalbumin gene &#8211; sequence of putative control regions</p>
            </title>
            <aug>
               <au>
                  <snm>Benoist</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>O'Hare</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Breathnach</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chambon</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1980</pubdate>
            <volume>8</volume>
            <fpage>127</fpage>
            <lpage>142</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">327247</pubid>
                  <pubid idtype="pmpid">6243777</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Cloning and structure of the human immune interferon-<it>&#947; </it>chromosomal gene</p>
            </title>
            <aug>
               <au>
                  <snm>Taya</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Devos</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Tavernier</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cheroutre</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Engler</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Fiers</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1982</pubdate>
            <volume>1</volume>
            <fpage>953</fpage>
            <lpage>958</lpage>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Are U4 small nuclear ribonucleoproteins involved in polyadenylation?</p>
            </title>
            <aug>
               <au>
                  <snm>Berget</snm>
                  <fnm>SM</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1984</pubdate>
            <volume>309</volume>
            <fpage>179</fpage>
            <lpage>182</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/309179a0</pubid>
                  <pubid idtype="pmpid">6325940</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Sequences capable of restoring poly(A) site function define two distinct downstream elements</p>
            </title>
            <aug>
               <au>
                  <snm>McDevitt</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Hart</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>WW</fnm>
               </au>
               <au>
                  <snm>Nevins</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1986</pubdate>
            <volume>5</volume>
            <fpage>2907</fpage>
            <lpage>2913</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1167241</pubid>
                  <pubid idtype="pmpid">3024967</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Position-Dependent Sequence Elements Downstream of AAUAAA Are Required for Efficient Rabbit <it>&#946;</it>-Globin mRNA 3' End Formation</p>
            </title>
            <aug>
               <au>
                  <snm>Gil</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Proudfoot</snm>
                  <fnm>NJ</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1987</pubdate>
            <volume>49</volume>
            <fpage>399</fpage>
            <lpage>406</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(87)90292-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">3568131</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>A sequence downstream of AAUAAA is required for rabbit <it>&#946;</it>-globin mRNA 3'-end formation</p>
            </title>
            <aug>
               <au>
                  <snm>Gil</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Proudfoot</snm>
                  <fnm>NJ</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1984</pubdate>
            <volume>312</volume>
            <fpage>473</fpage>
            <lpage>474</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/312473a0</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>The consensus sequence YGTGTTYY located downstream from the AATAAA signal is required for efficient formation of mRNA 3' termini</p>
            </title>
            <aug>
               <au>
                  <snm>McLauchlan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gaffney</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Whitton</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Clements</snm>
                  <fnm>JB</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1985</pubdate>
            <volume>13</volume>
            <fpage>1347</fpage>
            <lpage>1368</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">341077</pubid>
                  <pubid idtype="pmpid">2987822</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Identification of a Sequence Element on the 3' Side of AAUAAA Which is Necessary for Simian Virus 40 Late mRNA 3' End Processing</p>
            </title>
            <aug>
               <au>
                  <snm>Sadofsky</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Connelly</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Manley</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Alwine</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>1985</pubdate>
            <volume>5</volume>
            <fpage>2713</fpage>
            <lpage>2719</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">367009</pubid>
                  <pubid idtype="pmpid">3016512</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>A Uridylate Tract Mediates Efficient Heterogeneous Nuclear Ribonucleoprotein C Protein-RNA Cross-Linking and Functionally Substitutes for the Downstream Element of the Polyadenylation Signal</p>
            </title>
            <aug>
               <au>
                  <snm>Wilusz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Shenk</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>1990</pubdate>
            <volume>10</volume>
            <fpage>6397</fpage>
            <lpage>6407</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">362916</pubid>
                  <pubid idtype="pmpid">1701018</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Sequences and position requirements for uridylate-rich downstream elements of polyadenylation signals</p>
            </title>
            <aug>
               <au>
                  <snm>Chou</snm>
                  <fnm>ZF</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Wilusz</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>2525</fpage>
            <lpage>2531</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308205</pubid>
                  <pubid idtype="pmpid">7518915</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>RNA Ligands Selected by Cleavage Stimulation Factor Contain Distinct Sequence Motifs That Function as Downstream Elements in 3'-End Processing of Pre-mRNA</p>
            </title>
            <aug>
               <au>
                  <snm>Beyer</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Keller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1997</pubdate>
            <volume>272</volume>
            <fpage>26769</fpage>
            <lpage>26779</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.272.42.26769</pubid>
                  <pubid idtype="pmpid" link="fulltext">9334264</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>RNA Recognition by the Human Polyadenylation Factor CstF</p>
            </title>
            <aug>
               <au>
                  <snm>Takagaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Manley</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>1997</pubdate>
            <volume>17</volume>
            <fpage>3907</fpage>
            <lpage>3914</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">232243</pubid>
                  <pubid idtype="pmpid" link="fulltext">9199325</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>In silico detection of control signals: mRNA 3'-end-processing sequences in diverse species</p>
            </title>
            <aug>
               <au>
                  <snm>Graber</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Cantor</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Mohr</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>TF</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>14055</fpage>
            <lpage>14060</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">24189</pubid>
                  <pubid idtype="pmpid" link="fulltext">10570197</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.24.14055</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Sequence determinants in human polyadenylation site selection</p>
            </title>
            <aug>
               <au>
                  <snm>Legendre</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gautheret</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>7</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">149351</pubid>
                  <pubid idtype="pmpid" link="fulltext">12600277</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-4-7</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Downstream elements of mammalian pre-mRNA polyadenylation signals: primary, secondary and higher-order structures</p>
            </title>
            <aug>
               <au>
                  <snm>Zarudnaya</snm>
                  <fnm>MI</fnm>
               </au>
               <au>
                  <snm>Kolomiets</snm>
                  <fnm>IM</fnm>
               </au>
               <au>
                  <snm>Potyahaylo</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Hovorun</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>1375</fpage>
            <lpage>1386</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">149834</pubid>
                  <pubid idtype="pmpid" link="fulltext">12595544</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg241</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Nucleotide Frequency Variation Across Human Genes</p>
            </title>
            <aug>
               <au>
                  <snm>Louie</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ott</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Majewski</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>2594</fpage>
            <lpage>2601</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403801</pubid>
                  <pubid idtype="pmpid" link="fulltext">14613976</pubid>
                  <pubid idtype="doi">10.1101/gr.1317703</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Recognition of GU-rich polyadenylation regulatory elements by human CstF-64 protein</p>
            </title>
            <aug>
               <au>
                  <snm>Perez-Canadillas</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Varani</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>2003</pubdate>
            <volume>22</volume>
            <fpage>2821</fpage>
            <lpage>2830</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/emboj/cdg259</pubid>
                  <pubid idtype="pmpid" link="fulltext">12773396</pubid>
                  <pubid idtype="pmcid">156756</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Protein and RNA Dynamics Play Key Roles in Determining the Specific Recognition of GU-rich Polyadenylation Regulatory Elements by Human CstF-64 Protein</p>
            </title>
            <aug>
               <au>
                  <snm>Deka</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rajan</snm>
                  <fnm>PK</fnm>
               </au>
               <au>
                  <snm>Perez-Canadillas</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Varani</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2005</pubdate>
            <volume>347</volume>
            <fpage>719</fpage>
            <lpage>733</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2005.01.046</pubid>
                  <pubid idtype="pmpid" link="fulltext">15769465</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>PACdb: PolyA Cleavage Site and 3'-UTR Database</p>
            </title>
            <aug>
               <au>
                  <snm>Brockman</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Singh</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Quinlan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Salisbury</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Graber</snm>
                  <fnm>JH</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>3691</fpage>
            <lpage>3693</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti589</pubid>
                  <pubid idtype="pmpid" link="fulltext">16030070</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Patterns of Variant Polyadenylation Signal Usage in Human Genes</p>
            </title>
            <aug>
               <au>
                  <snm>Beaudoing</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Freier</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wyatt</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Claverie</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Gautheret</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>1001</fpage>
            <lpage>1010</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310884</pubid>
                  <pubid idtype="pmpid" link="fulltext">10899149</pubid>
                  <pubid idtype="doi">10.1101/gr.10.7.1001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>DownStream Element analysis web supplement</p>
            </title>
            <url>http://harlequin.jax.org/dse/</url>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Bioinformatic identification of candidate cis-regulatory elements involved in human mRNA polyadenylation</p>
            </title>
            <aug>
               <au>
                  <snm>Hu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lutz</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Wilusz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tian</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2005</pubdate>
            <volume>11</volume>
            <fpage>1485</fpage>
            <lpage>1493</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1261/rna.2107305</pubid>
                  <pubid idtype="pmpid" link="fulltext">16131587</pubid>
                  <pubid idtype="pmcid">1370832</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Assessing computational tools for the discovery of transcription factor binding sites</p>
            </title>
            <aug>
               <au>
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>DeMoor</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Eskin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Favorov</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Frith</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Makeev</snm>
                  <fnm>VJ</fnm>
               </au>
               <au>
                  <snm>Mironov</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Nobel</snm>
                  <fnm>WS</fnm>
               </au>
               <au>
                  <snm>Pavesi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Regnier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Simonis</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sinha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Thijs</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>van Helden</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vandenbogaert</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weng</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Workman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2005</pubdate>
            <volume>23</volume>
            <fpage>137</fpage>
            <lpage>144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1053</pubid>
                  <pubid idtype="pmpid" link="fulltext">15637633</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Environmentally Induced Foregut Remodeling by PHA-4/FoxA and DAF-12/NHR</p>
            </title>
            <aug>
               <au>
                  <snm>Ao</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Gaudet</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Muttumu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mango</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>305</volume>
            <fpage>1743</fpage>
            <lpage>1746</lpage>
            <note>[Suppl 1:3-5]</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1102216</pubid>
                  <pubid idtype="pmpid" link="fulltext">15375261</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Gibbs Recursive Sampler: finding transcription factor binding sites</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Rouchka</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3580</fpage>
            <lpage>3585</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">169014</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824370</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg608</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Fitting a mixture model by expectation maximization to discover motifs in biopolymers</p>
            </title>
            <aug>
               <au>
                  <snm>Bailey</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Elkan</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology</source>
            <publisher>AAAI Press</publisher>
            <pubdate>1994</pubdate>
            <fpage>28</fpage>
            <lpage>36</lpage>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Sequence Logos: A New Way to Display Consensus Sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Schneider</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Stephens</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1990</pubdate>
            <volume>18</volume>
            <fpage>6097</fpage>
            <lpage>6100</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">332411</pubid>
                  <pubid idtype="pmpid">2172928</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Maris</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dominguez</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Allain</snm>
                  <fnm>FH</fnm>
               </au>
            </aug>
            <source>FEBS J</source>
            <pubdate>2005</pubdate>
            <volume>272</volume>
            <fpage>2118</fpage>
            <lpage>2131</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1742-4658.2005.04653.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">15853797</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Complex Protein Interactions within the Human Polyadenylation Machinery Identify a Novel Component</p>
            </title>
            <aug>
               <au>
                  <snm>Takagaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Manley</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>2000</pubdate>
            <volume>20</volume>
            <fpage>1515</fpage>
            <lpage>1525</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">85326</pubid>
                  <pubid idtype="pmpid" link="fulltext">10669729</pubid>
                  <pubid idtype="doi">10.1128/MCB.20.5.1515-1525.2000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Evolutionarily Conserved Interaction between CstF-64 and PC4 Links Transcription, Polyadenylation, and Termination</p>
            </title>
            <aug>
               <au>
                  <snm>Calvo</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Manley</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Mol Cell</source>
            <pubdate>2001</pubdate>
            <volume>7</volume>
            <fpage>1013</fpage>
            <lpage>1023</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1097-2765(01)00236-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">11389848</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>A global analysis of Caenorhabditis elegans operons</p>
            </title>
            <aug>
               <au>
                  <snm>Blumenthal</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Evans</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Link</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Guffanti</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lawson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Thierry-Mieg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Thierry-Mieg</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Chiu</snm>
                  <fnm>WL</fnm>
               </au>
               <au>
                  <snm>Duke</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kiraly</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>SK</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>417</volume>
            <fpage>851</fpage>
            <lpage>854</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature00831</pubid>
                  <pubid idtype="pmpid" link="fulltext">12075352</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Operons in eukaryotes</p>
            </title>
            <aug>
               <au>
                  <snm>Blumenthal</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Brief Funct Genomics Proteomic</source>
            <pubdate>2004</pubdate>
            <volume>3</volume>
            <issue>3</issue>
            <fpage>199</fpage>
            <lpage>211</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bfgp/3.3.199</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>An RNA-Binding Protein Specifically Interacts with a Functionally Important Domain of the Downstream Element of the Simian Virus 40 Late Polyadenylation Signal</p>
            </title>
            <aug>
               <au>
                  <snm>Qian</snm>
                  <fnm>ZW</fnm>
               </au>
               <au>
                  <snm>Wilusz</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>1991</pubdate>
            <volume>11</volume>
            <fpage>5312</fpage>
            <lpage>5320</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">361594</pubid>
                  <pubid idtype="pmpid">1656229</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>The G-rich auxiliary downstream element has distinct sequence and position requirements and mediates efficient 3' end pre-mRNA processing through a trans-acting factor</p>
            </title>
            <aug>
               <au>
                  <snm>Bagga</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Ford</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Wilusz</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1995</pubdate>
            <volume>23</volume>
            <fpage>1625</fpage>
            <lpage>1631</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">306907</pubid>
                  <pubid idtype="pmpid">7784220</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Auxiliary downstream elements are required for efficient polyadenylation of mammalian pre-mRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Wilusz</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1998</pubdate>
            <volume>26</volume>
            <fpage>2891</fpage>
            <lpage>2898</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/26.12.2891</pubid>
                  <pubid idtype="pmpid" link="fulltext">9611233</pubid>
                  <pubid idtype="pmcid">147640</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Downstream sequence elements with different affinities for the hnRNP H/H' protein influence the processing efficiency of mammalian polyadenylation signals</p>
            </title>
            <aug>
               <au>
                  <snm>Arhin</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Boots</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bagga</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Milcarek</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wilusz</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>1842</fpage>
            <lpage>1850</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">113221</pubid>
                  <pubid idtype="pmpid" link="fulltext">11937639</pubid>
                  <pubid idtype="doi">10.1093/nar/30.8.1842</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Specific Transcriptional Pausing Activates Polyadenylation in a Coupled In Vitro System</p>
            </title>
            <aug>
               <au>
                  <snm>Yonaha</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Proudfoot</snm>
                  <fnm>NJ</fnm>
               </au>
            </aug>
            <source>Mol Cell</source>
            <pubdate>1999</pubdate>
            <volume>3</volume>
            <fpage>593</fpage>
            <lpage>600</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1097-2765(00)80352-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">10360175</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>WebLogo: a sequence logo generator</p>
            </title>
            <aug>
               <au>
                  <snm>Crooks</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Hon</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chandonia</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>1188</fpage>
            <lpage>1190</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">419797</pubid>
                  <pubid idtype="pmpid" link="fulltext">15173120</pubid>
                  <pubid idtype="doi">10.1101/gr.849004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Basic Local Alignment Search Tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
         </bibl>
         <bibl id="B45">
            <title>
               <p>CAP3: A DNA sequence assembly program</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Madan</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>868</fpage>
            <lpage>877</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310812</pubid>
                  <pubid idtype="pmpid" link="fulltext">10508846</pubid>
                  <pubid idtype="doi">10.1101/gr.9.9.868</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Multiple sequence alignment with the Clustal series of programs</p>
            </title>
            <aug>
               <au>
                  <snm>Chenna</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sugawara</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Koike</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>3497</fpage>
            <lpage>3500</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">168907</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824352</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg500</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>UCSF Chimera &#8211; A Visualization System for Exploratory Research and Analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Pettersen</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Goddard</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Couch</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Greenblatt</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Meng</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Ferrin</snm>
                  <fnm>TE</fnm>
               </au>
            </aug>
            <source>J Comput Chem</source>
            <pubdate>2004</pubdate>
            <volume>25</volume>
            <fpage>1605</fpage>
            <lpage>1612</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/jcc.20084</pubid>
                  <pubid idtype="pmpid" link="fulltext">15264254</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
