<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-43</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Predictive modeling of plant messenger RNA polyadenylation sites</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Ji</snm>
               <fnm>Guoli</fnm>
               <insr iid="I1"/>
               <email>glji@xmu.edu.cn</email>
            </au>
            <au id="A2">
               <snm>Zheng</snm>
               <fnm>Jianti</fnm>
               <insr iid="I1"/>
               <email>zhengjt803@126.com</email>
            </au>
            <au id="A3">
               <snm>Shen</snm>
               <fnm>Yingjia</fnm>
               <insr iid="I2"/>
               <email>Sheny@muohio.edu</email>
            </au>
            <au id="A4">
               <snm>Wu</snm>
               <fnm>Xiaohui</fnm>
               <insr iid="I1"/>
               <email>yerenye@163.com</email>
            </au>
            <au id="A5">
               <snm>Jiang</snm>
               <fnm>Ronghan</fnm>
               <insr iid="I1"/>
               <email>jronghan@126.com</email>
            </au>
            <au id="A6">
               <snm>Lin</snm>
               <fnm>Yun</fnm>
               <insr iid="I1"/>
               <email>hzwu@xmu.edu.cn</email>
            </au>
            <au id="A7">
               <snm>Loke</snm>
               <mi>C</mi>
               <fnm>Johnny</fnm>
               <insr iid="I2"/>
               <insr iid="I4"/>
               <email>johnny.loke@mssm.edu</email>
            </au>
            <au id="A8">
               <snm>Davis</snm>
               <mi>M</mi>
               <fnm>Kimberly</fnm>
               <insr iid="I2"/>
               <email>daviskz@umich.edu</email>
            </au>
            <au id="A9">
               <snm>Reese</snm>
               <mi>J</mi>
               <fnm>Greg</fnm>
               <insr iid="I3"/>
               <email>reesegj@muohio.edu</email>
            </au>
            <au id="A10" ca="yes">
               <snm>Li</snm>
               <mnm>Quinn</mnm>
               <fnm>Qingshun</fnm>
               <insr iid="I2"/>
               <email>liq@muohio.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Automation, Xiamen University, Xiamen, Fujian, 361005, P. R. China</p>
            </ins>
            <ins id="I2">
               <p>Department of Botany, Miami University, Oxford, OH 45056, USA</p>
            </ins>
            <ins id="I3">
               <p>Research Computing Group, IT Services, Miami University, Oxford, OH 45056, USA</p>
            </ins>
            <ins id="I4">
               <p>Current address: Department of Medicine, Division of Liver Diseases, Mount Sinai Medical Center, 1425 Madison Avenue, RM 1176, New York, NY 10029, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>43</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/43</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17286857</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-43</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>12</day>
               <month>7</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>07</day>
               <month>2</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>07</day>
               <month>2</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Ji et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>One of the essential processing events during pre-mRNA maturation is the post-transcriptional addition of a polyadenine [poly(A)] tail. The 3'-end poly(A) track protects mRNA from unregulated degradation, and indicates the integrity of mRNA through recognition by mRNA export and translation machinery. The position of a poly(A) site is predetermined by signals in the pre-mRNA sequence that are recognized by a complex of polyadenylation factors. These signals are generally tri-part sequence patterns around the cleavage site that serves as the future poly(A) site. In plants, there is little sequence conservation among these signal elements, which makes it difficult to develop an accurate algorithm to predict the poly(A) site of a given gene. We attempted to solve this problem.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Based on our current working model and the profile of nucleotide sequence distribution of the poly(A) signals and around poly(A) sites in Arabidopsis, we have devised a Generalized Hidden Markov Model based algorithm to predict potential poly(A) sites. The high specificity and sensitivity of the algorithm were demonstrated by testing several datasets, and at the best combinations, both reach 97%. The accuracy of the program, called <it>p</it>oly(<it>A</it>) <it>s</it>ite <it>s</it>leuth or <it>PASS</it>, has been demonstrated by the prediction of many validated poly(A) sites. <it>PASS </it>also predicted the changes of poly(A) site efficiency in poly(A) signal mutants that were constructed and characterized by traditional genetic experiments. The efficacy of <it>PASS </it>was demonstrated by predicting poly(A) sites within long genomic sequences.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Based on the features of plant poly(A) signals, a computational model was built to effectively predict the poly(A) sites in Arabidopsis genes. The algorithm will be useful in gene annotation because a poly(A) site signifies the end of the transcript. This algorithm can also be used to predict alternative poly(A) sites in known genes, and will be useful in the design of transgenes for crop genetic engineering by predicting and eliminating undesirable poly(A) sites.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Eukaryotic messenger RNA (mRNA), after being transcribed from its coding gene, typically undergoes processing events, such as capping, splicing, and polyadenylation, before it is translocated to the cytoplasm and translated into proteins. While these three essential steps of processing are interrelated, each step is performed by a defined set of protein factors and uses specific signals encoded in the precursor mRNA (pre-mRNA) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The polyadenylation signals for all eukaryotes seem to have three common parts: a cleavage site (CS), a near upstream element (called NUE in plants, equivalent to AAUAAA in animals) about 20&#8211;30 nucleotides (nt) upstream of the CS, and an element about 50 nt upstream of the CS (termed far upstream element or FUE in plants) <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. In mammals, there is an additional signal element located ~20 nt downstream of the CS <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, which is not commonly observed in yeast and plants. Moreover, both yeast and plants possess much less sequence conservation in NUE and FUE regions compared to that of animals <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. However, there is little conservation between yeast and plants in terms of the sequences of the poly(A) signal elements.</p>
         <p>Plant polyadenylation signals in general are more similar to yeast, in which no highly conserved signal sequences have been identified. For example, a recent work revealed that the NUE signal AAUAAA, albeit proven the best signal in plants <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, can only be found at the right position in about 10% of Arabidopsis genes <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. However, the same signal is used by over 50% of human genes <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. This makes it very difficult to predict the CS of plant genes without experimental evidence such as EST that can be used to deduce poly(A) sites. With many ongoing plant genome sequencing projects, using poly(A) sites as a determinant of the 3'-end of genes would greatly enhance the accuracy of genome annotation. To this end, we were interested in devising an algorithm to predict poly(A) sites using our newly developed nucleotide composition model of poly(A) signals in 3'-UTR of the genes in the model plant Arabidopsis <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
         <p>In our improved plant polyadenylation signal model, there are three types of sequence elements that possess some level of conservation: FUE, NUE, and the newly defined cleavage element (CE) <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Within the CE, there are three sub-domains made up of different prevailing sequences: the highly conserved di-nucleotide (CA and UA) right before the CS; and two U-rich sequence elements on the right and the left sides of CS, termed CE-R and CE-L. Beyond signal sequence information, the clear transition of the nucleotide composition <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> also offers additional features for the design of the algorithm. Briefly, a high U/A ratio in the FUE decreases to a low value (high A/U ratio) in the NUE. Such a transition happens two more times between the NUE and CE, and within the CE. Finally, the U/A ratio becomes 1 beyond 50 nt downstream of the CS. During these U/A transitions, the G and C contents remain low except at the CS where a spike of C is evident <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Such a profile of the 3'-UTR in Arabidopsis has been confirmed independently <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Other features of Arabidopsis polyadenylation signals are also found in those of the rice genome <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
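         <p>The U/A transitions described above can be made visible with a simple sliding-window count. The following is only an illustrative sketch in Python, not part of <it>PASS</it>: the function name and the default window size of 10 nt are assumptions chosen for the example, and T is counted as U so that DNA-style sequences can be used directly.</p>
         <p><![CDATA[
```python
def ua_ratio_profile(seq, window=10):
    """Return the U/A count ratio for each full window along a sequence.

    Hypothetical helper for illustration; the window size is an assumption.
    A window with U but no A yields float('inf'); a window with neither
    U nor A yields None.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    profile = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        u, a = chunk.count("U"), chunk.count("A")
        if a:
            profile.append(u / a)
        elif u:
            profile.append(float("inf"))
        else:
            profile.append(None)
    return profile
```
]]></p>
         <p>Plotting such a profile across a 3'-UTR would be one way to inspect the alternation between U-rich and A-rich regions; the window size would need to be tuned to the element lengths of interest.</p>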
         <p>The Hidden Markov Model (HMM), a widely used system in bioinformatics, is a probability-based mathematical model with a complete set of theory, methods and an algorithm. It is widely used to describe both stability and variability of signals over background. Rabiner <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> systematically described the HMM and made it a common technology in voice recognition. In recent years, because of the similarity of biological data (DNA, RNA and protein sequences) to voice signals, the HMM has been used in different aspects of sequence analysis such as sequence comparison, prediction of protein structures and gene annotation. However, the length of the state in the original HMM is geometrically distributed, which limits its application. A new generation of the HMM, called the Generalized Hidden Markov Model (GHMM; <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>), was introduced to extend the utility of the HMM. The GHMM gives each state multiple observed values (instead of the single value in the HMM), so it can easily be used in describing the organization of gene sequences. In this paper, we present a GHMM-based method for predicting the poly(A) sites in Arabidopsis. The prediction results of poly(A) sites are compared with experimentally validated data for some of the genes. Interestingly, our program can also predict the results of traditional mutation studies, as the site efficiencies and scores given by the program are linearly correlated.</p>
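         <p>The geometric state length mentioned above follows from the fixed self-transition probability of a standard HMM state: if a state returns to itself with probability <it>a</it>, it is occupied for exactly <it>d </it>consecutive positions with probability <it>a</it><sup>(<it>d</it>-1)</sup>(1 - <it>a</it>). The short Python sketch below, with hypothetical function names, illustrates this property; it is not taken from the <it>PASS </it>implementation.</p>
         <p><![CDATA[
```python
def geometric_duration_pmf(self_transition, duration):
    """P(state lasts exactly `duration` steps) = a**(d - 1) * (1 - a)."""
    if not 0.0 <= self_transition < 1.0:
        raise ValueError("self-transition probability must be in [0, 1)")
    if duration < 1:
        raise ValueError("duration must be at least 1")
    return self_transition ** (duration - 1) * (1.0 - self_transition)


def expected_duration(self_transition):
    """Mean of the geometric duration distribution: 1 / (1 - a)."""
    if not 0.0 <= self_transition < 1.0:
        raise ValueError("self-transition probability must be in [0, 1)")
    return 1.0 / (1.0 - self_transition)
```
]]></p>
         <p>Because this probability always decreases as <it>d </it>grows, the shortest length is always the most likely one, which is a poor fit for signal elements with a preferred length. Models that specify state lengths explicitly avoid this restriction.</p>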
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>We were interested in using a computer program to predict the plant poly(A) site in a given transcript (for convenience, presented as a DNA sequence). To do so, we transformed the profiles of the known poly(A) sites and their adjacent region, based on the data presented by Loke et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> from Arabidopsis, into features that can be used for computational modeling. The analysis of a dataset of 8160 sequences (hereafter called the 8K dataset) described in that paper provided the basis for setting parameters as described in the Methods section. On this basis, we designed an algorithm and wrote a program named <it><ul>P</ul>oly(A) <ul>S</ul>ite <ul>S</ul>leuth </it>(or <it>PASS</it>) in PASCAL.</p>
         <sec>
            <st>
               <p>Sensitivity and specificity of the program</p>
            </st>
            <p>To evaluate the performance of <it>PASS</it>, we employed the two most common standards: sensitivity (<it>Sn</it>) and specificity (<it>Sp</it>). The definitions are:</p>
            <p>
               <m:math name="1471-2105-8-43-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtable>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>S</m:mi>
                                    <m:mi>n</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:mi>T</m:mi>
                                          <m:mi>P</m:mi>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mi>T</m:mi>
                                          <m:mi>P</m:mi>
                                          <m:mo>+</m:mo>
                                          <m:mi>F</m:mi>
                                          <m:mi>N</m:mi>
                                       </m:mrow>
                                    </m:mfrac>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>S</m:mi>
                                    <m:mi>p</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:mi>T</m:mi>
                                          <m:mi>P</m:mi>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mi>T</m:mi>
                                          <m:mi>P</m:mi>
                                          <m:mo>+</m:mo>
                                          <m:mi>F</m:mi>
                                          <m:mi>P</m:mi>
                                       </m:mrow>
                                    </m:mfrac>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:mi>F</m:mi>
                                          <m:mi>P</m:mi>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mi>T</m:mi>
                                          <m:mi>P</m:mi>
                                          <m:mo>+</m:mo>
                                          <m:mi>F</m:mi>
                                          <m:mi>P</m:mi>
                                       </m:mrow>
                                    </m:mfrac>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                        </m:mtable>
                     </m:mrow>
                     <m:annotation encoding="application/x-tex">Sn = \frac{TP}{TP + FN} \qquad Sp = \frac{TP}{TP + FP} = 1 - \frac{FP}{TP + FP}</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>In these equations, <it>TP </it>(true positive) is the number of actual sites that are identified or predicted correctly. <it>FN </it>(false negative) is the number of actual sites that cannot be identified or predicted correctly. <it>FP </it>(false positive) is the number of false sites that are predicted by <it>PASS</it>. The value of <it>Sn </it>represents the fraction of the actual poly(A) sites that can be predicted, while <it>Sp </it>represents the fraction of actual poly(A) sites in all the predicted sites. The higher the <it>Sp </it>value is, the lower the fraction of false positive sites among predicted sites is.</p>
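            <p>As a minimal sketch of the two definitions above, the measures can be computed from raw counts as follows. Python is used purely for illustration, and the function names are hypothetical.</p>
            <p><![CDATA[
```python
def sensitivity(tp, fn):
    """Sn = TP / (TP + FN): fraction of actual sites that were predicted."""
    total = tp + fn
    if total == 0:
        raise ValueError("TP + FN must be positive")
    return tp / total


def specificity(tp, fp):
    """Sp = TP / (TP + FP), as defined in the text; equals 1 - FP / (TP + FP)."""
    total = tp + fp
    if total == 0:
        raise ValueError("TP + FP must be positive")
    return tp / total
```
]]></p>
            <p>Note that <it>Sp </it>as defined here is the fraction of predicted sites that are real, a quantity that is often called precision or positive predictive value elsewhere.</p>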
            <p>To evaluate the algorithm, we tested 568 known poly(A) sites (randomly chosen from the 8K dataset described in <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> and the Methods) to calculate <it>Sn</it>. Because not all poly(A) sites have been identified in each sequence of the database, we cannot calculate the real <it>Sp </it>value. Therefore, we used several negative control datasets for <it>Sp </it>calculations. These include Arabidopsis 5' UTRs, introns, coding sequences, and a randomly generated sequence dataset that preserves the trinucleotide distributions found in the 8K dataset. Since all the sites predicted by <it>PASS </it>in these control sequences are false sites, <it>FP </it>was set to be the number of sites that were predicted in these sequences. (<it>TP</it>+<it>FP</it>) was set to be the total number of sites. The results are shown in Figure <figr fid="F1">1A</figr>. The horizontal axis represents the threshold, which is a user-selectable standard for determining whether or not a nucleotide is a poly(A) site. If the value of a nucleotide is higher than the threshold, this position is thought to be a poly(A) site. <it>Sn</it>_0, <it>Sn</it>_3, and <it>Sn</it>_10 denote <it>Sn </it>calculated when the distance between the predicted site and the known site is 0, 3, and 10 nucleotides, respectively. <it>Sn</it>_0 means that the predicted site is exactly the same as the known poly(A) site (0 distance). Based on Figure <figr fid="F1">1A</figr>, when the threshold is increased, <it>Sn </it>decreases while <it>Sp </it>increases. There is no drastic difference among the <it>Sn </it>values calculated at the three distances from the poly(A) sites. However, <it>Sp </it>can be quite different when different control sequences are used. For the coding sequences and the randomly generated sequences, both <it>Sn </it>and <it>Sp </it>reach 97% at a threshold of 4. For 5' UTR, <it>Sn </it>and <it>Sp </it>are about 82% at a threshold of 5.2. In the intron sequences, <it>Sn </it>and <it>Sp </it>are lower than the others, at 72% at a threshold of 6. The lower <it>Sp </it>may reflect the sequence features of the 5' UTRs and the introns, because these sequences tend to have higher A and T content, a characteristic shared by the 3' UTRs on which the <it>PASS </it>design was based.</p>
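            <p>The thresholding step and the distance-dependent sensitivities can be sketched as below. This is one possible reading of the procedure, not the <it>PASS </it>source code: it assumes each sequence is represented by a list of per-nucleotide scores with 0-based positions, treats a known site as recovered when any predicted position lies within the stated number of nucleotides, and uses hypothetical function names.</p>
            <p><![CDATA[
```python
def predicted_sites(scores, threshold):
    """Return 0-based positions whose score is strictly above the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


def hit_within(scores, known_site, threshold, tolerance):
    """True if any predicted position lies within `tolerance` nt of the known site."""
    return any(abs(i - known_site) <= tolerance
               for i in predicted_sites(scores, threshold))


def sn_at_tolerance(examples, threshold, tolerance):
    """Sn over (scores, known_site) pairs: the fraction of known sites recovered."""
    if not examples:
        raise ValueError("need at least one example")
    hits = sum(hit_within(s, k, threshold, tolerance) for s, k in examples)
    return hits / len(examples)
```
]]></p>
            <p>Under these assumptions, calling <it>sn_at_tolerance </it>with a tolerance of 0, 3, or 10 would correspond to <it>Sn</it>_0, <it>Sn</it>_3, and <it>Sn</it>_10, and sweeping the threshold would trace curves of the kind plotted in Figure <figr fid="F1">1A</figr>.</p>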
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Assessment of the algorithm and <it>PASS </it>program</p>
               </caption>
               <text>
                  <p>Assessment of the algorithm and <it>PASS </it>program. A. The relationship of sensitivity (<it>Sn</it>), specificity (<it>Sp</it>), and threshold. Threshold is a selectable standard in determining whether a poly(A) site is next to a nucleotide or not. It is also measured as a score for each nucleotide of an individual sequence. The higher the threshold, the higher the probability that a nucleotide is a poly(A) site. <it>Sn</it>_0, <it>Sn</it>_3, and <it>Sn</it>_10 represent the distance between the prediction site and the validated site to be 0, 3, and 10 nucleotides, respectively. Random 8K, a randomly generated 8000-sequence dataset based on the 2<sup>nd </sup>order distribution of trinucleotide in the 8K dataset. Coding Seqs, 8000 coding sequences from Arabidopsis (downloaded from TAIR). Intron (8000 sequences) and 5'-UTR (974 sequences) datasets are also from Arabidopsis. B. The average prediction scores of the 8K dataset and other control datasets as in A. The authenticated poly(A) site at location 301 is marked by a red triangle. C. Distribution of scores in the 8K dataset. The distribution of all other sites (except position 301) is presented as average scores of all these sites. The scores at the 301 position of each of the sequences were counted and their distribution is presented.</p>
               </text>
               <graphic file="1471-2105-8-43-1"/>
            </fig>
            <p>To evaluate the program, we tested the sequences in the above-mentioned datasets. Using the probability score, an output of <it>PASS</it>, we examined the distributions of the scores as shown in Figure <figr fid="F1">1B</figr>. The average scores of the 8K dataset peak at location 301, the authenticated poly(A) site position in these sequences. This is a demonstration of the efficacy of the program because it was designed to predict these positions as poly(A) sites. The average scores of the control sequences are much lower than those of the 8K dataset, with the exception of the intron dataset. Again, this could be due to the shared features between 3' UTR and the introns. Importantly, there is a 1-point score difference between the average peak score of the poly(A) sites and that of the introns, which is significant enough to differentiate the poly(A) site from introns. This is demonstrated in the genomic sequence scan that is discussed later.</p>
            <p>To further examine the prediction scores that are distinctive for poly(A) sites, the distributions of the scores at position 301 and the average scores of all other non-poly(A) sites in all the sequences of the 8K dataset were compared (Figure <figr fid="F1">1C</figr>). The majority of the poly(A) sites have a score between 6 and 7, whereas the average scores of all other non-poly(A) sites peak at 4&#8211;5. A difference of 1 to 2 score points again could be significant enough to resolve the poly(A) site from the background at the sequence level.</p>
         </sec>
         <sec>
            <st>
               <p>Predicting poly(A) sites by PASS</p>
            </st>
            <p>To demonstrate the efficacy of the algorithm and the software, we tested many sequences including those with multiple poly(A) sites. Three of the typical results are shown graphically in Figure <figr fid="F2">2</figr>. In general, most of the experimentally authenticated poly(A) sites are found in the high probability area with scores larger than or around 6. However, not all predicted sites with high scores are confirmed by EST data. There are a few possible reasons for this. First, the EST data may not be exhaustive, meaning that not all sites have been found in the available EST dataset. Second, not all possible sites are efficiently used in the cells. Instead, some sites may only be used under certain environmental or developmental conditions. Third, some may be inaccurately predicted. This could be corrected by further optimization of the algorithm. It is very interesting to note that in several cases, there are authenticated poly(A) sites located in the low score area (e.g. the first site in Figure <figr fid="F2">2A</figr>, with a score around 2). The reason for this is not clear. One possible explanation could be that the use of this site could be facilitated by yet unknown trans-acting factors.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Representative outputs of the software using sequences with multiple poly(A) sites</p>
               </caption>
               <text>
                  <p>Representative outputs of the software using sequences with multiple poly(A) sites. The triangles indicate the poly(A) sites confirmed by EST data. The majority of the real sites have relatively high probability (scores). However, in some cases (e.g., the first site in A) there are low prediction value sites. See text for more detail. Locations of the horizontal axis indicate the relative positions of the poly(A) sites in the sequence.</p>
               </text>
               <graphic file="1471-2105-8-43-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Identification of multiple poly(A) sites</p>
            </st>
            <p>To see if <it>PASS </it>can differentiate multiple poly(A) sites, we further tested some genes that have been reported by others or collected from GenBank collections (e.g. NCBI's Unigenes). Tobacco RNA binding protein-30 gene (Accession# X65118), which is known as a gene with many alternative poly(A) sites <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, was scanned for poly(A) site scores by <it>PASS</it>. As shown in Figure <figr fid="F3">3</figr>, most of the poly(A) sites are in the highly scored (around 5) area of the 3'-UTR with a couple of exceptions (&lt;4). However, the <it>PASS </it>predicted peaks at around location 280 were not validated. It is very likely that there are other factors contributing to the site selection, e.g. protein factors, RNA secondary structures, etc.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Comparison of <it>PASS </it>prediction and the validated poly(A) sites of a tobacco RNA binding protein-30 gene X65118</p>
               </caption>
               <text>
                  <p>Comparison of <it>PASS </it>prediction and the validated poly(A) sites of a tobacco RNA binding protein-30 gene X65118. The triangles indicate the authenticated poly(A) sites, and the number of the triangles at one position denotes the number of times cDNAs associated with the specific poly(A) site were found [13].</p>
               </text>
               <graphic file="1471-2105-8-43-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Prediction of mutational alterations of poly(A) site efficiencies</p>
            </st>
            <p>One way to further assess the software would be to see if it can predict the change of the utility of poly(A) sites after the polyadenylation signals are mutated. Examples of this can be found in the well-studied 3'-UTRs. The polyadenylation signals for two genes, pea rubisco small subunit gene (<it>rbcS</it>) E-9 and cauliflower mosaic virus (CaMV) 35S transcript, have been extensively studied by classical mutagenesis and genetic means <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B14">14</abbr></abbrgrp>, and are being used widely in transgene constructions <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. As shown in Figure <figr fid="F4">4A</figr>, the main poly(A) sites of the CaMV 3'UTR are located on the peak of the scores predicted by <it>PASS</it>. There are four validated poly(A) sites in <it>rbcS</it>, but sites 2 and 3 are the major poly(A) sites <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> (Figure <figr fid="F4">4B</figr>). Interestingly, our program predicted such site usage bias by 6&#8211;7 score points (compared to site 1). Again, similar to Fig. <figr fid="F3">3</figr>, there are a few peaks (meaning good poly(A) sites) after site 3 that are not used, presumably due to unknown factors that are not considered in this algorithm. It may also be possible that these sites are behind the major sites, and are thus skipped. Nonetheless, our predicted sites are typically in the near vicinity of the authenticated sites.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p><it>PASS </it>predicted scores of well-studied poly(A) signals and the relative efficiencies of polyadenylation signal mutants of <it>rbcS </it>determined by wet experiments</p>
               </caption>
               <text>
                  <p><it>PASS </it>predicted scores of well-studied poly(A) signals and the relative efficiencies of polyadenylation signal mutants of <it>rbcS </it>determined by wet experiments. The red triangles denote validated poly(A) sites in the wild type mRNA. A. <it>PASS </it>scan of CaMV 35S RNA 3'-UTR [14, 28], which is widely used as a polyadenylation signal for transgene expressions. B. Wild type 3'-UTR of <it>rbcS </it>profile scanned by <it>PASS</it>. The authenticated poly(A) sites are as marked. The predicted scores and the actual efficiencies of each site being used are tightly associated: sites 2 and 3 are the major ones, while sites 1 and 4 are minor ones [7]. C. A set of representative poly(A) signal mutations of the site 1 of <it>rbcS </it>and their predicted scores by <it>PASS</it>. The dashed line indicates the poly(A) site positions. D. Relationship between predicted score and poly(A) site 1 efficiencies of the mutants shown in C. The poly(A) site efficiency data were extracted from the results presented by Li and Hunt ([7]; Figures 2 to 4 therein).</p>
               </text>
               <graphic file="1471-2105-8-43-4"/>
            </fig>
            <p>Detailed conventional mutagenesis experiments were performed on the poly(A) signal for <it>rbcS </it>site 1, which was chosen to avoid the overlapping signals of sites 2 and 3 <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Linker scanning, base substitution, and enhancing the signal by using AAUAAA all altered the site usage at different levels <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Interestingly, these changes can also be predicted by our software as indicated in Figure <figr fid="F4">4C</figr>. This becomes evident when the <it>PASS </it>scores are compared with the site efficiency (the fraction of a poly(A) site being chosen and used in the pool of the <it>rbcS </it>mRNA) after mutation (Figure <figr fid="F4">4D</figr>). There is a tendency toward a linear relationship between <it>PASS </it>scores and site efficiencies following mutation. This suggests that our model can identify the poly(A) sites qualitatively and possibly also quantitatively.</p>
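            <p>One way to quantify such a linear tendency would be a Pearson correlation coefficient between the predicted scores and the measured site efficiencies. The sketch below is a generic illustration with a hypothetical function name; it contains no data from this study and does not describe how the relationship in Figure <figr fid="F4">4D</figr> was assessed.</p>
            <p><![CDATA[
```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```
]]></p>
            <p>A value close to 1 for the mutant series would support the quantitative interpretation, whereas a value near 0 would indicate that the scores track site usage only qualitatively.</p>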
         </sec>
         <sec>
            <st>
               <p>Predicting poly(A) sites in the genomic sequences</p>
            </st>
            <p>One use of <it>PASS </it>is to predict poly(A) sites in unannotated genomic sequences, which could be helpful in genome annotation, because a poly(A) site marks the end of a 3'-UTR, which is generally the end of a gene. To test the effectiveness of <it>PASS </it>in this regard, we used it to scan several 50,000 nt genomic sequences downloaded from TAIR (<abbrgrp><abbr bid="B18">18</abbr></abbrgrp>; Arabidopsis Genome Initiative release 5). Figure <figr fid="F5">5</figr> shows one of these examples, in which many poly(A) sites were predicted. When the gene annotation data (gene units, from TAIR) were overlaid with the <it>PASS </it>prediction scores, several interesting phenomena became obvious. The 6 genes annotated in this region in the left-to-right orientation (the only orientation considered, since <it>PASS </it>scans the sequence in one direction, from 5' to 3') all have relatively high scores at the 3' termini of their transcripts, particularly when the scores in the coding region and the 3'-UTR are compared (e.g. AT4G02510 and AT4G02540, Figure <figr fid="F5">5</figr>). Some of them show a few sites with good scores in the 3'-UTR (AT4G02500) or even in the coding sequences (AT4G02750), which may reflect alternative poly(A) sites. More interestingly, however, the two regions with the highest scores (marked with "?") were not located in any annotated genes. This could be due to the traditional annotation process failing to recognize the genes. Alternatively, there may be some special sequences that possess the features of a poly(A) site. It is also possible that <it>PASS </it>produces false positive sites. These possibilities could be distinguished using wet-lab experiments (for example, an RT-PCR approach with oligo-dT and a sequence-specific primer to detect transcripts with a poly(A) tail).</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>A representative result of using <it>PASS </it>to scan a segment of the genomic sequence of Arabidopsis</p>
               </caption>
               <text>
                  <p>A representative result of using <it>PASS </it>to scan a segment of the genomic sequence of Arabidopsis. The top part of the image is a screen shot downloaded from the TAIR web page (Seqviewer) showing the gene annotation units (from chromosome 4, nucleotides 1,100,000 to 1,150,000). Each gene is labeled with an AGI locus ID. The lower part shows the <it>PASS </it>scores for this region. The double-headed arrows point to the relative locations of the poly(A) sites and the peaks of the <it>PASS </it>scores. The question marks indicate regions of unknown gene annotation (see text for a detailed discussion). Note that the gene units in the reverse orientation are not predicted because <it>PASS </it>only predicts in the sense direction, as in mRNA.</p>
               </text>
               <graphic file="1471-2105-8-43-5"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Based on the current model of Arabidopsis poly(A) signals and their features, we developed a GHMM-based algorithm that can, for the first time, predict poly(A) sites in plant mRNA. In this paper, the structure of the model is described, and the program is tested with known poly(A) sites. Using this model, we achieved sensitivity and specificity of 97% each against the coding sequence and random datasets at a threshold of 4. For other control datasets such as 5' UTRs and introns, which are known to share some features with 3' UTRs, the <it>Sn </it>and <it>Sp </it>are still in the range of 72&#8211;82% at thresholds between 5.2 and 6. Moreover, the algorithm was able to predict many poly(A) site regions accurately when scanning a large fragment of Arabidopsis genomic sequence.</p>
         <p>GHMM is an important model in gene identification and is widely used by gene identification software such as GENSCAN, GeneMarkS and HMMgene <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. GHMM can give each state multiple observed values (instead of a single value for each state, as in HMM), which makes it more suitable for describing a model of biological sequences. This improvement, however, comes at the expense of increased computation. For example, the computational complexity of the Viterbi algorithm, a traditional HMM algorithm, is <it>O(N</it><sup>2</sup><it>L)</it>, in which N is the number of states and L is the length of a sequence, while the computational complexity of GHMM is <it>O(N</it><sup>2</sup><it>L</it><sup>3</sup><it>/2)</it>. The improved modeling, in turn, resulted in better sensitivity.</p>
         <p>Graber et al. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> described an HMM that predicts poly(A) sites in yeast. While the basic principles of HMM are used in modeling algorithms in which the parameters were designed rather than trained, our algorithm differs in, and is improved by, its use of GHMM. The two models deal with distinct groups of organisms that differ in poly(A) signal conservation, so different parameters have to be supplied. Our model was built on information about plant poly(A) signals from a much larger dataset (the 8K dataset, but also applicable to a dataset of 16,000 sequences; <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>). Moreover, the generalized HMM was used in our algorithm. GHMM is known for better separating the main model from the sub-model of each signal state, a design that can be extended to model complicated signals. A detailed comparison of HMM and the GHMM used in our algorithm can be found in the additional files [see Additional file <supplr sid="S1">1</supplr>].</p>
         <suppl id="S1">
            <title>
               <p>Additional File 1</p>
            </title>
            <text>
               <p>Forward-backward Algorithm of GHMM Used in <it>PASS</it>. Details on how GHMM was implemented in the algorithm, including parameter settings and mathematical formulas.</p>
            </text>
            <file name="1471-2105-8-43-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <p>Liu et al. <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> used a machine learning method to generate human poly(A) signals and then used a support vector machine to identify the real sites. After refinement of their method, the sensitivity of their program increased from 56.3% to 94.4%, while its specificity reached 92.2%. Our results reached a similar level (although not by direct comparison), even though plant poly(A) signals are less conserved than those of humans. In particular, only 10 patterns cover about 90% of NUE-equivalent signals (53% being AAUAAA) in animals <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. By contrast, in Arabidopsis, the list of such patterns reaches several hundred, with no predominant pattern (<abbrgrp><abbr bid="B6">6</abbr></abbrgrp>; Y-J. Shen and Q.Q. Li, unpublished observation). The most prevalent signal, AAUAAA, although it is the best, can only be found in about 10% of plant genes <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The remaining hundreds of signal patterns form a continuous distribution without a clear cut-off value (Y-J. Shen and Q.Q. Li, unpublished observation). Even so, NUE is still the strongest of the tripartite poly(A) signals, which also include FUE and CE, based on classical genetic analysis <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
         <p>Most recently, Cheng et al. <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> also reported a human poly(A) site prediction algorithm using a support vector machine. The algorithm took advantage of 15 highly conserved poly(A) signals, but also used other signals and U-rich elements to improve prediction efficiency. These additional features improved the program's sensitivity, although the specificity remained more or less the same. Integrating new features such as secondary structure into <it>PASS </it>should also improve its performance.</p>
         <p>Beyond the known variability of the NUE signals in plants, the lack of conservation and of identifiable features in the other signal regions presents another difficulty for algorithmic prediction of poly(A) sites. No highly conserved pattern was found in the FUE region. However, deletions of the FUE region were found to affect the efficiency of adjacent poly(A) sites <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. The FUE feature that helped our program most was the distinct T and A richness of the region <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The CE region also lacks sequence conservation. However, this region exhibits complex nucleotide profiles (see Additional file <supplr sid="S2">2</supplr>) that made feature selection easier. Despite these difficulties, our program can predict many of the alterations in poly(A) site efficiency in mutants constructed by conventional genetic means (Figure <figr fid="F4">4</figr>). In particular, when a few nucleotides within the polyadenylation signals were changed, <it>PASS </it>predicted the change in poly(A) site usage efficiency (Figure <figr fid="F4">4C</figr> and <figr fid="F4">4D</figr>), suggesting that the program is accurate.</p>
         <suppl id="S2">
            <title>
               <p>Additional File 2</p>
            </title>
            <text>
               <p>The distribution of nucleotides in the 20 nt region around poly(A) sites. Fraction of each of the four nucleotides around poly(A) sites.</p>
            </text>
            <file name="1471-2105-8-43-S2.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <p><it>PASS </it>should also be useful in gene annotation, where DNA sequences can be entered and the poly(A) site profiles deduced. High <it>PASS </it>scores indicate potential poly(A) sites, which signify the end of a mature transcript. We demonstrated this possibility by scanning fragments of genomic DNA 50,000 nt in length (though it can process longer sequences essentially without an upper limit), as shown in Figure <figr fid="F5">5</figr>. Furthermore, <it>PASS </it>can also be used to predict alternative poly(A) sites that are not normally found by EST experiments. Alternative polyadenylation has been found to be more frequent than originally anticipated in human (50%) and plant (25%) genes <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B25">25</abbr></abbrgrp>. The significance of alternative polyadenylation is not yet completely understood, and our program should be a useful tool towards that goal.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Based on the profiles of Arabidopsis polyadenylation signals, a new algorithm, named <it>PASS</it>, was developed to predict poly(A) sites in plants. The efficacy of the program was tested using known poly(A) sites collected from EST sequencing projects or published papers. Interestingly, <it>PASS </it>can also predict the alterations of poly(A) site efficiency caused by traditional genetic mutations of poly(A) signals. Both the specificity and the sensitivity of the program reached around 97% on the best-performing datasets. This algorithm will be useful in genome annotation by predicting the ends of transcripts, in the study of alternative polyadenylation of mRNA, and in genetic engineering by enabling researchers to recognize and then eliminate potentially undesirable poly(A) sites in transgenes. The <it>PASS </it>program is available through our web site <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>The datasets</p>
            </st>
            <p>The experimental dataset (also called the 8K dataset) used here has been described previously <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, and contains 8160 sequences from the genome of <it>Arabidopsis thaliana</it>. Briefly, all available expressed sequence tags (ESTs) were downloaded from GenBank, and those containing terminal poly(A) sequences [8 to 15 nucleotides (nt) with at least 80% adenine content] were recognized and trimmed. The terminal nt of each trimmed polyadenylated transcript was classified as the last nt before a poly(A) site. A total of 8160 such poly(A) sites were identified and confirmed through comparison of genomic and EST sequences (the oligo(A) should not be found in the genomic sequence because it is added post-transcriptionally during polyadenylation). Using the poly(A) sites as a reference, the corresponding 400 nt genomic sequences were extracted in such a way that each sequence contained 301 nt upstream and 99 nt downstream of the poly(A) site. Thus, the poly(A) site in each sequence lies between the 301<sup>st </sup>nt and the 302<sup>nd </sup>nt (counting from left to right). The poly(A) site is also the cleavage site, since the cleavage reaction occurs between two nucleotides linked by a phosphodiester bond. The cleavage site is defined as the "0" position (note that no nucleotide is assigned to this position). Hence, nucleotides on the left (upstream) have a negative designation, and those on the right have a positive designation (the sign is often omitted). For ease of description, the nucleotide on the left of the cleavage site (position 301 in the dataset) is normally referred to as the poly(A) site.</p>
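            <p>The coordinate convention described above can be illustrated with a minimal sketch. The function names are hypothetical and are not part of <it>PASS</it>; the sketch also assumes that the nucleotide at index 301 is numbered -1 and the nucleotide at index 302 is numbered +1, which follows from the statement that no nucleotide occupies position "0".</p>

```python
UPSTREAM = 301    # nucleotides upstream of the cleavage site in each sequence
DOWNSTREAM = 99   # nucleotides downstream of the cleavage site
LENGTH = UPSTREAM + DOWNSTREAM


def index_to_position(index):
    """Convert a 1-based sequence index (1..400) to a signed position.

    Assumption: index 301 becomes -1 and index 302 becomes +1, because the
    cleavage site itself ("0") carries no nucleotide.
    """
    if index not in range(1, LENGTH + 1):
        raise ValueError("index must be between 1 and 400")
    offset = index - UPSTREAM
    return offset if offset > 0 else offset - 1


def position_to_index(position):
    """Inverse of index_to_position; position 0 is rejected."""
    if position == 0:
        raise ValueError("position 0 is the cleavage site, not a nucleotide")
    index = UPSTREAM + position if position > 0 else UPSTREAM + position + 1
    if index not in range(1, LENGTH + 1):
        raise ValueError("position outside the 400 nt window")
    return index
```

            <p>Under these assumptions, the first nucleotide of each sequence is at position -301 and the last is at position +99.</p>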
            <p>This dataset was used to extract the features of the poly(A) signals and poly(A) sites. Other test sequences shown in the results were either downloaded from GenBank or taken from published papers as cited. For the <it>Sp </it>calculation, control datasets of Arabidopsis coding sequences (which do not include 5' and 3' UTRs or introns), 5'-UTRs, and introns were downloaded from The Arabidopsis Information Resource website (TAIR; <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>; Arabidopsis Genome Initiative Release 5, 2004). These sequences were trimmed to 400 nt each for better comparison with the 8K dataset. The coding sequence datasets were extracted from the downloaded sequences in the range of 300&#8211;700 to avoid the inclusion of UTRs. The random sequences for the <it>Sp </it>calculation were generated based on the second order trinucleotide distribution <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> in the 8K dataset. For the genomic sequence scan, the Arabidopsis chromosome 4 genomic sequence was used (ATH1_chr4.1con.01222004; from TAIR). Note that the sequences used in this work are in DNA form, so the nucleotides in these sequences are ATCG instead of AUCG as in RNA. This does not affect the analysis.</p>
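            <p>One plausible way to generate random control sequences from a second order (trinucleotide) distribution is sketched below. This is an illustration under stated assumptions, not the procedure actually used for the control dataset: the function names are hypothetical, each nucleotide is drawn conditionally on the two preceding nucleotides, and a uniform fallback is assumed for contexts never seen in training.</p>

```python
import random
from collections import Counter, defaultdict


def train_second_order(sequences):
    """Count, for every dinucleotide context, which nucleotide follows it."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - 2):
            counts[seq[i:i + 2]][seq[i + 2]] += 1
    return counts


def generate(counts, length, rng):
    """Draw one sequence from the second order model.

    The starting dinucleotide is picked uniformly among observed contexts
    (an assumption made for simplicity).
    """
    out = list(rng.choice(sorted(counts)))
    for _ in range(length - 2):
        options = counts.get("".join(out[-2:]))
        if not options:
            options = Counter("ATCG")  # assumed uniform fallback
        letters = sorted(options)
        weights = [options[letter] for letter in letters]
        out.append(rng.choices(letters, weights=weights)[0])
    return "".join(out)[:length]
```

            <p>For a control set comparable to the 8K dataset, the model would be trained on the 8K sequences and each generated sequence would be 400 nt long, for example generate(model, 400, random.Random(0)).</p>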
         </sec>
         <sec>
            <st>
               <p>Modeling routine</p>
            </st>
            <p>The topological structure is one of the most important factors in designing a GHMM. The usual GHMM topology is fully connected, so that every state can transition to any other state. This kind of topology does not take advantage of the positions of the signal elements in the 3' UTR (Figure <figr fid="F6">6A</figr>). Therefore, we employed a GHMM that recognizes the signals from left to right, and only allows a transition from the current state to the next state in one direction, as indicated in Figure <figr fid="F6">6B</figr>. Based on the analysis of the current model of plant poly(A) signals <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, we classified the sequences into five regions (Figure <figr fid="F6">6A</figr>). The poly(A) signals are distributed in these regions with some spacing between adjacent signal elements. We therefore added a background state between each pair of adjacent signal states. To simplify the model, we assumed that the length of every signal was fixed but the length of the background was variable. Two signals may also lie next to each other, in which case the length of the background is 0. The final model was designed in such a way that all calculations began at the first state and ended at the last state (Figure <figr fid="F6">6B</figr>). The order of the algorithm is shown in Figure <figr fid="F6">6C</figr>.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>The structure of plant mRNA polyadenylation signals, the order of the GHMM, and a flowchart of <it>PASS</it></p>
               </caption>
               <text>
                  <p>The structure of plant mRNA polyadenylation signals, the order of the GHMM, and a flowchart of <it>PASS</it>. A. A working model based on [6]. B. The order of the GHMM. The arrowheads indicate the state transition probabilities (all probabilities were set to 1). The rectangles represent regions with fixed length while the braces indicate regions with variable length. 3'-UTR, 3' untranslated region; CD, coding region; FUE, Far Upstream Element; NUE, Near Upstream Element; CE, cleavage element; CE-L, CE-R, cleavage element left or right of the poly(A) site; CS, cleavage site, also known as poly(A) site; YA, represents TA or CA &#8211; the predominant dinucleotides right before the CS; B, beginning of the scan; Bg, background sequences between <it>cis</it>-elements; E, end of scan. Note that because YA is not found in all sequences, other dinucleotide combinations are also considered in the GHMM. C. Flow chart of the <it>PASS </it>algorithm.</p>
               </text>
               <graphic file="1471-2105-8-43-6"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Parameter setting</p>
            </st>
            <p>In this model, some basic parameters were set as follows: the number of states was 11 (Figure <figr fid="F6">6B</figr>, from Bg1 through Bg6); the set of symbols emitted by every state was {A, T, C, G}. Odd-numbered states were background states with variable length, and even-numbered states were signal states with fixed length. Because the model begins with the first state and ends at the last state, the initial state distribution was set to &#960; = {1,0,...,0}. Because every state <it>i</it> can only transition to state <it>i</it>+1, only the elements a<sub>i,i+1 </sub>of the state transition probability matrix are 1, and all other elements of the matrix are 0. Other parameters, such as the distribution of nucleotides and the length of each state, are described below.</p>
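            <p>The basic parameters listed above can be written down directly, as in the following minimal sketch. The variable names and the generic signal labels (Sig1 to Sig5) are hypothetical placeholders chosen for illustration; the actual signal element assigned to each even-numbered state is given in Figure <figr fid="F6">6B</figr>.</p>

```python
N_STATES = 11
ALPHABET = ("A", "T", "C", "G")

# Hypothetical labels: odd-numbered states (1-based) are backgrounds Bg1..Bg6,
# even-numbered states are signals, named generically Sig1..Sig5 here.
STATE_NAMES = [
    "Bg%d" % (k // 2 + 1) if k % 2 == 0 else "Sig%d" % (k // 2 + 1)
    for k in range(N_STATES)
]


def is_background(state_number):
    """Return True for background states; state_number is 1-based."""
    return state_number % 2 == 1


# Initial state distribution: the model always starts in the first state.
INITIAL = [1.0] + [0.0] * (N_STATES - 1)

# Left-to-right transition matrix: state i can only move to state i+1.
# The last row is all zeros because the last state ends the scan.
TRANSITION = [
    [1.0 if j == i + 1 else 0.0 for j in range(N_STATES)]
    for i in range(N_STATES)
]
```

            <p>With this layout, the only free parameters left to specify are the emission probabilities and the lengths of the individual states.</p>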
         </sec>
         <sec>
            <st>
               <p>Length of the signal elements</p>
            </st>
            <p>For modeling purposes, we needed to assign each state a few parameters, including the signal nucleotide composition and the signal pattern length. To simplify the model, we assumed that the nucleotide length of each signal (FUE, NUE, and CE) was fixed. As a first step, we had to designate a reasonable signal length for each of them. To this end, we designed the following method to extract these data from the SignalSleuth experiments described in Loke et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The observed total count of 3 nt patterns was used as a basis to calculate the "expected count" of 4 nt patterns, the observed total count of 4 nt patterns was used to calculate the "expected count" of 5 nt patterns, and so on. For example, the expected count of 4 nt patterns should fall to 25% of the actual total count of 3 nt patterns, because the added position can be any one of the four nucleotides. The difference between the expected count of patterns (random chance) and the actual observed count is useful in measuring pattern length uniqueness. This deviation from randomness offers a clue to the likely length of the signals, because real signals should deviate more from randomness than non-signals. As indicated in the histogram (see Additional file <supplr sid="S3">3</supplr>), the greatest difference is observed at 8 nt for FUE, 6 nt for NUE, 6 nt for CE-L, and 7 nt for CE-R. Importantly, the NUE and FUE signal lengths obtained with this bioinformatics approach match those defined by classical genetic analysis, in which the NUE signal length was found to be 6 nt and the FUE about 8 nt <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B27">27</abbr></abbrgrp>. Note that the signal length is different from the range of the signal region in which the signal can be found. The latter is for modeling purposes, and is larger because it describes a collective area where the signals are found in different genes.</p>
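            <p>The comparison of observed and expected pattern counts can be sketched as follows. This is one possible reading of the procedure rather than the SignalSleuth implementation: the function names are hypothetical, and the expected count of a pattern is assumed to be one quarter of the observed count of the pattern that is one nucleotide shorter (its prefix).</p>

```python
from collections import Counter


def count_patterns(sequences, k):
    """Count every k nt pattern across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def deviations(sequences, k):
    """Observed minus expected count for each k nt pattern.

    Assumption: expected count = observed count of the (k-1) nt prefix / 4,
    i.e. the extra position is equally likely to be A, T, C or G.
    """
    shorter = count_patterns(sequences, k - 1)
    observed = count_patterns(sequences, k)
    return {
        pattern: count - shorter[pattern[:-1]] / 4
        for pattern, count in observed.items()
    }
```

            <p>Repeating this calculation for increasing values of k within a signal region, and looking for the length at which the deviation is greatest, mirrors the comparison summarized in the histogram.</p>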
            <suppl id="S3">
               <title>
                  <p>Additional File 3</p>
               </title>
               <text>
                  <p>Analysis of the lengths of signal elements in FUE, NUE, CE-L and CE-R. Data showing the reason why the length of nucleotide sequences of each signal element was chosen.</p>
               </text>
               <file name="1471-2105-8-43-S3.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Output probability of signal state</p>
            </st>
            <p>After determining the length of the signals, we needed to determine the output probability <it>B </it>of the nucleotides (A, T, C, and G) in every signal state. To this end, we analyzed the distribution of nucleotides in each region (FUE, NUE and CEs) with the formula below, using the frequency data generated by SignalSleuth from the 8K dataset, as described by Loke et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
            <p>
               <m:math name="1471-2105-8-43-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:msub>
                           <m:mi>D</m:mi>
                           <m:mi>&#949;</m:mi>
                        </m:msub>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>N</m:mi>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mo>(</m:mo>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>&#949;</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                          <m:mo>&#215;</m:mo>
                                          <m:msub>
                                             <m:mi>W</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                       </m:mrow>
                                       <m:mo>)</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:munder>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>&#949;</m:mi>
                                       <m:mo>&#8712;</m:mo>
                                       <m:mo>{</m:mo>
                                       <m:mi>A</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>T</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>C</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>G</m:mi>
                                       <m:mo>}</m:mo>
                                    </m:mrow>
                                 </m:munder>
                                 <m:mrow>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mstyle displaystyle="true">
                                       <m:munderover>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mi>N</m:mi>
                                       </m:munderover>
                                       <m:mrow>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>&#949;</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                                <m:mo>&#215;</m:mo>
                                                <m:msub>
                                                   <m:mi>W</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                        </m:mfrac>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGebardaWgaaWcbaacciGae8xTdugabeaakiabg2da9maalaaabaWaaabCaeaadaqadaqaaiab=v7aLnaaBaaaleaacqWGPbqAaeqaaOGaey41aqRaem4vaC1aaSbaaSqaaiabdMgaPbqabaaakiaawIcacaGLPaaaaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabd6eaobqdcqGHris5aaGcbaWaaabuaeaacqGGOaakdaaeWbqaamaabmaabaGae8xTdu2aaSbaaSqaaiabdMgaPbqabaGccqGHxdaTcqWGxbWvdaWgaaWcbaGaemyAaKgabeaaaOGaayjkaiaawMcaaiabcMcaPaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemOta4eaniabggHiLdaaleaacqWF1oqzcqGHiiIZcqGG7bWEcqWGbbqqcqGGSaalcqWGubavcqGGSaalcqWGdbWqcqGGSaalcqWGhbWrcqGG9bqFaeqaniabggHiLdaaaaaa@6264@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where <m:math name="1471-2105-8-43-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>W</m:mi><m:mi>i</m:mi></m:msub><m:mo>=</m:mo><m:mfrac><m:mrow><m:msub><m:mi>C</m:mi><m:mtext>i</m:mtext></m:msub></m:mrow><m:mrow><m:mstyle displaystyle="true"><m:munderover><m:mo>&#8721;</m:mo><m:mrow><m:mi>s</m:mi><m:mo>=</m:mo><m:mtext>1</m:mtext></m:mrow><m:mtext>N</m:mtext></m:munderover><m:mrow><m:msub><m:mi>C</m:mi><m:mtext>s</m:mtext></m:msub></m:mrow></m:mstyle></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGxbWvdaWgaaWcbaGaemyAaKgabeaakiabg2da9maalaaabaGaem4qam0aaSbaaSqaaiabbMgaPbqabaaakeaadaaeWbqaaiabdoeadnaaBaaaleaacqqGZbWCaeqaaaqaaiabdohaZjabg2da9iabbgdaXaqaaiabb6eaobqdcqGHris5aaaaaaa@3C88@</m:annotation></m:semantics></m:math> is the statistical weight of sequence <it>i</it>, and the more repeats this sequence has, the higher the weight is; <it>C</it><sub><it>i </it></sub>is the frequency at which the <it>i</it><sup>th </sup>sequence occurs in this signal area; <it>D</it><sub><it>&#949; </it></sub>represents the probability of the nucleotide <it>&#949; </it>in the signal element, <it>i.e</it>. the distribution of nucleotides, which is <it>&#949; </it>&#8712; {A, T, C, G}; <it>&#949;</it><sub><it>i </it></sub>is the frequency of <it>&#949; </it>in the <it>i</it><sup>th </sup>sequence, where 1 &#8804; <it>i </it>&#8804; <it>N</it>, and <it>N </it>is the number of signal patterns considered.</p>
            <p>Taking signals in the FUE as an example, the representative nucleotide output probability <it>B </it>was calculated based on the top 50 patterns [see Additional file <supplr sid="S4">4</supplr>]. First, we calculated the weight of every pattern from its count, and then counted the occurrences <it>&#949;</it><sub><it>i </it></sub>of every nucleotide in these patterns. The resulting nucleotide output probabilities <it>B </it>for FUE are 0.0485, 0.7740, 0.0479 and 0.1290 for A, T, C, and G, respectively.</p>
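            <p>The weighted nucleotide distribution defined by the formula can be computed as in the sketch below. The function name and the toy pattern counts in the usage note are hypothetical and serve only to illustrate the arithmetic; they are not data from the 8K dataset.</p>

```python
def nucleotide_distribution(pattern_counts):
    """Weighted nucleotide distribution over a set of signal patterns.

    pattern_counts maps each pattern to its count C_i. The weight of a
    pattern is W_i = C_i / sum of all counts, and each nucleotide is scored
    by the sum over patterns of (occurrences in pattern i) * W_i, which is
    then normalized over the four nucleotides.
    """
    total = sum(pattern_counts.values())
    weighted = {nt: 0.0 for nt in "ATCG"}
    for pattern, count in pattern_counts.items():
        weight = count / total
        for nt in weighted:
            weighted[nt] += pattern.count(nt) * weight
    norm = sum(weighted.values())
    return {nt: value / norm for nt, value in weighted.items()}
```

            <p>For instance, with the toy counts {"TTTT": 3, "TTGT": 1} the weights are 0.75 and 0.25, so the distribution is 0.9375 for T and 0.0625 for G, with A and C at 0.</p>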
            <suppl id="S4">
               <title>
                  <p>Additional File 4</p>
               </title>
               <text>
                  <p>Frequency of eight-nucleotide patterns with high counts in FUE. Ranked list of the counts of the top 50 patterns found in FUE.</p>
               </text>
               <file name="1471-2105-8-43-S4.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Using the same method, the nucleotide output probabilities <it>B </it>for A, T, C, and G, respectively, are 0.09987, 0.74970, 0.06186 and 0.08860 for CE-L, and 0.08520, 0.78700, 0.07050 and 0.05680 for CE-R.</p>
            <p>The NUE signals are slightly better conserved than the other signals, and the transition from one nt to the next may be constrained. To represent these dependencies within the hexamer signals, we used a first order inhomogeneous Markov sub-model to describe the feature information. A frequency transition matrix was used to analyze the 50 most predominant NUE signals <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The equation is shown below:</p>
            <p>
               <m:math name="1471-2105-8-43-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>P</m:mi>
                        <m:mi>N</m:mi>
                        <m:mo>=</m:mo>
                        <m:mrow>
                           <m:mo>[</m:mo>
                           <m:mrow>
                              <m:mtable>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>A</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>A</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>T</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>A</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>C</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>A</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>G</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>A</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>A</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>T</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>T</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>T</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>C</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>T</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>G</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>T</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>A</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>C</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>T</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>C</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>C</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>C</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>G</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>C</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>A</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>G</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>T</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>G</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>C</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>G</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mi>N</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>G</m:mi>
                                          <m:mo>/</m:mo>
                                          <m:mi>G</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                              </m:mtable>
                           </m:mrow>
                           <m:mo>]</m:mo>
                        </m:mrow>
                        <m:mo>=</m:mo>
                        <m:mrow>
                           <m:mo>[</m:mo>
                           <m:mrow>
                              <m:mtable>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>A</m:mi>
                                                <m:mi>A</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>A</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>A</m:mi>
                                                <m:mi>T</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>A</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>A</m:mi>
                                                <m:mi>C</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>A</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>A</m:mi>
                                                <m:mi>G</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>A</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>T</m:mi>
                                                <m:mi>A</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>T</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>T</m:mi>
                                                <m:mi>T</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>T</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>T</m:mi>
                                                <m:mi>C</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>T</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>T</m:mi>
                                                <m:mi>G</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>T</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>C</m:mi>
                                                <m:mi>A</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>C</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>C</m:mi>
                                                <m:mi>T</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>C</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>C</m:mi>
                                                <m:mi>C</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>C</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>C</m:mi>
                                                <m:mi>G</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>C</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>G</m:mi>
                                                <m:mi>A</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>G</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>G</m:mi>
                                                <m:mi>T</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>G</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>G</m:mi>
                                                <m:mi>C</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>G</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>S</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>G</m:mi>
                                                <m:mi>G</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msub>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mi>G</m:mi>
                                                   </m:msub>
                                                   <m:mrow/>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                              </m:mtable>
                           </m:mrow>
                           <m:mo>]</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGqbaucqWGobGtcqGH9aqpdaWadaqaauaabeqaeqaaaaaabaGaemiuaaLaemOta4KaeiikaGIaemyqaeKaei4la8IaemyqaeKaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaemivaqLaei4la8IaemyqaeKaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaem4qamKaei4la8IaemyqaeKaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaem4raCKaei4la8IaemyqaeKaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaemyqaeKaei4la8IaemivaqLaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaemivaqLaei4la8IaemivaqLaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaem4qamKaei4la8IaemivaqLaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaem4raCKaei4la8IaemivaqLaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaemyqaeKaei4la8Iaem4qamKaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaemivaqLaei4la8Iaem4qamKaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaem4qamKaei4la8Iaem4qamKaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaem4raCKaei4la8Iaem4qamKaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaemyqaeKaei4la8Iaem4raCKaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaemivaqLaei4la8Iaem4raCKaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaem4qamKaei4la8Iaem4raCKaeiykaKcabaGaemiuaaLaemOta4KaeiikaGIaem4raCKaei4la8Iaem4raCKaeiykaKcaaaGaay5waiaaw2faaiabg2da9maadmaabaqbaeqabqabaaaaaeaadaWcaaqaaiabdofatjabcIcaOiabdgeabjabdgeabjabcMcaPaqaamaaqababaaaleaacqWGbbqqaeqaniabggHiLdaaaaGcbaWaaSaaaeaacqWGtbWucqGGOaakcqWGbbqqcqWGubavcqGGPaqkaeaadaaeqaqaaaWcbaGaemyqaeeabeqdcqGHris5aaaaaOqaamaalaaabaGaem4uamLaeiikaGIaemyqaeKaem4qamKaeiykaKcabaWaaabeaeaaaSqaaiabdgeabbqab0GaeyyeIuoaaaaakeaadaWcaaqaaiabdofatjabcIcaOiabdgeabjabdEeahjabcMcaPaqaamaaqababaaaleaacqWGbbqqaeqaniabggHiLdaaaaGcbaWaaSaaaeaacqWGtbWucqGGOaakcqWGubavcqWGbbqqcqGGPaqkaeaadaaeqaqaaaWcbaGaemivaqfabeqdcqGHris5aaaaaOqaamaalaaabaGaem4uamLaeiikaGIaemivaqLaemivaqLaeiykaKcabaWaaabeaeaaaSqaaiabdsfaubqab0GaeyyeIuoaaaaakeaadaWcaaqaaiabdofatjabcIcaOiabdsfaujabdoeadjabcMcaPaqaamaaqababaaaleaacqWGubavaeqaniabggHiLdaaaaGcbaWaaSaaaea
acqWGtbWucqGGOaakcqWGubavcqWGhbWrcqGGPaqkaeaadaaeqaqaaaWcbaGaemivaqfabeqdcqGHris5aaaaaOqaamaalaaabaGaem4uamLaeiikaGIaem4qamKaemyqaeKaeiykaKcabaWaaabeaeaaaSqaaiabdoeadbqab0GaeyyeIuoaaaaakeaadaWcaaqaaiabdofatjabcIcaOiabdoeadjabdsfaujabcMcaPaqaamaaqababaaaleaacqWGdbWqaeqaniabggHiLdaaaaGcbaWaaSaaaeaacqWGtbWucqGGOaakcqWGdbWqcqWGdbWqcqGGPaqkaeaadaaeqaqaaaWcbaGaem4qameabeqdcqGHris5aaaaaOqaamaalaaabaGaem4uamLaeiikaGIaem4qamKaem4raCKaeiykaKcabaWaaabeaeaaaSqaaiabdoeadbqab0GaeyyeIuoaaaaakeaadaWcaaqaaiabdofatjabcIcaOiabdEeahjabdgeabjabcMcaPaqaamaaqababaaaleaacqWGhbWraeqaniabggHiLdaaaaGcbaWaaSaaaeaacqWGtbWucqGGOaakcqWGhbWrcqWGubavcqGGPaqkaeaadaaeqaqaaaWcbaGaem4raCeabeqdcqGHris5aaaaaOqaamaalaaabaGaem4uamLaeiikaGIaem4raCKaem4qamKaeiykaKcabaWaaabeaeaaaSqaaiabdEeahbqab0GaeyyeIuoaaaaakeaadaWcaaqaaiabdofatjabcIcaOiabdEeahjabdEeahjabcMcaPaqaamaaqababaaaleaacqWGhbWraeqaniabggHiLdaaaaaaaOGaay5waiaaw2faaaaa@2912@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where <it>PN</it>(<it>T</it>/<it>A</it>) is the probability of a transition from state "<it>A</it>" to state "<it>T</it>"; <it>S</it>(<it>AT</it>) is the number of times this transition occurs; and &#931;<sub><it>A </it></sub>= <it>S</it>(<it>AA</it>) + <it>S</it>(<it>AT</it>) + <it>S</it>(<it>AC</it>) + <it>S</it>(<it>AG</it>). The same rule applies to the other nucleotides. The resulting parameters of the NUE sub-model are as follows. The probability distribution of the first nucleotide is <it>PN</it><sub>0 </sub>= [0.6276, 0.3563, 0.0001, 0.0161]. The distributions of the second to sixth nucleotides are listed below, respectively,</p>
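            <p>As an aside before the matrices, the counting rule described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function name, the toy motifs, the A, T, C, G ordering (taken from the order in which the counts are listed) and the handling of unobserved nucleotides are all assumptions made for illustration only.</p>
            <p>
```python
from collections import Counter

# Row/column order assumed from the S(AA), S(AT), S(AC), S(AG) listing.
NUCLEOTIDES = "ATCG"


def transition_matrix(motifs, k):
    """Estimate PN(y/x) for the step from position k to k + 1 (0-based).

    `motifs` is a list of equal-length strings over A, T, C, G.
    Hypothetical helper, written only to illustrate the counting rule.
    """
    # S(xy): number of motifs with x at position k and y at position k + 1.
    counts = Counter((m[k], m[k + 1]) for m in motifs)
    matrix = []
    for x in NUCLEOTIDES:
        # Sigma_x = S(xA) + S(xT) + S(xC) + S(xG)
        total = sum(counts[(x, y)] for y in NUCLEOTIDES)
        if total == 0:
            # x never observed at position k: leave the row as zeros
            # (an arbitrary choice for this sketch).
            matrix.append([0.0] * len(NUCLEOTIDES))
        else:
            matrix.append([counts[(x, y)] / total for y in NUCLEOTIDES])
    return matrix


# Made-up hexamers, not data from the paper.
toy_motifs = ["AATAAA", "AATAAA", "ATTAAA", "AATGAA"]
pn1 = transition_matrix(toy_motifs, 0)
print(pn1[0])  # row for transitions out of "A"
```
            </p>
            <p>In this toy input, three of the four motifs move from "A" to "A" and one from "A" to "T" at the first step, so the "A" row is [0.75, 0.25, 0.0, 0.0].</p>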
            <p>
               <m:math name="1471-2105-8-43-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtable>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>P</m:mi>
                                    <m:msub>
                                       <m:mi>N</m:mi>
                                       <m:mn>1</m:mn>
                                    </m:msub>
                                    <m:mo>=</m:mo>
                                    <m:mrow>
                                       <m:mo>[</m:mo>
                                       <m:mrow>
                                          <m:mtable columnalign="left">
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.6215</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.3785</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.6788</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.3212</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                          </m:mtable>
                                       </m:mrow>
                                       <m:mo>]</m:mo>
                                    </m:mrow>
                                    <m:mo>,</m:mo>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd>
                                 <m:mrow/>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>P</m:mi>
                                    <m:msub>
                                       <m:mi>N</m:mi>
                                       <m:mn>2</m:mn>
                                    </m:msub>
                                    <m:mo>=</m:mo>
                                    <m:mrow>
                                       <m:mo>[</m:mo>
                                       <m:mrow>
                                          <m:mtable columnalign="left">
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.5229</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.4464</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0307</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.6003</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.2637</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0441</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0920</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                          </m:mtable>
                                       </m:mrow>
                                       <m:mo>]</m:mo>
                                    </m:mrow>
                                    <m:mo>,</m:mo>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>P</m:mi>
                                    <m:msub>
                                       <m:mi>N</m:mi>
                                       <m:mn>3</m:mn>
                                    </m:msub>
                                    <m:mo>=</m:mo>
                                    <m:mrow>
                                       <m:mo>[</m:mo>
                                       <m:mrow>
                                          <m:mtable columnalign="left">
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.5470</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.4258</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0272</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.6700</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.2377</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0418</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0505</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                          </m:mtable>
                                       </m:mrow>
                                       <m:mo>]</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>P</m:mi>
                                    <m:msub>
                                       <m:mi>N</m:mi>
                                       <m:mn>4</m:mn>
                                    </m:msub>
                                    <m:mo>=</m:mo>
                                    <m:mrow>
                                       <m:mo>[</m:mo>
                                       <m:mrow>
                                          <m:mtable columnalign="left">
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.6181</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.3584</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0235</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.7537</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.2463</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                          </m:mtable>
                                       </m:mrow>
                                       <m:mo>]</m:mo>
                                    </m:mrow>
                                    <m:mo>,</m:mo>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>P</m:mi>
                                    <m:msub>
                                       <m:mi>N</m:mi>
                                       <m:mn>5</m:mn>
                                    </m:msub>
                                    <m:mo>=</m:mo>
                                    <m:mrow>
                                       <m:mo>[</m:mo>
                                       <m:mrow>
                                          <m:mtable columnalign="left">
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.5550</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.4212</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0238</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.5277</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.4723</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr columnalign="left">
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                      <m:mo>,</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd columnalign="left">
                                                   <m:mrow>
                                                      <m:mn>0.0001</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                          </m:mtable>
                                       </m:mrow>
                                       <m:mo>]</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                        </m:mtable>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeWacaaabaGaemiuaaLaemOta40aaSbaaSqaaiabigdaXaqabaGccqGH9aqpdaWadaqaauaabaqaeqaaaaaabaGaeGimaaJaeiOla4IaeGOnayJaeGOmaiJaeGymaeJaeGynauJaeiilaWcabaGaeGimaaJaeiOla4IaeG4mamJaeG4naCJaeGioaGJaeGynauJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaedabaGaeGimaaJaeiOla4IaeGOnayJaeG4naCJaeGioaGJaeGioaGJaeiilaWcabaGaeGimaaJaeiOla4IaeG4mamJaeGOmaiJaeGymaeJaeGOmaiJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaedabaGaeGymaeJamaiGamaaaiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaedabiqaa83acqaIXaqmcWaGacTaaaGGSaalaeaacqaIWaamcqGGUaGlcqaIWaamcqaIWaamcqaIWaamcqaIXaqmcqGGSaalaeaacqaIWaamcqGGUaGlcqaIWaamcqaIWaamcqaIWaamcqaIXaqmcqGGSaalaeaacqaIWaamcqGGUaGlcqaIWaamcqaIWaamcqaIWaamcqaIXaqmaaaacaGLBbGaayzxaaGaeiilaWcabaaabaGaemiuaaLaemOta40aaSbaaSqaaiabikdaYaqabaGccqGH9aqpdaWadaqaauaabaqaeqaaaaaabaGaeGimaaJaeiOla4IaeGynauJaeGOmaiJaeGOmaiJaeGyoaKJaeiilaWcabaGaeGimaaJaeiOla4IaeGinaqJaeGinaqJaeGOnayJaeGinaqJaeiilaWcabaGaeGimaaJamaiG4kaaaiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeG4mamJaeGimaaJaeG4naCdabaGaeGimaaJaeiOla4IaeGOnayJaeGimaaJaeGimaaJaeG4mamJaeiilaWcabaGaeGimaaJaeiOla4IaeGOmaiJaeGOnayJaeG4mamJaeG4naCJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGinaqJaeGinaqJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGyoaKJaeGOmaiJaeGimaadabaGaeGymaeJamaiGqlaaaiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaedabaGaeGymaeJamaiGOlaaaiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJ
aeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaedaaaGaay5waiaaw2faaiabcYcaSaqaaiabdcfaqjabd6eaonaaBaaaleaacqaIZaWmaeqaaOGaeyypa0ZaamWaaeaafaqaaeabeaaaaaqaaiabicdaWiabc6caUiabiwda1iabisda0iabiEda3iabicdaWiabcYcaSaqaaiabicdaWiabc6caUiabisda0iabikdaYiabiwda1iabiIda4iabcYcaSaqaaiabicdaWiadaci9caaacYcaSaqaaiabicdaWiabc6caUiabicdaWiabikdaYiabiEda3iabikdaYaqaaiabicdaWiabc6caUiabiAda2iabiEda3iabicdaWiabicdaWiabcYcaSaqaaiabicdaWiabc6caUiabikdaYiabiodaZiabiEda3iabiEda3iabcYcaSaqaaiabicdaWiabc6caUiabicdaWiabisda0iabigdaXiabiIda4iabcYcaSaqaaiabicdaWiabc6caUiabicdaWiabiwda1iabicdaWiabiwda1aqaaiabigdaXiadaciadaaacYcaSaqaaiabicdaWiabc6caUiabicdaWiabicdaWiabicdaWiabigdaXiabcYcaSaqaaiabicdaWiabc6caUiabicdaWiabicdaWiabicdaWiabigdaXiabcYcaSaqaaiabicdaWiabc6caUiabicdaWiabicdaWiabicdaWiabigdaXaqaaiabigdaXiadacigdaaacYcaSaqaaiabicdaWiabc6caUiabicdaWiabicdaWiabicdaWiabigdaXiabcYcaSaqaaiabicdaWiabc6caUiabicdaWiabicdaWiabicdaWiabigdaXiabcYcaSaqaaiabicdaWiabc6caUiabicdaWiabicdaWiabicdaWiabigdaXaaaaiaawUfacaGLDbaaaeaacqWGqbaucqWGobGtdaWgaaWcbaGaeGinaqdabeaakiabg2da9maadmaabaqbaeaabqabaaaaaeaacqaIWaamcqGGUaGlcqaI2aGncqaIXaqmcqaI4aaocqaIXaqmcqGGSaalaeaacqaIWaamcqGGUaGlcqaIZaWmcqaI1aqncqaI4aaocqaI0aancqGGSaalaeaacqaIWaamcqGGUaGlcqaIWaamcqaIWaamcqaIWaamcqaIXaqmcqGGSaalaeaacqaIWaamcqGGUaGlcqaIWaamcqaIYaGmcqaIZaWmcqaI1aqnaeaacqaIWaamcqGGUaGlcqaI3aWncqaI1aqncqaIZaWmcqaI3aWncqGGSaalaeaacqaIWaamcqGGUaGlcqaIYaGmcqaI0aancqaI2aGncqaIZaWmcqGGSaalaeaacqaIWaamcqGGUaGlcqaIWaamcqaIWaamcqaIWaamcqaIXaqmcqGGSaalaeaacqaIWaamcqGGUaGlcqaIWaamcqaIWaamcqaIWaamcqaIXaqmaeaacqaIXaqmcWaGacWaaaGGSaalaeaacqaIWaamcqGGUaGlcqaIWaamcqaIWaamcqaIWaamcqaIXaqmcqGGSaalaeaacqaIWaamcqGGUaGlcqaIWaamcqaIWaamcqaIWaamcqaIXaqmcqGGSaalaeaacqaIWaamcqGGUaGlcqaIWaamcqaIWaamcqaIWaamcqaIXaqmaeaacqaIXaqmcWaGaIXaaaGGSaalaeaacqaIWaamcqGGUaGlcqaIWaamcqaIWaamcqaIWaamcqaIXaqmcqGGSaalaeaacqaIWaamcqGGUaGlcqaIWaamcqaIWaamcqaIWaamcqaIXaqmcqGGSaalaeaacqaIWaamcqGGUaGlcqa
IWaamcqaIWaamcqaIWaamcqaIXaqmaaaacaGLBbGaayzxaaGaeiilaWcabaGaemiuaaLaemOta40aaSbaaSqaaiabiwda1aqabaGccqGH9aqpdaWadaqaauaabaqaeqaaaaaabaGaeGimaaJaeiOla4IaeGynauJaeGynauJaeGynauJaeGimaaJaeiilaWcabaGaeGimaaJaeiOla4IaeGinaqJaeGOmaiJaeGymaeJaeGOmaiJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGOmaiJaeG4mamJaeGioaGdabaGaeGimaaJaeiOla4IaeGynauJaeGOmaiJaeG4naCJaeG4naCJaeiilaWcabaGaeGimaaJaeiOla4IaeGinaqJaeG4naCJaeGOmaiJaeG4mamJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaedabaGaeGymaeJamaiGymaaaiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaedabaGaeGymaeJamaiGWmaaaiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaeJaeiilaWcabaGaeGimaaJaeiOla4IaeGimaaJaeGimaaJaeGimaaJaeGymaedaaaGaay5waiaaw2faaaaaaaa@1DE0@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>Therefore, for a given hexamer sequence <it>S </it>= <it>s</it><sub>1</sub><it>s</it><sub>2</sub><it>s</it><sub>3</sub><it>s</it><sub>4</sub><it>s</it><sub>5</sub><it>s</it><sub>6</sub>, we can calculate the probability of the NUE signal at the position of <it>S</it>:</p>
            <p><it>P</it>[<it>S </it>| <it>NUE</it>] = <it>PN</it><sub>0</sub>(<it>s</it><sub>1</sub>)<it>*PN</it><sub>1</sub>(<it>s</it><sub>2</sub>|<it>s</it><sub>1</sub>)*<it>PN</it><sub>2</sub>(<it>s</it><sub>3</sub>|<it>s</it><sub>2</sub>)*<it>PN</it><sub>3</sub>(<it>s</it><sub>4</sub>|<it>s</it><sub>3</sub>)*<it>PN</it><sub>4</sub>(<it>s</it><sub>5</sub>|<it>s</it><sub>4</sub>)*<it>PN</it><sub>5</sub>(<it>s</it><sub>6</sub>|<it>s</it><sub>5</sub>).</p>
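            <p>The product above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nucleotide order A, T, C, G for rows and columns is assumed (the text here does not state it), the function name is hypothetical, and the toy matrices are placeholders rather than the published <it>PN</it> parameters. The sum is accumulated in log space so that long products do not underflow.</p>

```python
import math

# Assumed row/column order of nucleotides; the text does not state it.
ALPHABET = "ATCG"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def signal_log10_probability(seq, initial, transitions):
    """Return log10 P[S | signal] for a first-order inhomogeneous Markov chain.

    initial     -- 4 probabilities for the first nucleotide (the role of PN0)
    transitions -- one 4x4 matrix per later position (PN1, PN2, ...);
                   transitions[k][i][j] is P(next = j | current = i) at step k
    """
    if len(seq) != len(transitions) + 1:
        raise ValueError("need exactly one transition matrix per step")
    idx = [INDEX[base] for base in seq.upper()]
    log_p = math.log10(initial[idx[0]])
    for k, matrix in enumerate(transitions):
        log_p += math.log10(matrix[idx[k]][idx[k + 1]])
    return log_p


# Hypothetical toy parameters (NOT the published matrices): a uniform
# initial state and position-specific matrices that favour A or T.
toy_initial = [0.25, 0.25, 0.25, 0.25]
favour_a = [[0.7, 0.1, 0.1, 0.1]] * 4
favour_t = [[0.1, 0.7, 0.1, 0.1]] * 4
toy_transitions = [favour_a, favour_t, favour_a, favour_a, favour_a]

print(signal_log10_probability("AATAAA", toy_initial, toy_transitions))
```

            <p>Because each position has its own matrix, the same dinucleotide can contribute differently depending on where it falls in the hexamer, which is what distinguishes the inhomogeneous model from a homogeneous one. A zero entry would make the logarithm undefined; the published matrices contain entries of 0.0001 where a zero might otherwise be expected, which appears to serve as a floor.</p>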
            <p>The same kind of first-order inhomogeneous Markov model was established for the poly(A) site signal (YA in the model, Fig. <figr fid="F6">6A</figr>). For this, we randomly selected 1000 sequences from the 8K dataset and obtained the poly(A) site parameters listed below. Initial state <it>CSPN</it><sub>0 </sub>= [0.0730, 0.4630, 0.3250, 0.1390],</p>
            <p>2<sup>nd </sup>state<m:math name="1471-2105-8-43-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msup><m:mn/><m:mrow><m:mtext/></m:mrow></m:msup><m:mtext>&#160;</m:mtext><m:mi>C</m:mi><m:mi>S</m:mi><m:mi>P</m:mi><m:msub><m:mi>N</m:mi><m:mn>1</m:mn></m:msub><m:mo>=</m:mo><m:mrow><m:mo>[</m:mo><m:mrow><m:mtable><m:mtr><m:mtd><m:mrow><m:mn>0.4247</m:mn><m:mo>,</m:mo></m:mrow></m:mtd><m:mtd><m:mrow><m:mn>0.3973</m:mn><m:mo>,</m:mo></m:mrow></m:mtd><m:mtd><m:mrow><m:mn>0.1096</m:mn><m:mo>,</m:mo></m:mrow></m:mtd><m:mtd><m:mrow><m:mn>0.0685</m:mn></m:mrow></m:mtd></m:mtr><m:mtr><m:mtd><m:mrow><m:mn>0.7171</m:mn><m:mo>,</m:mo></m:mrow></m:mtd><m:mtd><m:mrow><m:mn>0.1620</m:mn><m:mo>,</m:mo></m:mrow></m:mtd><m:mtd><m:mrow><m:mn>0.0648</m:mn><m:mo>,</m:mo></m:mrow></m:mtd><m:mtd><m:mrow><m:mn>0.0562</m:mn></m:mrow></m:mtd></m:mtr><m:mtr><m:mtd><m:mrow><m:mn>0.7785</m:mn><m:mo>,</m:mo></m:mrow></m:mtd><m:mtd><m:mrow><m:mn>0.1569</m:mn><m:mo>,</m:mo></m:mrow></m:mtd><m:mtd><m:mrow><m:mn>0.0431</m:mn><m:mo>,</m:mo></m:mrow></m:mtd><m:mtd><m:mrow><m:mn>0.0215</m:mn></m:mrow></m:mtd></m:mtr><m:mtr><m:mtd><m:mrow><m:mn>0.8489</m:mn></m:mrow></m:mtd><m:mtd><m:mrow><m:mn>0.0935</m:mn></m:mrow></m:mtd><m:mtd><m:mrow><m:mn>0.0144</m:mn></m:mrow></m:mtd><m:mtd><m:mrow><m:mn>0.0432</m:mn></m:mrow></m:mtd></m:mtr></m:mtable></m:mrow><m:mo>]</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqaIYaGmdaahaaWcbeqaaiabb6gaUjabbsgaKbaakiabbccaGiabbohaZjabbsha0jabbggaHjabbsha0jabbwgaLjabbccaGiabdoeadjabdofatjabdcfaqjabd6eaonaaBaaaleaacqaIXaqmaeqaaOGaeyypa0ZaamWaaeaafaqabeabeaaaaaqaaiabicdaWiabc6caUiabisda0iabikdaYiabisda0iabiEda3iabcYcaSaqaaiabicdaWiabc6caUiabiodaZiabiMda5iabiEda3iabiodaZiabcYcaSaqaaiabicdaWiabc6caUiabigdaXiabicdaWiabiMda5iabiAda2iabcYcaSaqaaiabicdaWiabc6caUiabicdaWiabiAda2iabiIda4iabiwda1aqaaiabicdaWiabc6caUiabiEda3iabigdaXiabiEda3iabigdaXiabcYcaSaqaaiabicdaWiabc6caUiabigdaXiabiAda2iabikdaYiabicdaWiabcYcaSaqaaiabicdaWiabc6caUiabicdaWiabiAda2iabisda0iabiIda4iabcYcaSaqaaiabicdaWiabc6caUiabicdaWiabiwda1iabiAda2iabikdaYaqaaiabicdaWiabc6caUiabiEda3iabiEda3iabiIda4iabiwda1iabcYcaSaqaaiabicdaWiabc6caUiabigdaXiabiwda1iabiAda2iabiMda5iabcYcaSaqaaiabicdaWiabc6caUiabicdaWiabisda0iabiodaZiabigdaXiabcYcaSaqaaiabicdaWiabc6caUiabicdaWiabikdaYiabigdaXiabiwda1aqaaiabicdaWiabc6caUiabiIda4iabisda0iabiIda4iabiMda5aqaaiabicdaWiabc6caUiabicdaWiabiMda5iabiodaZiabiwda1aqaaiabicdaWiabc6caUiabicdaWiabigdaXiabisda0iabisda0aqaaiabicdaWiabc6caUiabicdaWiabisda0iabiodaZiabikdaYaaaaiaawUfacaGLDbaaaaa@A448@</m:annotation></m:semantics></m:math></p>
            <p>Using the same method, for a dimer sequence <it>S </it>= <it>s</it><sub>1</sub><it>s</it><sub>2</sub>, we obtained the probability at the position of <it>S</it>: <it>P</it>[<it>S</it>|<it>CS</it>] = <it>CSPN</it><sub>0</sub>(<it>s</it><sub>1</sub>)*<it>CSPN</it><sub>1</sub>(<it>s</it><sub>2</sub>|<it>s</it><sub>1</sub>).</p>
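            <p>As a worked illustration, the dimer probability can be evaluated directly from the poly(A) site parameters quoted above. The numbers are copied from the text; the nucleotide order A, T, C, G for rows and columns is an assumption (it is not stated here), and the function name is hypothetical.</p>

```python
# Assumed nucleotide order for rows and columns; the text does not state it.
ALPHABET = "ATCG"

# Values copied from the poly(A) site parameters in the text.
CSPN0 = [0.0730, 0.4630, 0.3250, 0.1390]
CSPN1 = [
    [0.4247, 0.3973, 0.1096, 0.0685],
    [0.7171, 0.1620, 0.0648, 0.0562],
    [0.7785, 0.1569, 0.0431, 0.0215],
    [0.8489, 0.0935, 0.0144, 0.0432],
]


def dimer_probability(dimer):
    """P[S | CS] = CSPN0(s1) * CSPN1(s2 | s1) for a two-letter sequence."""
    first, second = (ALPHABET.index(base) for base in dimer.upper())
    return CSPN0[first] * CSPN1[first][second]


# Rank all 16 dimers by their probability under the poly(A) site model.
ranked = sorted(
    (a + b for a in ALPHABET for b in ALPHABET),
    key=dimer_probability,
    reverse=True,
)
for dimer in ranked[:3]:
    print(dimer, round(dimer_probability(dimer), 4))
```

            <p>Under the assumed A, T, C, G order, the three top-ranked dimers are TA, CA and GA, which would agree with the YA label used for this state in the model; a different column order would change this ranking.</p>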
         </sec>
         <sec>
            <st>
               <p>The background parameters</p>
            </st>
            <p>Apart from the signal regions, little is known about the background states, so the parameters for the background states were relatively random. Given this, we first analyzed several basic factors of the background and modified the parameters accordingly. The nucleotide output probabilities of the background states were calculated by counting the nucleotide distribution over the whole sequence. We tested the distribution of nucleotides in the region from -160 to +100 nt.</p>
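            <p>A minimal sketch of estimating output probabilities by counting is given below. It assumes that counts are simply pooled over all input sequences and that characters outside A, T, C, G are skipped; the function name and the toy input are hypothetical and are not data from this study.</p>

```python
from collections import Counter


def background_emission_probabilities(sequences, alphabet="ATCG"):
    """Estimate nucleotide output probabilities from pooled sequence counts."""
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no countable nucleotides in input")
    return {base: counts[base] / total for base in alphabet}


# Hypothetical toy input, not data from the paper.
toy_sequences = ["AATTTGCA", "TTTATACG"]
print(background_emission_probabilities(toy_sequences))
```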
            <p>The most important factor in a background state is the length between two signal states. Taking the background states near the poly(A) site as an example, Bg4 and Bg5 are located upstream and downstream of the poly(A) site, and both the CE-L and CE-R signal states can lie about 10 nt from the poly(A) site. Therefore, the length of these background states can be set to a range of 0 to 10 nt, and the maximum length <it>D </it>was set to 10. Because the length of the background could change slightly, we gave all lengths a uniform distribution, calculated as <it>P</it><sub><it>i</it></sub>(<it>d</it>) = 1/(1+<it>D</it><sub><it>i</it></sub>), where <it>P</it><sub><it>i</it></sub>(<it>d</it>) is the probability that the <it>i</it><sup>th </sup>background has length <it>d</it>, and <it>D</it><sub><it>i </it></sub>is the maximum possible length of the <it>i</it><sup>th </sup>background. All background lengths were set by this method. The initial maximum possible lengths of Bg1, Bg2, Bg3, Bg4, Bg5 and Bg6 were set to 100, 100, 20, 10, 10 and 15, respectively.</p>
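            <p>The uniform length distribution can be written out explicitly. The sketch below assumes that lengths run over the integers 0 to <it>D</it>, as the denominator 1 + <it>D</it> implies; the maximum lengths are those quoted in the text, while the dictionary and function names are hypothetical.</p>

```python
from fractions import Fraction

# Initial maximum lengths D_i quoted in the text for the six background states.
MAX_LENGTH = {"Bg1": 100, "Bg2": 100, "Bg3": 20, "Bg4": 10, "Bg5": 10, "Bg6": 15}


def length_distribution(max_length):
    """Uniform P(d) = 1 / (1 + D) over the lengths d = 0, 1, ..., D."""
    p = Fraction(1, 1 + max_length)
    return {d: p for d in range(max_length + 1)}


for state, d_max in MAX_LENGTH.items():
    dist = length_distribution(d_max)
    # Every distribution has D + 1 equally likely lengths and sums to one.
    assert sum(dist.values()) == 1
    print(state, "P(d) =", dist[0])
```

            <p>Exact fractions are used here only to make the normalization check exact; a floating-point implementation would behave the same way up to rounding.</p>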
            <p>To identify a background region, we needed to consider the neighboring signal regions on both sides. For example, Bg3 lies between the NUE and CE-L signal states; the range of the NUE state is 10~30 nt and that of the CE-L signal region is 1~10 nt. Therefore, the range of Bg3 could be set in the center of these two regions, which is 6~20 nt. The distributions of the background region and the nucleotide output probability <it>B </it>are listed in [Additional file <supplr sid="S5">5</supplr>].</p>
            <suppl id="S5">
               <title>
                  <p>Additional File 5</p>
               </title>
               <text>
                  <p>Statistical features of the background. Numeric data for setting background parameter.</p>
               </text>
               <file name="1471-2105-8-43-S5.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Formula for the output of scores</p>
            </st>
            <p>We applied a sliding 180 nt-wide window to calculate the output of scores for the sequences. For every nucleotide, our program computed a score in all windows that contained this nucleotide. The window slid along the entire sequence, combining values of forward-backward variables using the following equation for the output of the score at nucleotide <it>t</it>:</p>
            <p>
               <m:math name="1471-2105-8-43-i7" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mtable columnalign="left">
                        <m:mtr>
                           <m:mtd>
                              <m:mi>S</m:mi>
                              <m:mi>c</m:mi>
                              <m:mi>o</m:mi>
                              <m:mi>r</m:mi>
                              <m:mi>e</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>t</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>=</m:mo>
                              <m:munder>
                                 <m:mrow>
                                    <m:mi>max</m:mi>
                                    <m:mo>&#8289;</m:mo>
                                 </m:mrow>
                                 <m:mi>w</m:mi>
                              </m:munder>
                              <m:mrow>
                                 <m:mo>{</m:mo>
                                 <m:mrow>
                                    <m:mi>S</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>t</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                                 <m:mo>}</m:mo>
                              </m:mrow>
                           </m:mtd>
                        </m:mtr>
                        <m:mtr>
                           <m:mtd>
                              <m:mi>S</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>t</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>=</m:mo>
                              <m:mrow>
                                 <m:mo>{</m:mo>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mrow>
                                          <m:mi>log</m:mi>
                                          <m:mo>&#8289;</m:mo>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mn>10</m:mn>
                                       </m:mrow>
                                    </m:msub>
                                    <m:mi>P</m:mi>
                                    <m:msub>
                                       <m:mi>S</m:mi>
                                       <m:mrow>
                                          <m:mi>w</m:mi>
                                          <m:mo>,</m:mo>
                                          <m:mi>t</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                    <m:mo>+</m:mo>
                                    <m:mn>120</m:mn>
                                 </m:mrow>
                                 <m:mo>}</m:mo>
                              </m:mrow>
                              <m:mo>/</m:mo>
                              <m:mn>2</m:mn>
                           </m:mtd>
                        </m:mtr>
                     </m:mtable>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakqaabeqaaiabdofatjabdogaJjabd+gaVjabdkhaYjabdwgaLjabcIcaOiabdsha0jabcMcaPiabg2da9maaxababaGagiyBa0MaeiyyaeMaeiiEaGhaleaacqWG3bWDaeqaaOWaaiWabeaacqWGtbWucqGGOaakcqWG0baDcqGGPaqkaiaawUhacaGL9baaaeaacqWGtbWucqGGOaakcqWG0baDcqGGPaqkcqGH9aqpdaGadeqaaiGbcYgaSjabc+gaVjabcEgaNnaaBaaaleaacqaIXaqmcqaIWaamaeqaaOGaemiuaaLaem4uam1aaSbaaSqaaiabdEha3jabcYcaSiabdsha0bqabaGccqGHRaWkcqaIXaqmcqaIYaGmcqaIWaamaiaawUhacaGL9baacqGGVaWlcqaIYaGmaaaa@5D81@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where <it>w </it>ranges over all of the windows that include nucleotide <it>t</it>; <it>PS</it><sub><it>w</it>,<it>t </it></sub>is the probability, given by the forward-backward algorithm, that nucleotide <it>t </it>is a poly(A) site in window <it>w</it>; the two constants, 120 and 2, are used to bring the scores into a manageable range.</p>
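            <p>The scoring rule can be illustrated as follows. This sketch covers only the final step of the equation: it assumes that the per-window probabilities <it>PS</it> have already been produced by a forward-backward pass (not shown), that windows advance one nucleotide at a time, and that positions are 0-based. Function names and the example probabilities are hypothetical.</p>

```python
import math

WINDOW = 180  # window width in nt, from the text
OFFSET = 120  # additive constant from the equation
SCALE = 2     # divisor from the equation


def windows_containing(t, seq_len, width=WINDOW):
    """Start indices (0-based) of all full-width windows that cover position t."""
    first = max(0, t - width + 1)
    last = min(t, seq_len - width)
    return range(first, last + 1)


def site_score(window_posteriors):
    """Score(t) = max over windows of (log10 PS_{w,t} + 120) / 2."""
    return max((math.log10(p) + OFFSET) / SCALE for p in window_posteriors)


# Hypothetical probabilities for one nucleotide in three overlapping windows.
print(site_score([1e-40, 1e-12, 1e-25]))
# Number of windows covering position 100 in a 200 nt sequence.
print(len(windows_containing(100, 200)))
```

            <p>With these constants, a probability of 10<sup>-120</sup> maps to a score of 0 and a probability of 1 maps to 60. A probability of exactly zero has no logarithm and would need separate handling.</p>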
         </sec>
         <sec>
            <st>
               <p>Calculation of sensitivity and specificity</p>
            </st>
            <p>The formulas for <it>Sp </it>and <it>Sn </it>calculations are given in the Results. The methods for compiling false positive and false negative numbers are described here. We employed a user-defined value called <it>threshold </it>in these calculations. At a given threshold value (<it>t</it>), the score at a nucleotide must be at least <it>t </it>for that nucleotide to be a predicted poly(A) site. The false positive sites (<it>FP</it>) were calculated as follows: for a sequence of interest, let <it>n </it>represent the total number of nucleotides; let <it>p </it>represent the number of true poly(A) sites with a score equal to or larger than a given <it>t</it>; let <it>m </it>represent the number of all sites with a score equal to or larger than <it>t</it>. Then, <it>FP </it>= <it>m-p</it>. As one can see, using the 8K dataset sequences to calculate <it>FP </it>requires that all poly(A) sites be identified. Because the identification of true poly(A) sites in a given 3'UTR is incomplete in plants (many alternative poly(A) sites may not be represented in the EST collection, or the dataset is not sufficiently large), sequences that are known not to possess poly(A) sites were used to tally <it>FP</it>. These sequences include protein-coding sequences, 5'-UTRs, and introns, as indicated in "The Datasets" under Methods. Random sequences generated by preserving the trinucleotide distribution were also used. In these control datasets, <it>FP </it>= <it>m</it>. For true positive sites (<it>TP</it>), at a given <it>t</it>, in the sequences with known poly(A) sites, <it>TP </it>= <it>p</it>. In the control sequences, <it>TP </it>= <it>n-m</it>. False negative (<it>FN</it>) is the number of actual sites that cannot be identified or predicted correctly. To calculate <it>FN</it>, let <it>f </it>represent the number of true poly(A) sites with a score smaller than a given <it>t</it>. Hence, at a given <it>t</it>, in the sequences with known poly(A) sites, <it>FN </it>= <it>f</it>.</p>
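            <p>The counting rules in this section can be sketched as two small functions, one for sequences with annotated poly(A) sites and one for control sequences. Function names, the 0-based positions and the toy scores are hypothetical. The <it>Sn </it>and <it>Sp </it>formulas themselves are given in the Results and are not reproduced here.</p>

```python
def tally_known_sites(scores, true_sites, threshold):
    """Counts for a sequence with annotated poly(A) sites.

    scores     -- per-nucleotide scores, indexed by position
    true_sites -- set of positions annotated as poly(A) sites
    """
    m = sum(1 for s in scores if s >= threshold)              # all predicted sites
    p = sum(1 for i in true_sites if scores[i] >= threshold)  # true sites predicted
    f = len(true_sites) - p                                   # true sites missed
    return {"TP": p, "FP": m - p, "FN": f}


def tally_control(scores, threshold):
    """Counts for a control sequence assumed to contain no poly(A) site."""
    n = len(scores)
    m = sum(1 for s in scores if s >= threshold)
    # The text counts correctly rejected control positions as TP = n - m.
    return {"TP": n - m, "FP": m}


# Hypothetical scores, not results from the paper.
toy_scores = [1.0, 6.5, 2.0, 7.2, 3.1, 5.9]
print(tally_known_sites(toy_scores, {3, 5}, threshold=6.0))
print(tally_control(toy_scores, threshold=6.0))
```

            <p>As the text notes, <it>FP </it>= <it>m-p </it>is only reliable when every true site in a sequence is annotated, which is why <it>FP </it>was tallied on control sequences instead.</p>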
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>CaMV, cauliflower mosaic virus</p>
         <p>CE, cleavage element</p>
         <p>CS, cleavage site</p>
         <p><it>FN</it>, false negative</p>
         <p><it>FP</it>, false positive</p>
         <p>FUE, far upstream element</p>
         <p>GHMM, Generalized Hidden Markov Model</p>
         <p>HMM, Hidden Markov Model</p>
         <p>mRNA, messenger RNA</p>
         <p>nt, nucleotide(s)</p>
         <p>NUE, near upstream element</p>
         <p><it>PASS</it>, Poly(A) Site Sleuth program</p>
         <p>poly(A), polyadenine</p>
         <p>pre-mRNA, precursor mRNA</p>
         <p><it>rbcS</it>, rubisco small subunit gene</p>
         <p><it>Sn</it>, sensitivity</p>
         <p><it>Sp</it>, specificity</p>
         <p>TAIR, The Arabidopsis Information Resource <abbrgrp><abbr bid="B18">18</abbr></abbrgrp></p>
         <p><it>TP</it>, true positive</p>
         <p>UTR, untranslated region(s)</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>GJ and QQL were responsible for the strategy, coordination, and implementation of the project and for manuscript preparation. JZ, RJ and YL implemented the algorithm, developed the code, and performed initial testing. XW modified the final code to make it more efficient and helped revise the manuscript. YS and KMD tested the program with different genes and generated the output figures. JCL generated the signal length data. GJR implemented the algorithm in C++, which was used for testing, and participated in manuscript writing.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors wish to thank Brian Haas for the 8K dataset, Chun Liang for critically reading the manuscript, and Henry Wan for helpful discussions. We also thank two anonymous reviewers whose comments improved this paper. This project was supported in part by grants from the Ohio Plant Biotechnology Consortium and from the US National Science Foundation (MCB-0313472), both to QQL.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>New perspectives on connecting messenger RNA 3' end formation to transcription</p>
            </title>
            <aug>
               <au>
                  <snm>Proudfoot</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Curr Opin Cell Biol</source>
            <pubdate>2004</pubdate>
            <volume>16</volume>
            <issue>3</issue>
            <fpage>272</fpage>
            <lpage>278</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ceb.2004.03.007</pubid>
                  <pubid idtype="pmpid" link="fulltext">15145351</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The polyadenylation of RNA in plants</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>QQ</fnm>
               </au>
               <au>
                  <snm>Hunt</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>1997</pubdate>
            <volume>115</volume>
            <fpage>321</fpage>
            <lpage>325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">158489</pubid>
                  <pubid idtype="pmpid" link="fulltext">12223809</pubid>
                  <pubid idtype="doi">10.1104/pp.115.2.321</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Formation of mRNA 3' ends in eukaryotes: mechanism, regulation, and interrelationships with other steps in mRNA synthesis</p>
            </title>
            <aug>
               <au>
                  <snm>Zhao</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hyman</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Microbiol Mol Biol Rev</source>
            <pubdate>1999</pubdate>
            <volume>63</volume>
            <issue>2</issue>
            <fpage>405</fpage>
            <lpage>445</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">98971</pubid>
                  <pubid idtype="pmpid" link="fulltext">10357856</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Bioinformatic identification of candidate cis-regulatory elements involved in human mRNA polyadenylation</p>
            </title>
            <aug>
               <au>
                  <snm>Hu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lutz</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Wilusz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tian</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2005</pubdate>
            <volume>11</volume>
            <issue>10</issue>
            <fpage>1485</fpage>
            <lpage>1493</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1370832</pubid>
                  <pubid idtype="pmpid" link="fulltext">16131587</pubid>
                  <pubid idtype="doi">10.1261/rna.2107305</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Probabilistic prediction of Saccharomyces cerevisiae mRNA 3'-processing sites</p>
            </title>
            <aug>
               <au>
                  <snm>Graber</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>McAllister</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>TF</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <issue>8</issue>
            <fpage>1851</fpage>
            <lpage>1858</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">113205</pubid>
                  <pubid idtype="pmpid" link="fulltext">11937640</pubid>
                  <pubid idtype="doi">10.1093/nar/30.8.1851</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Compilation of mRNA Polyadenylation Signals in Arabidopsis Revealed a New Signal Element and Potential Secondary Structures</p>
            </title>
            <aug>
               <au>
                  <snm>Loke</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Stahlberg</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>Strenski</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Haas</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Wood</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>QQ</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2005</pubdate>
            <volume>138</volume>
            <fpage>1457</fpage>
            <lpage>1468</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1176417</pubid>
                  <pubid idtype="pmpid" link="fulltext">15965016</pubid>
                  <pubid idtype="doi">10.1104/pp.105.060541</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>A near upstream element in a plant polyadenylation signal consists of more than six bases</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>QQ</fnm>
               </au>
               <au>
                  <snm>Hunt</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>28</volume>
            <fpage>927</fpage>
            <lpage>934</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00042076</pubid>
                  <pubid idtype="pmpid">7640363</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Features of Arabidopsis genes and genome discovered using full-length cDNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Alexandrov</snm>
                  <fnm>NN</fnm>
               </au>
               <au>
                  <snm>Troukhan</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Brover</snm>
                  <fnm>VV</fnm>
               </au>
               <au>
                  <snm>Tatarinova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Flavell</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Feldmann</snm>
                  <fnm>KA</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>2006</pubdate>
            <volume>60</volume>
            <issue>1</issue>
            <fpage>69</fpage>
            <lpage>85</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s11103-005-2564-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">16463100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Sequence analysis of mRNA polyadenylation signals of rice genes</p>
            </title>
            <aug>
               <au>
                  <snm>Lu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Gao</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Chinese Science Bulletin</source>
            <pubdate>2006</pubdate>
            <volume>51</volume>
            <issue>9</issue>
            <fpage>1069</fpage>
            <lpage>1077</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/s11434-006-1069-5</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition</p>
            </title>
            <aug>
               <au>
                  <snm>Rabiner</snm>
                  <fnm>LR</fnm>
               </au>
            </aug>
            <source>Proceedings IEEE</source>
            <pubdate>1989</pubdate>
            <volume>77</volume>
            <issue>2</issue>
            <fpage>257</fpage>
            <lpage>286</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/5.18626</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>GeneMark.hmm: new solutions for gene finding</p>
            </title>
            <aug>
               <au>
                  <snm>Lukashin</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1998</pubdate>
            <volume>26</volume>
            <fpage>1107</fpage>
            <lpage>1115</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147337</pubid>
                  <pubid idtype="pmpid" link="fulltext">9461475</pubid>
                  <pubid idtype="doi">10.1093/nar/26.4.1107</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>State duration modelling in hidden Markov models</p>
            </title>
            <aug>
               <au>
                  <snm>Vaseghi</snm>
                  <fnm>SV</fnm>
               </au>
            </aug>
            <source>Signal Processing</source>
            <pubdate>1995</pubdate>
            <volume>41</volume>
            <fpage>31</fpage>
            <lpage>41</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0165-1684(94)00088-H</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Extreme heterogeneity of polyadenylation sites in mRNAs encoding chloroplast RNA-binding proteins in Nicotiana plumbaginifolia</p>
            </title>
            <aug>
               <au>
                  <snm>Klahre</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Hemmings-Mieszczak</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Filipowicz</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>28</volume>
            <issue>3</issue>
            <fpage>569</fpage>
            <lpage>574</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00020402</pubid>
                  <pubid idtype="pmpid">7632924</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>The contribution of AAUAAA and the upstream element UUUGUA to the efficiency of mRNA 3'-end formation in plants</p>
            </title>
            <aug>
               <au>
                  <snm>Rothnie</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Reid</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hohn</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1994</pubdate>
            <volume>13</volume>
            <issue>9</issue>
            <fpage>2200</fpage>
            <lpage>2210</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">395075</pubid>
                  <pubid idtype="pmpid">8187773</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Design and construction of a versatile system for the expression of foreign genes in plants</p>
            </title>
            <aug>
               <au>
                  <snm>Schardl</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Byrd</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Benzion</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Altschuler</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Hildebrand</snm>
                  <fnm>DF</fnm>
               </au>
               <au>
                  <snm>Hunt</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>1987</pubdate>
            <volume>61</volume>
            <fpage>1</fpage>
            <lpage>11</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0378-1119(87)90359-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">3443303</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>A gateway cloning vector set for high-throughput functional analysis of genes in planta</p>
            </title>
            <aug>
               <au>
                  <snm>Curtis</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Grossniklaus</snm>
                  <fnm>U</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2003</pubdate>
            <volume>133</volume>
            <fpage>462</fpage>
            <lpage>469</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">523872</pubid>
                  <pubid idtype="pmpid" link="fulltext">14555774</pubid>
                  <pubid idtype="doi">10.1104/pp.103.027979</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Several distinct types of sequence elements are required for efficient mRNA 3' end formation in a pea rbcS gene</p>
            </title>
            <aug>
               <au>
                  <snm>Mogen</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>MacDonald</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Leggewie</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hunt</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>1992</pubdate>
            <volume>12</volume>
            <issue>12</issue>
            <fpage>5406</fpage>
            <lpage>5414</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">360478</pubid>
                  <pubid idtype="pmpid">1448074</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>The Arabidopsis Information Resource [www.arabidopsis.org]</p>
            </title>
            <aug>
               <au>
                  <cnm>TAIR</cnm>
               </au>
            </aug>
         </bibl>
         <bibl id="B19">
            <title>
               <p>GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions</p>
            </title>
            <aug>
               <au>
                  <snm>Besemer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lomsadze</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>12</issue>
            <fpage>2607</fpage>
            <lpage>2618</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55746</pubid>
                  <pubid idtype="pmpid" link="fulltext">11410670</pubid>
                  <pubid idtype="doi">10.1093/nar/29.12.2607</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Prediction of complete gene structures in human genomic DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Burge</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>268</volume>
            <issue>1</issue>
            <fpage>78</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.0951</pubid>
                  <pubid idtype="pmpid" link="fulltext">9149143</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Two methods for improving performance of an HMM and their application for gene finding</p>
            </title>
            <aug>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Int Conf Intell Syst Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>5</volume>
            <fpage>179</fpage>
            <lpage>186</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9322033</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>An in-silico method for prediction of polyadenylation signals in human sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Inform Ser Workshop Genome Inform</source>
            <pubdate>2003</pubdate>
            <volume>14</volume>
            <fpage>84</fpage>
            <lpage>93</lpage>
         </bibl>
         <bibl id="B23">
            <title>
               <p>A large-scale analysis of mRNA polyadenylation of human and mouse genes</p>
            </title>
            <aug>
               <au>
                  <snm>Tian</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lutz</snm>
                  <fnm>CS</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>1</issue>
            <fpage>201</fpage>
            <lpage>212</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">546146</pubid>
                  <pubid idtype="pmpid" link="fulltext">15647503</pubid>
                  <pubid idtype="doi">10.1093/nar/gki158</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Prediction of mRNA polyadenylation sites by support vector machine</p>
            </title>
            <aug>
               <au>
                  <snm>Cheng</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Miura</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Tian</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>19</issue>
            <fpage>2320</fpage>
            <lpage>2325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl394</pubid>
                  <pubid idtype="pmpid" link="fulltext">16870936</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Analysis of the transcriptional complexity of Arabidopsis thaliana by massively parallel signature sequencing</p>
            </title>
            <aug>
               <au>
                  <snm>Meyers</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Vu</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Tej</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Ghazal</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Matvienko</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Agrawal</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Ning</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Haudenschild</snm>
                  <fnm>CD</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2004</pubdate>
            <volume>22</volume>
            <issue>8</issue>
            <fpage>1006</fpage>
            <lpage>1011</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt992</pubid>
                  <pubid idtype="pmpid" link="fulltext">15247925</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Web site to download PASS</p>
            </title>
            <aug>
               <au>
                  <cnm>PASS</cnm>
               </au>
            </aug>
            <url>http://www.polyA.org</url>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Plant mRNA 3'-end formation</p>
            </title>
            <aug>
               <au>
                  <snm>Rothnie</snm>
                  <fnm>HM</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>1996</pubdate>
            <volume>32</volume>
            <issue>1-2</issue>
            <fpage>43</fpage>
            <lpage>61</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00039376</pubid>
                  <pubid idtype="pmpid">8980473</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Upstream sequences other than AAUAAA are required for efficient messenger RNA 3'-end formation in plants</p>
            </title>
            <aug>
               <au>
                  <snm>Mogen</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>MacDonald</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Graybosch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hunt</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Plant Cell</source>
            <pubdate>1990</pubdate>
            <volume>2</volume>
            <issue>12</issue>
            <fpage>1261</fpage>
            <lpage>1272</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">159971</pubid>
                  <pubid idtype="pmpid" link="fulltext">1983794</pubid>
                  <pubid idtype="doi">10.1105/tpc.2.12.1261</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
