<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-381</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>MetWAMer: eukaryotic translation initiation site prediction</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Sparks</snm>
               <mi>E</mi>
               <fnm>Michael</fnm>
               <insr iid="I1"/>
               <email>mespar1@iastate.edu</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Brendel</snm>
               <fnm>Volker</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>vbrendel@iastate.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Statistics, Iowa State University, Ames, IA 50011, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>381</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/381</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18801175</pubid>
               <pubid idtype="doi">10.1186/1471-2105-9-381</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>06</day>
               <month>3</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>18</day>
               <month>9</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>18</day>
               <month>9</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Sparks and Brendel; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Translation initiation site (TIS) identification is an important aspect of the gene annotation process, requisite for the accurate delineation of protein sequences from transcript data. We have developed the MetWAMer package for TIS prediction in eukaryotic open reading frames of non-viral origin. MetWAMer can be used as a stand-alone, third-party tool for post-processing gene structure annotations generated by external computational programs and/or pipelines, or directly integrated into gene structure prediction software implementations.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>MetWAMer currently implements five distinct methods for TIS prediction, the most accurate of which is a routine that combines weighted, signal-based translation initiation site scores and the contrast in coding potential of sequences flanking TISs using a perceptron. Also, our program implements clustering capabilities through use of the <it>k</it>-medoids algorithm, thereby enabling cluster-specific TIS parameter utilization. In practice, our static weight array matrix-based indexing method for parameter set lookup can be used with good results in data sets exhibiting moderate levels of 5'-complete coverage.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We demonstrate that improvements in statistically-based models for TIS prediction can be achieved by taking the class of each potential start-methionine into account under certain testing conditions, and that our perceptron-based model is suitable for the TIS identification task. MetWAMer represents a well-documented, extensible, and freely available software system that can be readily re-trained for differing target applications and/or extended with existing and novel TIS prediction methods, to support further research efforts in this area.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Translation initiation in eukaryotic mRNA molecules typically follows the basic mechanism postulated by the scanning hypothesis <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, according to which the 40S ribosomal subunit binds to the 5'-cap of an mRNA, scans in the 5' &#8594; 3' direction until the first AUG is encountered, stalls to recruit the 60S subunit, and forms the 80S ribosomal particle, which then proceeds unencumbered with translation to render a protein product (reviewed in <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>). Roughly 10% of eukaryotic transcripts are subject to so-called leaky scanning <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, in which the ribosome continues scanning beyond the first AUG codon until it encounters one in a more favorable context <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Alternative methods to initiate translation from certain RNAs of viral origin exist, including, one, the formation of kissing stem-loops to facilitate translation initiation from a 5'-proximal methionine codon <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and, two, usage of internal ribosomal entry sites <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Efficient translation initiation from non-methionine codons is also possible in eukaryotes <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. In the present work, we are concerned only with modeling 5'-cap-dependent translation initiation occurring at AUG codons in eukaryotic protein coding genes of non-viral origin.</p>
         <p>A variety of approaches to <it>in silico </it>translation initiation site (TIS) detection in nucleotide sequences have been previously considered, including perceptrons <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, single, multilayer artificial neural networks (ANNs) <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, multiple, multilayer ANNs <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, linear discriminant analysis <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, mixture Gaussian models <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, unsupervised clustering algorithms <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, support vector machines <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>, expectation maximization <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, and hidden Markov models <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Unfortunately, none of these methods are conveniently available in the form of open source, distributed software. In part, our motivation for this work is to provide a software framework for the implementation and testing of a variety of different algorithmic approaches to TIS identification. Software systems such as ESTScan <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp> and Diogenes <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, originally developed for detecting significant open reading frames in (potentially errant) cDNA sequences, have also been used to identify TISs, although empirical results suggest that these methods are inappropriate for the task <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. One strategy for integrating TIS detection methods into computational gene finding pipelines, as opposed to predicting TISs in mRNA sequences <it>per se</it>, is to refine results produced from a separate gene finding tool. 
For example, the TICO tool <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B24">24</abbr></abbrgrp> was developed to refine prokaryotic gene structure annotations generated by the GLIMMER program <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>. The mechanism of translation initiation in prokaryotes differs considerably from that of eukaryotes <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Here, we describe the MetWAMer system, developed primarily for post-processing spliced alignment-based eukaryotic gene annotation results provided in the gthXML format <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. A variant of the MetWAMer code is abstracted from any specific gene prediction system and allows TIS prediction in eukaryotic reading frames as generated by any procedure, thus facilitating integration into other gene prediction software and workflows.</p>
         <p>In the following we first describe MetWAMer and its incorporated TIS-finding algorithms and then discuss applications to annotating transcripts from the model plant <it>Arabidopsis thaliana</it>. MetWAMer currently implements five distinct methods for TIS detection. Among these, the best performer is the perceptron-based flank-contrasting weighted log-likelihood ratio routine (PFCWLLKR), which combines local TIS feature scores and scores probing the contrast in coding potential of sequences flanking a site. MetWAMer allows the user to develop and apply stratified parameter sets for an arbitrary number of data clusters. We demonstrate the potential for stratified parameter deployment to yield considerable increases in TIS prediction accuracy relative to a homogeneous parameter strategy. Also discussed are strategies for parameter selection in practice, depending on prior assessment of the likelihood that the transcript under consideration is or is not 5'-complete. Source code implementing this package is released under the ISC license, and is available for download from <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. It is also registered as Additional File <supplr sid="S1">1</supplr> in this report.</p>
         <suppl id="S1">
            <title>
               <p>Additional file 1</p>
            </title>
            <text>
               <p><b>MetWAMer.v1.3.</b> Source code for the MetWAMer package. This version was used to generate data reported in this study.</p>
            </text>
            <file name="1471-2105-9-381-S1.bz2">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
      <sec>
         <st>
            <p>Implementation</p>
         </st>
         <p>In the following subsection, we briefly describe the components of the MetWAMer software. Then we discuss the distinct algorithms implemented for TIS-identification and report our training and testing approaches for <it>Arabidopsis </it>data.</p>
         <sec>
            <st>
               <p>The <it>MetWAMer </it>system</p>
            </st>
            <p>The MetWAMer code, written in the C programming language, implements the executable files MetWAMer.CDS and MetWAMer.gthXML. MetWAMer.CDS is the generic application for TIS prediction in eukaryotic open reading frames, as derived via any computational procedure. MetWAMer.gthXML is a special-purpose variant of the software, specifically tailored to refine gene structure predictions generated by the GenomeThreader <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> and GeneSeqer <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> programs for spliced alignment-based gene structure annotation. GenomeThreader and GeneSeqer, like most other spliced-alignment tools, do not make explicit predictions concerning translation initiation sites, but rather are restricted to the identification of reading frames in genomic sequences for which transcript evidence or homologous sequences suggest a protein coding function. MetWAMer.gthXML extends the 5'- and 3'-most termini of these annotated reading frames such that a maximal (non-stop) open reading frame (ORF) is realized. (No distinction between MetWAMer.gthXML and the more generic MetWAMer.CDS variant exists subsequent to reading frame maximization; we therefore refer to the system as "MetWAMer" for the remainder of this article.) MetWAMer scans for methionine-encoding sites in this maximal reading frame, considering their potential as translation initiation sites under a variety of scoring schemes, described below, in an attempt to identify a TIS for the gene structure under consideration. At most one prediction is made per maximal ORF, and only if the best-scoring candidate satisfies a method-specific quality threshold.</p>
            <p>Common to all detection methods implemented in MetWAMer is utilization of a start-methionine signal-specific weight array matrix (WAM) that records position-specific base transition frequencies proximal to methionine codons in protein coding sequences. Here, WAMs characterize position-specific dinucleotide abundances; see the <b>Stratified training and testing </b>section below for a more detailed description. The train_MetWAM utility from the MetWAMer package can be used to develop such a WAM, given appropriate training data. The first in-frame methionine codon encountered, subsequent to a specified offset in the training sequence, is considered to be the true TIS for that sequence. Training of the methionine weight array matrix proceeds by tabulating dinucleotide frequencies from five positions upstream of the adenine through three positions downstream of the guanine residue of the pertinent methionine codon. Next, the system advances 105 bases in the training instance, to resume scanning for in-frame methionine codons, each of which will be classified as a false TIS; dinucleotide frequencies proximal to these false TISs are tabulated in the same manner as true TISs (see Figure <figr fid="F1">1</figr>). Following tabulation of dinucleotide frequencies at true and false TISs in training data, these are converted to relative frequencies, yielding the WAM, which enables calculation of the likelihood that a site in question is a true or false TIS.</p>
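            <p>As an illustration of the tabulation just described, the following Python sketch (not part of the MetWAMer C code) extracts the 11-nt site context and derives a first-order WAM from aligned windows. The function names, the use of conditional (transition) relative frequencies, and the pseudocount are assumptions made for this example; they are not taken from train_MetWAM.</p>
            <p><![CDATA[
```python
from math import log

UPSTREAM = 5    # bases upstream of the A of the ATG codon
DOWNSTREAM = 3  # bases downstream of the G of the ATG codon
BASES = "ACGT"


def site_window(seq, atg_pos):
    """Return the 11-nt context (positions -5..+5, A of ATG at 0), or None if truncated."""
    start = atg_pos - UPSTREAM
    end = atg_pos + 3 + DOWNSTREAM
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]


def train_wam(windows, pseudocount=1.0):
    """Tabulate position-specific dinucleotide counts and convert them to relative frequencies.

    Returns (init, trans): init[b] is the relative frequency of base b at the first
    window position; trans[i][prev][cur] is the relative frequency of `cur` at
    position i + 1 given `prev` at position i. The pseudocount is an assumption
    of this sketch, used to avoid zero frequencies.
    """
    length = len(windows[0])
    init = {b: pseudocount for b in BASES}
    trans = [{p: {c: pseudocount for c in BASES} for p in BASES}
             for _ in range(length - 1)]
    for w in windows:
        init[w[0]] += 1
        for i in range(length - 1):
            trans[i][w[i]][w[i + 1]] += 1
    total = sum(init.values())
    init = {b: n / total for b, n in init.items()}
    for table in trans:
        for row in table.values():
            row_total = sum(row.values())
            for cur in row:
                row[cur] /= row_total
    return init, trans


def log_likelihood(wam, window):
    """Log-likelihood of a window under a first-order WAM."""
    init, trans = wam
    ll = log(init[window[0]])
    for i in range(len(window) - 1):
        ll += log(trans[i][window[i]][window[i + 1]])
    return ll


def llkr(true_wam, false_wam, window):
    """Log-likelihood ratio of the true-TIS model to the false-TIS model."""
    return log_likelihood(true_wam, window) - log_likelihood(false_wam, window)
```
]]></p>
            <p>In this sketch, one WAM would be trained on windows around true TISs and a second on windows around false TISs; their log-likelihood ratio then serves as the signal score for a candidate site.</p>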
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Extraction of training data</p>
               </caption>
               <text>
                  <p><b>Extraction of training data</b>. A genomic protein coding sequence is conceptually spliced into an open reading frame, which is extended at its 5'- and 3'-termini to render a maximal (non-stop) reading frame. For LLKR, WLLKR, and BAYES, only sequences comprising the immediate context of true and false TISs (defined as five bases upstream of the ATG codon's adenine residue through three bases downstream of its guanine residue) are extracted for modeling the TIS signal. For flank-contrasting methods, both TIS contexts and flanking sequences (96 nt in length per flank) are extracted for training signal and content sensors, respectively. A minimal distance between true and false TISs of 105 nt is used.</p>
               </text>
               <graphic file="1471-2105-9-381-1"/>
            </fig>
            <sec>
               <st>
                  <p>Methionine log-likelihood ratios</p>
               </st>
               <p>The log-likelihood ratio (LLKR) approach to TIS prediction functions by scanning the ORF for in-frame ATG codons. (We use ATG to denote a methionine codon, as opposed to AUG, because MetWAMer scans for potential TISs in conceptually spliced genomic sequences.) A constraint is imposed on the protein length implied by any potential start-methionine such that if the ATG served as a true translation initiation site, the resulting protein must exceed 50 amino acid residues in length. Using the trained methionine-WAM, the method scores each such feasible site by calculating the likelihood that it is a true initiation site and taking the ratio of this value to the likelihood that it is not a true start site. The system identifies the methionine codon yielding the optimal value among such likelihood ratios, and provided the log of this ratio is non-negative, the LLKR routine returns it as the predicted start-methionine. The non-negativity constraint implements a classification threshold, imposed because we require the likelihood of the potential start site to favor its actually being a true TIS. If the system fails to identify any in-frame ATG codons, or the best-scoring site's score is negative-valued, then LLKR returns no prediction for the maximal ORF being surveyed.</p>
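               <p>The scanning logic of LLKR can be sketched as follows. This is a minimal Python illustration, not the MetWAMer implementation: the function name, the hypothetical score_site callable (standing in for the WAM-based log-likelihood ratio), and the way protein length is counted (codons from the candidate ATG to the end of the non-stop ORF) are assumptions.</p>
               <p><![CDATA[
```python
def predict_tis_llkr(orf, score_site, min_protein_len=50):
    """Return (position, score) of the best feasible in-frame ATG, or None.

    orf: maximal (non-stop) reading frame as a DNA string, read in frame 0.
    score_site: hypothetical callable (orf, pos) -> log-likelihood ratio.
    """
    best = None
    for pos in range(0, len(orf) - 2, 3):
        if orf[pos:pos + 3] != "ATG":
            continue
        # Assumed length convention: codons from this ATG to the ORF's end.
        protein_len = (len(orf) - pos) // 3
        if protein_len <= min_protein_len:
            continue  # the implied protein must exceed 50 residues
        score = score_site(orf, pos)
        if best is None or score > best[1]:
            best = (pos, score)
    # Classification threshold: refrain from predicting on a negative score.
    if best is None or best[1] < 0:
        return None
    return best
```
]]></p>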
            </sec>
            <sec>
               <st>
                  <p>Weighted methionine log-likelihood ratios</p>
               </st>
               <p>The weighted log-likelihood ratio approach (WLLKR) is identical to LLKR, but each in-frame ATG's log-likelihood ratio score is scaled as a function of the induced protein product's coverage of the maximal ORF. Precisely, coverage <it>x </it>is defined as the ratio of the length of the implied amino acid chain starting from the TIS under consideration over the length of the maximal ORF. For a true TIS, we expect the coverage value to be close to unity, as it would be unusual for a long, uninterrupted reading frame to be evolutionarily maintained in a genome, yet not be encoding an expressed, functional protein product. Empirically, we settled on weights calculated as <it>w</it>(<it>x</it>) = <it>x</it><sup>3 </sup>(other convex functions give commensurate results). The WLLKR routine optimizes over weighted log-likelihood ratios for all in-frame ATG codons, returning a predicted start-methionine if and only if the optimal such value is non-negative.</p>
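               <p>A small numerical sketch of the weighting may help. It assumes that "scaled" means the log-likelihood ratio is multiplied by <it>w</it>(<it>x</it>); the function names and example values are hypothetical.</p>
               <p><![CDATA[
```python
def coverage(orf_len_nt, tis_pos_nt):
    """Fraction of the maximal ORF covered by the product starting at tis_pos_nt."""
    return (orf_len_nt - tis_pos_nt) / orf_len_nt


def weighted_llkr(llkr_score, x):
    """Scale a log-likelihood ratio by w(x) = x**3 (multiplication is assumed)."""
    return llkr_score * x ** 3


# Two candidate ATGs in a 300-nt maximal ORF (illustrative values only).
upstream = weighted_llkr(1.0, coverage(300, 0))      # full coverage, weight 1
downstream = weighted_llkr(3.0, coverage(300, 150))  # half coverage, weight 0.125
```
]]></p>
               <p>Under these assumptions the 5'-most candidate is preferred even though its unweighted score is three times smaller, which reflects the expectation that a true TIS covers most of the maximal ORF.</p>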
            </sec>
            <sec>
               <st>
                  <p>Multiplicative-based flank-contrasting with weighted methionine log-likelihood ratios</p>
               </st>
               <p>MetWAMer also implements an approach to start-methionine prediction that considers two descriptive features of potential TISs: weighted methionine log-likelihood ratio scores as used by the WLLKR routine (signal sensing) and the ratio of coding potential in a swath of sequence downstream from the site to that of a swath upstream of it, evaluated under a coding hypothesis (content sensing). Intuitively, we expect that the coding potential of the sequence downstream from a true site &#8211; which is, by definition, coding &#8211; would exceed that upstream of it &#8211; which is, by definition, non-coding &#8211; and that the ratio of the former to the latter should be greater in true sites as opposed to false. Coding probabilities of sequence swaths (96 nucleotides in length) are computed using a fifth-order <it>&#967;</it><sup>2</sup>-interpolated Markov chain model <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp> as implemented in the IMMpractical library <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. The idea of integrating both content- and signal-based features into TIS prediction has been explored before <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B33">33</abbr></abbrgrp>, although the methodologies used here are distinct from previous studies.</p>
               <p>For the multiplicative-based flank-contrasting with weighted methionine log-likelihood ratios (MFCWLLKR) method, the signal- and content-based scores, expressed in log space, are added. The system optimizes over these scores at viable, in-frame start-methionine sites, and if the best-scoring site's score is non-negative, it is returned by the routine as its TIS prediction.</p>
            </sec>
            <sec>
               <st>
                  <p>Perceptron-based flank-contrasting with weighted methionine log-likelihood ratios</p>
               </st>
               <p>The perceptron-based flank-contrasting with weighted methionine log-likelihood ratios (PFCWLLKR) routine considers the same descriptive features as MFCWLLKR, but uses a perceptron as a multivariate utility function, as opposed to the multiplication operator. Perceptrons implement linear discriminants, and as such require linearly (or near-linearly) separable data sets to provide good classification performance (see, e.g., &#167;4.1.7 of <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>). Intuitively, we expect that the two dimensions corresponding to the signal- and content-based features exhibit linear (or near-linear) separability: both weighted log-likelihood ratios of methionine sites and log-likelihood ratios of the coding potentials of downstream to upstream content swaths should be greater-valued in true start methionines as opposed to false, non-start ones. Linear and sigmoid units are used to implement perceptrons in the MetWAMer system; each of these neural elements can learn a continuous-valued function that can be thresholded to enable discrete, binary classification; excellent discussions of these methods can be found in &#167;4.4.3 of <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> and &#167;20.5 of <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Thus, linear and sigmoid units can be used to optimize over viable candidate start-methionine codons.</p>
               <p>PFCWLLKR returns the best such potential TIS if and only if it is classified as being a true site by the perceptron. Although Stormo <it>et al</it>. used a perceptron to classify translation initiation sites in bacteria in a pioneering study <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, they considered an entirely distinct feature set.</p>
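               <p>To make the role of the perceptron concrete, the following Python sketch trains a single sigmoid unit on the two features (signal score, content score) and applies the "best candidate, if classified as true" rule. The training rule (stochastic gradient steps on the log-loss), the learning rate, the 0.5 threshold, and all names are assumptions of this example rather than details of the MetWAMer code.</p>
               <p><![CDATA[
```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_sigmoid_unit(samples, labels, rate=0.1, epochs=200):
    """Fit weights (w1, w2, bias) for features (signal, content); labels are 0 or 1."""
    w1 = w2 = bias = 0.0
    for _ in range(epochs):
        for (signal, content), label in zip(samples, labels):
            out = sigmoid(w1 * signal + w2 * content + bias)
            err = label - out  # gradient of the log-loss w.r.t. the pre-activation
            w1 += rate * err * signal
            w2 += rate * err * content
            bias += rate * err
    return w1, w2, bias


def predict_tis(candidates, weights, threshold=0.5):
    """candidates: list of (pos, signal, content). Return best pos, or None."""
    w1, w2, bias = weights
    best = None
    for pos, signal, content in candidates:
        activation = sigmoid(w1 * signal + w2 * content + bias)
        if best is None or activation > best[1]:
            best = (pos, activation)
    # Report the best candidate only if the unit classifies it as a true site.
    if best is None or best[1] < threshold:
        return None
    return best[0]
```
]]></p>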
            </sec>
            <sec>
               <st>
                  <p>Bayesian TIS prediction</p>
               </st>
               <p>Lastly, we also considered a Bayesian approach (BAYES) to predicting TISs. Each viable start-methionine in the maximal reading frame is considered under two separate models, one that the ATG is a true translation start codon and the other that it is not. The maximum <it>a posteriori </it>(MAP) hypothesis among this set of possibilities is computed, and if the site it denotes is represented as being a true TIS, BAYES returns this result as its TIS prediction. Otherwise, the method refrains from making any predictions. Calculation of the MAP hypothesis is formulated as follows. A prior distribution is derived for each maximal reading frame being surveyed: each in-frame ATG, under the model of its being a true initiation site, is given a prior probability proportional to the relative length of the peptide it induces compared with that of the maximal reading frame. Similarly, under the model of not being a TIS, each such site is assigned a prior probability proportional to the complement of its prior probability of being a true one. These values are normalized so as to collectively represent a valid probability mass function over all putative start-methionine sites, under both models. The likelihood of data is modeled using log-likelihood scores computed with the methionine-WAM.</p>
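               <p>The MAP computation can be illustrated with a short Python sketch. It assumes that each candidate is summarized by its coverage and by two WAM log-likelihoods (true-site and false-site models), and it compares unnormalized log-posteriors, which suffices for finding the maximum; the tuple layout and function name are hypothetical.</p>
               <p><![CDATA[
```python
import math


def bayes_map(sites):
    """Return the position of the MAP hypothesis if it is a 'true TIS' one, else None.

    sites: list of (pos, coverage, loglik_true, loglik_false), one per viable ATG.
    Priors: coverage / n under the true model and (1 - coverage) / n under the
    false model, so that all 2n hypotheses sum to one.
    """
    n = len(sites)
    best = None  # (log_posterior, pos, is_true)
    for pos, x, ll_true, ll_false in sites:
        hypotheses = ((x / n, ll_true, True), ((1.0 - x) / n, ll_false, False))
        for prior, loglik, is_true in hypotheses:
            if prior <= 0.0:
                continue  # a zero-prior hypothesis cannot be the MAP
            log_post = math.log(prior) + loglik
            if best is None or log_post > best[0]:
                best = (log_post, pos, is_true)
    if best is None or not best[2]:
        return None
    return best[1]
```
]]></p>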
            </sec>
         </sec>
         <sec>
            <st>
               <p>Data sets</p>
            </st>
            <p>Only gene annotations marked as curated in the current <it>Arabidopsis thaliana </it>annotation made available by TAIR (version 7, <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>) were used for developing methionine-weight array matrices. In TAIR, a curated status implies that these structures have been either manually inspected or are supported by full-length cDNA evidence. Training instances were further required to encode protein products at least 100 amino acid residues long, whose initial codon was ATG. For annotations satisfying these criteria, coding sequences were extracted from genomic templates using supplied reference coordinates. Because the TAIR annotation contains deliberate indel mutations in certain coding sequences with respect to genomic templates (see, e.g., gene models At1g03530.1 <url>http://www.plantgdb.org/AtGDB-cgi/getRegion.pl?dbid=2&amp;chr=1&amp;l_pos=879997&amp;r_pos=883891</url> and At5g21105.1 <url>http://www.plantgdb.org/AtGDB-cgi/getRegion.pl?dbid=2&amp;chr=5&amp;l_pos=7172277&amp;r_pos=7178249</url>), and these modifications are not reflected in genomic reference coordinates, only parsed coding sequences having lengths divisible by three were retained for analysis. This overall process is implemented in the parse_tigr_codseqs utility from the MetWAMer package, which processes documents provided in the TIGR XML format <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
            <p>These data were then post-processed to purge transposable elements and curtail redundancy. All coding sequences with significant matches (E-value &lt; 10<sup>-15</sup>) to a sequence present in the TIGR plant repetitive element database <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, calculated using BLASTN <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, were eliminated. To limit redundancy in the remaining data, the BLASTClust utility <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> was used: sequence pairs having &#8805; 80% nucleotide identity covering &#8805; 80% of the longer sequence's length were clustered. Any sequence that clustered with one or more others was eliminated from the data set, i.e., only sequences forming singleton clusters were retained. This resulted in 19,703 TIS-containing genes being retained for analysis.</p>
            <p>A non-TIS-containing data set was also compiled, for testing the methods' abilities to not predict a TIS when none is present. The TIS-containing gene set was used as a starting point, from which we excluded single-exon genes. Of the remaining structures, the first coding exons (known to contain true TISs) were ablated from the conceptually spliced mRNAs; either 0, 1, or 2 bases were clipped from the 5'-terminus of these second exons in order to preserve the original reading frame. Next, sufficient flanking genomic sequences upstream of these exons were prepended, to facilitate the flank-contrasting methods &#8211; we remain neutral as to whether these are contributed entirely from the first ORF-disrupting intron of the gene, or if they might also include fragments from one or more upstream exons, 5'-UTR introns, or intergenic sequences. In total, 16,121 non-TIS-containing instances were retained for analysis.</p>
         </sec>
         <sec>
            <st>
               <p>Stratified training and testing</p>
            </st>
            <p>In addition to homogeneous training, which does not address the possibility of characteristic features of potentially distinct biological classes of translation initiation sites, the calc_medoids utility of MetWAMer implements a method for developing stratified training data sets, which can be used to parameterize MetWAMer for cluster-specific TIS prediction behavior. The <it>k</it>-medoids algorithm, as implemented in the C Clustering Library <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, is used to calculate medoids (the instance in each of the <it>k </it>clusters for which the sum of distances to all other elements of the cluster is minimized), using a non-redundant set of translation initiation site sequences (five bases upstream of the ATG codon through three bases downstream). The Hamming distance is used to measure pairwise dissimilarity of such instances.</p>
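            <p>For readers unfamiliar with medoids, the following Python sketch shows the Hamming distance, the medoid of a cluster, and a simple alternating assignment/update loop. It is a simplified stand-in written for illustration: it does not call the C Clustering Library, and the function names, the fixed initial medoids, and the stopping rule are assumptions.</p>
            <p><![CDATA[
```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length site sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def medoid(cluster):
    """Member of the cluster with the smallest summed distance to all members."""
    return min(cluster, key=lambda c: sum(hamming(c, other) for other in cluster))


def k_medoids(sites, initial_medoids, max_iter=100):
    """Alternate between assigning sites to the nearest medoid and updating medoids."""
    medoids = list(initial_medoids)
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in medoids]
        for s in sites:
            nearest = min(range(len(medoids)), key=lambda k: hamming(s, medoids[k]))
            clusters[nearest].append(s)
        updated = [medoid(c) if c else m for c, m in zip(clusters, medoids)]
        if updated == medoids:
            break  # assignments are stable
        medoids = updated
    return medoids, clusters
```
]]></p>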
            <p>MetWAMer implements a total of six possible methods for utilizing cluster-specific information during the prediction phase, when the true class of the sequence's TIS is unknown beforehand: three distinct measures of a site's "closeness" to those in a given cluster are defined, and each measure can be used either by selecting the best parameter set for every site encountered during scanning (modulating) or by choosing the best set on the basis of the first in-frame ATG encountered, and committing to the exclusive use of it for scoring any remaining putative TISs in the reading frame (static). Thus, these combinations comprise a collection of parameter set indexing strategies, which allow for lookup of those partition-specific parameters most appropriate for scoring a site.</p>
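            <p>The difference between the static and modulating strategies can be sketched in Python as follows. The closeness function, the per-cluster scorers, and the function name are hypothetical placeholders; any of the three closeness measures could be substituted.</p>
            <p><![CDATA[
```python
def score_candidates(candidate_sites, closeness, scorers, mode="static"):
    """Score each candidate site with a cluster-specific parameter set.

    candidate_sites: site contexts of in-frame ATGs, in 5'-to-3' order.
    closeness: hypothetical callable (site, k) -> larger means closer to cluster k.
    scorers: one hypothetical scoring callable per cluster (its parameter set).
    """
    def best_cluster(site):
        return max(range(len(scorers)), key=lambda k: closeness(site, k))

    if mode == "static":
        if not candidate_sites:
            return []
        # Commit to the parameter set chosen for the first in-frame ATG.
        k = best_cluster(candidate_sites[0])
        return [scorers[k](s) for s in candidate_sites]
    if mode == "modulating":
        # Re-select the parameter set for every site encountered.
        return [scorers[best_cluster(s)](s) for s in candidate_sites]
    raise ValueError("mode must be 'static' or 'modulating'")
```
]]></p>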
            <p>The first measure considered is the Hamming distance, which lends itself to an indexing strategy in which the distance of a putative TIS to each of the <it>k </it>medoids identified in the clustering step is computed; the cluster-specific parameters corresponding to the medoid at minimal Hamming distance from the site are then used to score it. The PWM-based indexing method utilizes cluster-specific position weight matrices for measuring the site's similarity to known clusters, and the parameter set whose representative PWM renders the putative TIS most likely is used for scoring. Specifically, a PWM characterizes position-specific mononucleotide distributions at genetic elements such as promoter sites, splice sites, or translation initiation sites <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. Here, the likelihood of a potential TIS as having been generated by the (trained) PWM is given by</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-381-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>L</m:mi>
                           <m:msub>
                              <m:mi>K</m:mi>
                              <m:mrow>
                                 <m:mi>p</m:mi>
                                 <m:mi>w</m:mi>
                                 <m:mi>m</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8719;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>5...</m:mn>
                                    <m:mo>+</m:mo>
                                    <m:mn>5</m:mn>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>B</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>,</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemitaWKaem4saS0aaSbaaSqaaiabdchaWjabdEha3jabd2gaTbqabaGccqGH9aqpdaqeqbqaaiabdAgaMnaaBaaaleaacqWGcbGqdaWgaaadbaGaemyAaKgabeaaaSqabaaabaGaemyAaKMaeyypa0JaeyOeI0IaeGynauJaeiOla4IaeiOla4IaeiOla4Iaey4kaSIaeGynaudabeqdcqGHpis1aOGaeiilaWcaaa@43C0@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>i </it>indexes each position in the site (the adenine residue of the ATG codon is assigned position 0), and <inline-formula><m:math name="1471-2105-9-381-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>f</m:mi><m:mrow><m:msub><m:mi>B</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOzay2aaSbaaSqaaiabdkeacnaaBaaameaacqWGPbqAaeqaaaWcbeaaaaa@2FF6@</m:annotation></m:semantics></m:math></inline-formula> is the relative frequency of base <it>B</it><sub><it>i </it></sub>&#8712; {<it>A</it>, <it>C</it>, <it>G</it>, <it>T</it>} in position <it>i </it>of the aligned training sequences. Finally, a WAM-based indexing method is implemented, which is analogous to the PWM-based strategy, though (first-order) weight array matrices are used for computing likelihoods, rather than PWMs. For a potential TIS site, the likelihood of its having been generated by the WAM is computed by MetWAMer as</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-381-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>L</m:mi>
                           <m:msub>
                              <m:mi>K</m:mi>
                              <m:mrow>
                                 <m:mi>w</m:mi>
                                 <m:mi>a</m:mi>
                                 <m:mi>m</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8719;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>5...</m:mn>
                                    <m:mo>+</m:mo>
                                    <m:mn>4</m:mn>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>f</m:mi>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>D</m:mi>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mo>+</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                       </m:msub>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>,</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemitaWKaem4saS0aaSbaaSqaaiabdEha3jabdggaHjabd2gaTbqabaGccqGH9aqpdaqeqbqaaiabdAgaMnaaBaaaleaacqWGebardaWgaaadbaGaemyAaKMaeiilaWIaemyAaKMaey4kaSIaeGymaedabeaaaSqabaaabaGaemyAaKMaeyypa0JaeyOeI0IaeGynauJaeiOla4IaeiOla4IaeiOla4Iaey4kaSIaeGinaqdabeqdcqGHpis1aOGaeiilaWcaaa@47B1@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>i </it>indexes each position in the site, and <inline-formula><m:math name="1471-2105-9-381-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>f</m:mi><m:mrow><m:msub><m:mi>D</m:mi><m:mrow><m:mi>i</m:mi><m:mo>,</m:mo><m:mi>i</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msub></m:mrow></m:msub><m:mo>&#8712;</m:mo><m:mo>{</m:mo><m:mi>A</m:mi><m:mi>A</m:mi><m:mo>,</m:mo><m:mi>A</m:mi><m:mi>C</m:mi><m:mo>,</m:mo><m:mi>A</m:mi><m:mi>G</m:mi><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:mi>T</m:mi><m:mi>T</m:mi><m:mo>}</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOzay2aaSbaaSqaaiabdseaenaaBaaameaacqWGPbqAcqGGSaalcqWGPbqAcqGHRaWkcqaIXaqmaeqaaaWcbeaakiabgIGiolabcUha7jabdgeabjabdgeabjabcYcaSiabdgeabjabdoeadjabcYcaSiabdgeabjabdEeahjabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiabdsfaujabdsfaujabc2ha9baa@4775@</m:annotation></m:semantics></m:math></inline-formula> is the relative frequency of the observed dinucleotide <it>D</it><sub><it>i</it>,<it>i</it>+1 </sub>occurring at position <it>i </it>in aligned training data.</p>
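            <p>The two likelihoods above can be illustrated with a minimal Python sketch. It is not MetWAMer's implementation: the function names and the toy training sites are hypothetical, sites are assumed to be pre-aligned 11-mers spanning positions -5 to +5, and joint dinucleotide frequencies are used for the WAM exactly as the formula is printed (no pseudocounts are applied, so an unseen base or dinucleotide yields a likelihood of zero).</p>

```python
from collections import Counter

BASES = "ACGT"

def train_pwm(sites):
    """Relative base frequencies per position of equal-length aligned sites."""
    n = len(sites)
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        pwm.append({b: counts[b] / n for b in BASES})
    return pwm

def train_wam(sites):
    """Relative dinucleotide frequencies per pair of adjacent positions."""
    n = len(sites)
    wam = []
    for i in range(len(sites[0]) - 1):
        counts = Counter(s[i:i + 2] for s in sites)
        wam.append({a + b: counts[a + b] / n for a in BASES for b in BASES})
    return wam

def lk_pwm(site, pwm):
    """Product over positions -5..+5 of the frequency of the observed base."""
    lk = 1.0
    for i, base in enumerate(site):
        lk *= pwm[i][base]
    return lk

def lk_wam(site, wam):
    """Product over positions -5..+4 of the frequency of the observed dinucleotide."""
    lk = 1.0
    for i in range(len(site) - 1):
        lk *= wam[i][site[i:i + 2]]
    return lk

# Hypothetical toy training sites: 5 upstream bases, ATG, 3 downstream bases.
toy_sites = ["GCCACATGGCG", "GCCGCATGGCT", "AAAACATGGAG", "GCAACATGTCG"]
toy_pwm = train_pwm(toy_sites)
toy_wam = train_wam(toy_sites)
print(lk_pwm("GCCACATGGCG", toy_pwm), lk_wam("GCCACATGGCG", toy_wam))
```

            <p>In this sketch the PWM holds 11 columns and the WAM holds 10, matching the index ranges of the two products. A practical implementation would likely work with log-likelihoods and smoothed frequencies to avoid underflow and zero products.</p>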
         </sec>
         <sec>
            <st>
               <p>Test design</p>
            </st>
            <p>A five-fold cross-validation strategy was used to assess how competently each method detects translation initiation sites. Because the TIS-containing instances consist only of known coding sequences from gene structures, and in practice MetWAMer scans for potential TISs across a maximal ORF, we extended the coding sequences at their 5'-termini to achieve a maximal, non-stop reading frame. This presents the system with the challenge of disambiguating spurious (in-frame) methionine codons in the extended reading frame from true start codons. We make no distinction as to whether these extended sequences are derived from 5'-UTRs, introns in 5'-UTRs, or intergenic sequences. Methionine-WAMs and Markov chains were trained on each cross-validation replicate and then used to train a sigmoidal perceptron, using a learning rate of 1 &#215; 10<sup>-5</sup>. Sigmoid units outperformed linear units in all experiments we conducted (data not shown), so we do not consider the latter further. As a baseline for comparison of the implemented models, we also consider the 1<sup><it>st</it></sup>-ATG method, which predicts the first in-frame ATG it encounters in the maximized reading frame as the TIS. Tests using non-TIS-containing instances were conducted similarly, though reading frames were not maximally extended at their 5'-termini. The testing procedure is shown pictorially in Figure <figr fid="F2">2</figr>.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>TIS detection competency tests</p>
               </caption>
               <text>
                  <p><b>TIS detection competency tests</b>. Shown are two distinct testing scenarios for TIS identification competency in maximal, TIS-containing reading frames and in reading frames lacking a true TIS. In TIS-containing tests, three outcomes are possible: the system predicts the true TIS as the TIS for the gene (TP), it predicts a false TIS as the gene's TIS (FP), or it fails to predict any TIS for the gene (FN). In the non-TIS-containing scenario, the system either (correctly) refuses to predict a TIS for the gene (TN) or mislabels some in-frame ATG as a TIS (FP).</p>
               </text>
               <graphic file="1471-2105-9-381-2"/>
            </fig>
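            <p>The construction of a TIS-containing test instance and the 1st-ATG baseline can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function names and the toy sequence are hypothetical, coordinates are 0-based, and the three stop codons of the standard genetic code are assumed.</p>

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)

def extend_upstream(seq, cds_start):
    """Extend the reading frame 5'-ward, one codon at a time, until an
    in-frame stop codon or the start of the sequence is reached."""
    start = cds_start
    while start - 3 >= 0 and seq[start - 3:start] not in STOP_CODONS:
        start -= 3
    return start

def first_atg(seq, frame_start, frame_end):
    """Return the position of the first in-frame ATG, or None if absent.
    frame_end is the exclusive end of the last codon to be scanned."""
    for pos in range(frame_start, frame_end - 2, 3):
        if seq[pos:pos + 3] == "ATG":
            return pos
    return None

def score_tis_containing(predicted, true_tis):
    """Outcome labels for a TIS-containing instance, as in Figure 2."""
    if predicted is None:
        return "FN"
    return "TP" if predicted == true_tis else "FP"

# Hypothetical toy instance: GG TAA CCC ATG GCA ATG AAA TGA
# True TIS at index 14; a spurious in-frame ATG lies upstream at index 8.
toy_seq = "GGTAACCCATGGCAATGAAATGA"
toy_true_tis = 14
toy_frame_start = extend_upstream(toy_seq, toy_true_tis)
toy_prediction = first_atg(toy_seq, toy_frame_start, 20)
print(toy_frame_start, toy_prediction,
      score_tis_containing(toy_prediction, toy_true_tis))
```

            <p>The toy instance mimics a transcript in which an in-frame ATG precedes the true start codon, so the 1st-ATG rule selects the upstream site and the instance is scored as a false positive.</p>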
            <p>To assess the performance of MetWAMer relative to prior art in translation initiation prediction, we compared our system with the NetStart <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, TIS Miner <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, TISHunter <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> and ATGpr <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> programs.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Computational TIS identification in TIS-containing ORFs</p>
            </st>
            <p>Table <tblr tid="T1">1</tblr> summarizes TIS prediction accuracy in TIS-containing ORFs. Given the knowledge that the transcript under consideration is 5'-complete, the simple strategy of predicting the leftmost ATG to be the TIS ("1<sup><it>st</it></sup>-ATG") gives the best performance by far. Its 94% sensitivity and specificity merely reflect the proportion of transcripts not subject to leaky scanning. All other methods incorporate uncertainty about 5'-completeness and specifically allow for the possibility of observing a non-TIS-containing transcript fragment (a prediction that, for this test set, always results in a false negative). Restricting attention to the method-specific results obtained under homogeneous parameter usage, MFCWLLKR has better sensitivity than the remaining methods, with PFCWLLKR exhibiting comparable levels. PFCWLLKR dominates the remaining models in terms of specificity. WLLKR is the third-most successful method at identifying true TISs, though it suffers from a relatively high rate of false negative predictions. The BAYES routine makes fewer true positive predictions than WLLKR, and more false positive and false negative identifications.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Method performances on TIS-containing data.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>Parametrization</p>
                     </c>
                     <c ca="left">
                        <p>Method</p>
                     </c>
                     <c ca="right">
                        <p>TP</p>
                     </c>
                     <c ca="right">
                        <p>FP</p>
                     </c>
                     <c ca="right">
                        <p>FN</p>
                     </c>
                     <c ca="right">
                        <p>Sn</p>
                     </c>
                     <c ca="right">
                        <p>Sp</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1<sup><it>st</it></sup>-ATG</p>
                     </c>
                     <c ca="right">
                        <p>18,553</p>
                     </c>
                     <c ca="right">
                        <p>1,150</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0.9416</p>
                     </c>
                     <c ca="right">
                        <p>0.9416</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>TISHunter</p>
                     </c>
                     <c ca="right">
                        <p>17,789</p>
                     </c>
                     <c ca="right">
                        <p>1,914</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0.9029</p>
                     </c>
                     <c ca="right">
                        <p>0.9029</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>ATGpr</p>
                     </c>
                     <c ca="right">
                        <p>17,160</p>
                     </c>
                     <c ca="right">
                        <p>2,543</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0.8709</p>
                     </c>
                     <c ca="right">
                        <p>0.8709</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>TIS Miner</p>
                     </c>
                     <c ca="right">
                        <p>15,521</p>
                     </c>
                     <c ca="right">
                        <p>3,650</p>
                     </c>
                     <c ca="right">
                        <p>532</p>
                     </c>
                     <c ca="right">
                        <p>0.7877</p>
                     </c>
                     <c ca="right">
                        <p>0.8096</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>NetStart</p>
                     </c>
                     <c ca="right">
                        <p>5,123</p>
                     </c>
                     <c ca="right">
                        <p>14,527</p>
                     </c>
                     <c ca="right">
                        <p>53</p>
                     </c>
                     <c ca="right">
                        <p>0.2600</p>
                     </c>
                     <c ca="right">
                        <p>0.2607</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>homogeneous</p>
                     </c>
                     <c ca="left">
                        <p>LLKR</p>
                     </c>
                     <c ca="right">
                        <p>9,268</p>
                     </c>
                     <c ca="right">
                        <p>9,318</p>
                     </c>
                     <c ca="right">
                        <p>1,117</p>
                     </c>
                     <c ca="right">
                        <p>0.4704</p>
                     </c>
                     <c ca="right">
                        <p>0.4987</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>WLLKR</p>
                     </c>
                     <c ca="right">
                        <p>12,511</p>
                     </c>
                     <c ca="right">
                        <p>4,486</p>
                     </c>
                     <c ca="right">
                        <p>2,706</p>
                     </c>
                     <c ca="right">
                        <p>0.6350</p>
                     </c>
                     <c ca="right">
                        <p>0.7361</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>15,167</p>
                     </c>
                     <c ca="right">
                        <p>4,535</p>
                     </c>
                     <c ca="right">
                        <p>1</p>
                     </c>
                     <c ca="right">
                        <p>0.7698</p>
                     </c>
                     <c ca="right">
                        <p>0.7698</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>14,692</p>
                     </c>
                     <c ca="right">
                        <p>4,191</p>
                     </c>
                     <c ca="right">
                        <p>820</p>
                     </c>
                     <c ca="right">
                        <p>0.7457</p>
                     </c>
                     <c ca="right">
                        <p>0.7781</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>BAYES</p>
                     </c>
                     <c ca="right">
                        <p>10,121</p>
                     </c>
                     <c ca="right">
                        <p>6,482</p>
                     </c>
                     <c ca="right">
                        <p>3,100</p>
                     </c>
                     <c ca="right">
                        <p>0.5137</p>
                     </c>
                     <c ca="right">
                        <p>0.6096</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>cluster-specific</p>
                     </c>
                     <c ca="left">
                        <p>LLKR</p>
                     </c>
                     <c ca="right">
                        <p>11,964</p>
                     </c>
                     <c ca="right">
                        <p>6,946</p>
                     </c>
                     <c ca="right">
                        <p>793</p>
                     </c>
                     <c ca="right">
                        <p>0.6072</p>
                     </c>
                     <c ca="right">
                        <p>0.6327</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>WLLKR</p>
                     </c>
                     <c ca="right">
                        <p>14,931</p>
                     </c>
                     <c ca="right">
                        <p>3,085</p>
                     </c>
                     <c ca="right">
                        <p>1,687</p>
                     </c>
                     <c ca="right">
                        <p>0.7578</p>
                     </c>
                     <c ca="right">
                        <p>0.8288</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>16,576</p>
                     </c>
                     <c ca="right">
                        <p>3,127</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0.8413</p>
                     </c>
                     <c ca="right">
                        <p>0.8413</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>16,209</p>
                     </c>
                     <c ca="right">
                        <p>2,834</p>
                     </c>
                     <c ca="right">
                        <p>660</p>
                     </c>
                     <c ca="right">
                        <p>0.8227</p>
                     </c>
                     <c ca="right">
                        <p>0.8512</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>BAYES</p>
                     </c>
                     <c ca="right">
                        <p>12,399</p>
                     </c>
                     <c ca="right">
                        <p>4,988</p>
                     </c>
                     <c ca="right">
                        <p>2,316</p>
                     </c>
                     <c ca="right">
                        <p>0.6293</p>
                     </c>
                     <c ca="right">
                        <p>0.7131</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>random split</p>
                     </c>
                     <c ca="left">
                        <p>LLKR</p>
                     </c>
                     <c ca="right">
                        <p>9,191</p>
                     </c>
                     <c ca="right">
                        <p>9,402</p>
                     </c>
                     <c ca="right">
                        <p>1,110</p>
                     </c>
                     <c ca="right">
                        <p>0.4665</p>
                     </c>
                     <c ca="right">
                        <p>0.4943</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>WLLKR</p>
                     </c>
                     <c ca="right">
                        <p>12,491</p>
                     </c>
                     <c ca="right">
                        <p>4,507</p>
                     </c>
                     <c ca="right">
                        <p>2,705</p>
                     </c>
                     <c ca="right">
                        <p>0.6340</p>
                     </c>
                     <c ca="right">
                        <p>0.7349</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>15,183</p>
                     </c>
                     <c ca="right">
                        <p>4,519</p>
                     </c>
                     <c ca="right">
                        <p>1</p>
                     </c>
                     <c ca="right">
                        <p>0.7706</p>
                     </c>
                     <c ca="right">
                        <p>0.7706</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>14,648</p>
                     </c>
                     <c ca="right">
                        <p>4,198</p>
                     </c>
                     <c ca="right">
                        <p>857</p>
                     </c>
                     <c ca="right">
                        <p>0.7434</p>
                     </c>
                     <c ca="right">
                        <p>0.7772</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>BAYES</p>
                     </c>
                     <c ca="right">
                        <p>10,084</p>
                     </c>
                     <c ca="right">
                        <p>6,509</p>
                     </c>
                     <c ca="right">
                        <p>3,110</p>
                     </c>
                     <c ca="right">
                        <p>0.5118</p>
                     </c>
                     <c ca="right">
                        <p>0.6077</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>19,703 TIS-containing instances were used in three separate five-fold cross-validation experiments. Results are shown from applying a non-stratified parameter set (homogeneous), <it>a priori</it>-known cluster-specific parameter sets for <it>k </it>= 3 (cluster-specific), and group-specific parameter sets for a random three-way split of the data (random split). <it>TP </it>represents the number of instances for which the method correctly identified a TIS; <it>FP </it>for which a prediction was made, though incorrect; and <it>FN </it>for which no prediction was made, but should have been (see Figure 2). <inline-formula><m:math name="1471-2105-9-381-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>S</m:mi><m:mi>n</m:mi><m:mo>=</m:mo><m:mfrac><m:mrow><m:mi>T</m:mi><m:mi>P</m:mi></m:mrow><m:mrow><m:mi>T</m:mi><m:mi>P</m:mi><m:mo>+</m:mo><m:mi>F</m:mi><m:mi>P</m:mi><m:mo>+</m:mo><m:mi>F</m:mi><m:mi>N</m:mi></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4uamLaemOBa4Maeyypa0tcfa4aaSaaaeaacqWGubavcqWGqbauaeaacqWGubavcqWGqbaucqGHRaWkcqWGgbGrcqWGqbaucqGHRaWkcqWGgbGrcqWGobGtaaaaaa@3AFD@</m:annotation></m:semantics></m:math></inline-formula>, and <inline-formula><m:math name="1471-2105-9-381-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>S</m:mi><m:mi>p</m:mi><m:mo>=</m:mo><m:mfrac><m:mrow><m:mi>T</m:mi><m:mi>P</m:mi></m:mrow><m:mrow><m:mi>T</m:mi><m:mi>P</m:mi><m:mo>+</m:mo><m:mi>F</m:mi><m:mi>P</m:mi></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4uamLaemiCaaNaeyypa0tcfa4aaSaaaeaacqWGubavcqWGqbauaeaacqWGubavcqWGqbaucqGHRaWkcqWGgbGrcqWGqbauaaaaaa@37E5@</m:annotation></m:semantics></m:math></inline-formula>.</p>
               </tblfn>
            </tbl>
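            <p>As a quick arithmetic check, the Sn and Sp columns can be recomputed from the TP, FP and FN counts using the definitions given in the table footnote. The sketch below is illustrative only, and the function and variable names are hypothetical. Note that Sn, as defined here, counts a wrong prediction in a TIS-containing instance (FP) as a missed true TIS, and that Sp corresponds to what is often called precision or positive predictive value.</p>

```python
def sensitivity(tp, fp, fn):
    """Sn = TP / (TP + FP + FN), as defined in the Table 1 footnote."""
    return tp / (tp + fp + fn)

def specificity(tp, fp):
    """Sp = TP / (TP + FP), as defined in the Table 1 footnote."""
    return tp / (tp + fp)

# (TP, FP, FN) counts copied from three rows of Table 1.
table1_rows = {
    "1st-ATG": (18553, 1150, 0),
    "TIS Miner": (15521, 3650, 532),
    "MFCWLLKR (homogeneous)": (15167, 4535, 1),
}

for method, (tp, fp, fn) in table1_rows.items():
    print(method,
          round(sensitivity(tp, fp, fn), 4),
          round(specificity(tp, fp), 4))
```

            <p>For each row, TP + FP + FN equals the 19,703 TIS-containing instances, so Sn is simply the fraction of instances for which the true TIS was recovered.</p>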
            <p>Cluster-specific parameter results were produced by first stratifying the data with respect to the clusters identified by <it>k</it>-medoids, for <it>k </it>= 3, conducting five-fold cross-validation analyses independently for each cluster, and averaging the results. Thus, we explicitly leveraged information concerning the true cluster to which a test sequence's TIS belongs. Under this scheme, TIS prediction performance increased markedly for all methods. To demonstrate that this observation is not simply an artifact of over-fitting the models to smaller training sets, we randomly split the data into three separate partitions and repeated the analysis. The random split results are essentially indistinguishable from those obtained using homogeneous deployment, and thus we conclude that the performance gains from cluster-specific parameter training reflect non-random effects.</p>
         </sec>
         <sec>
            <st>
               <p>Computational TIS identification in transcripts undergoing leaky scanning</p>
            </st>
            <p>Of the instances in the TIS-containing data set, 1,150 are known to contain in-frame ATGs upstream of the true TIS. Table <tblr tid="T2">2</tblr> provides TIS prediction statistics derived exclusively from these cases. By definition, 1<sup><it>st</it></sup>-ATG fails completely in this scenario. PFCWLLKR has greater sensitivity than all other methods under every parameter deployment strategy except cluster-specific, in which MFCWLLKR bests it by roughly 0.35%. WLLKR strictly dominates all methods in terms of specificity, outperforming the second-best method, PFCWLLKR, by under four percent for any parametrization strategy.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Distinguishing in-frame, upstream ATG sites from true TISs</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>Parametrization</p>
                     </c>
                     <c ca="left">
                        <p>Method</p>
                     </c>
                     <c ca="right">
                        <p>TP</p>
                     </c>
                     <c ca="right">
                        <p>FP</p>
                     </c>
                     <c ca="right">
                        <p>FN</p>
                     </c>
                     <c ca="right">
                        <p>Sn</p>
                     </c>
                     <c ca="right">
                        <p>Sp</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1<sup><it>st</it></sup>-ATG</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>1,150</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0.0000</p>
                     </c>
                     <c ca="right">
                        <p>0.0000</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>TISHunter</p>
                     </c>
                     <c ca="right">
                        <p>69</p>
                     </c>
                     <c ca="right">
                        <p>1,081</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0.0600</p>
                     </c>
                     <c ca="right">
                        <p>0.0600</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>ATGpr</p>
                     </c>
                     <c ca="right">
                        <p>97</p>
                     </c>
                     <c ca="right">
                        <p>1,053</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0.0843</p>
                     </c>
                     <c ca="right">
                        <p>0.0843</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>TIS Miner</p>
                     </c>
                     <c ca="right">
                        <p>216</p>
                     </c>
                     <c ca="right">
                        <p>832</p>
                     </c>
                     <c ca="right">
                        <p>102</p>
                     </c>
                     <c ca="right">
                        <p>0.1878</p>
                     </c>
                     <c ca="right">
                        <p>0.2061</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>NetStart</p>
                     </c>
                     <c ca="right">
                        <p>216</p>
                     </c>
                     <c ca="right">
                        <p>930</p>
                     </c>
                     <c ca="right">
                        <p>4</p>
                     </c>
                     <c ca="right">
                        <p>0.1878</p>
                     </c>
                     <c ca="right">
                        <p>0.1885</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>homogeneous</p>
                     </c>
                     <c ca="left">
                        <p>LLKR</p>
                     </c>
                     <c ca="right">
                        <p>437</p>
                     </c>
                     <c ca="right">
                        <p>671</p>
                     </c>
                     <c ca="right">
                        <p>42</p>
                     </c>
                     <c ca="right">
                        <p>0.3800</p>
                     </c>
                     <c ca="right">
                        <p>0.3944</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>WLLKR</p>
                     </c>
                     <c ca="right">
                        <p>531</p>
                     </c>
                     <c ca="right">
                        <p>467</p>
                     </c>
                     <c ca="right">
                        <p>152</p>
                     </c>
                     <c ca="right">
                        <p>0.4617</p>
                     </c>
                     <c ca="right">
                        <p>0.5321</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>525</p>
                     </c>
                     <c ca="right">
                        <p>625</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0.4565</p>
                     </c>
                     <c ca="right">
                        <p>0.4565</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>542</p>
                     </c>
                     <c ca="right">
                        <p>551</p>
                     </c>
                     <c ca="right">
                        <p>57</p>
                     </c>
                     <c ca="right">
                        <p>0.4713</p>
                     </c>
                     <c ca="right">
                        <p>0.4959</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>BAYES</p>
                     </c>
                     <c ca="right">
                        <p>442</p>
                     </c>
                     <c ca="right">
                        <p>511</p>
                     </c>
                     <c ca="right">
                        <p>197</p>
                     </c>
                     <c ca="right">
                        <p>0.3843</p>
                     </c>
                     <c ca="right">
                        <p>0.4638</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>cluster-specific</p>
                     </c>
                     <c ca="left">
                        <p>LLKR</p>
                     </c>
                     <c ca="right">
                        <p>552</p>
                     </c>
                     <c ca="right">
                        <p>550</p>
                     </c>
                     <c ca="right">
                        <p>48</p>
                     </c>
                     <c ca="right">
                        <p>0.4800</p>
                     </c>
                     <c ca="right">
                        <p>0.5009</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>WLLKR</p>
                     </c>
                     <c ca="right">
                        <p>663</p>
                     </c>
                     <c ca="right">
                        <p>380</p>
                     </c>
                     <c ca="right">
                        <p>107</p>
                     </c>
                     <c ca="right">
                        <p>0.5765</p>
                     </c>
                     <c ca="right">
                        <p>0.6357</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>683</p>
                     </c>
                     <c ca="right">
                        <p>467</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0.5939</p>
                     </c>
                     <c ca="right">
                        <p>0.5939</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>679</p>
                     </c>
                     <c ca="right">
                        <p>419</p>
                     </c>
                     <c ca="right">
                        <p>52</p>
                     </c>
                     <c ca="right">
                        <p>0.5904</p>
                     </c>
                     <c ca="right">
                        <p>0.6184</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>BAYES</p>
                     </c>
                     <c ca="right">
                        <p>567</p>
                     </c>
                     <c ca="right">
                        <p>406</p>
                     </c>
                     <c ca="right">
                        <p>177</p>
                     </c>
                     <c ca="right">
                        <p>0.4930</p>
                     </c>
                     <c ca="right">
                        <p>0.5827</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>random split</p>
                     </c>
                     <c ca="left">
                        <p>LLKR</p>
                     </c>
                     <c ca="right">
                        <p>434</p>
                     </c>
                     <c ca="right">
                        <p>672</p>
                     </c>
                     <c ca="right">
                        <p>44</p>
                     </c>
                     <c ca="right">
                        <p>0.3774</p>
                     </c>
                     <c ca="right">
                        <p>0.3924</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>WLLKR</p>
                     </c>
                     <c ca="right">
                        <p>522</p>
                     </c>
                     <c ca="right">
                        <p>468</p>
                     </c>
                     <c ca="right">
                        <p>160</p>
                     </c>
                     <c ca="right">
                        <p>0.4539</p>
                     </c>
                     <c ca="right">
                        <p>0.5273</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>530</p>
                     </c>
                     <c ca="right">
                        <p>620</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0.4609</p>
                     </c>
                     <c ca="right">
                        <p>0.4609</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>541</p>
                     </c>
                     <c ca="right">
                        <p>551</p>
                     </c>
                     <c ca="right">
                        <p>58</p>
                     </c>
                     <c ca="right">
                        <p>0.4704</p>
                     </c>
                     <c ca="right">
                        <p>0.4954</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>BAYES</p>
                     </c>
                     <c ca="right">
                        <p>439</p>
                     </c>
                     <c ca="right">
                        <p>512</p>
                     </c>
                     <c ca="right">
                        <p>199</p>
                     </c>
                     <c ca="right">
                        <p>0.3817</p>
                     </c>
                     <c ca="right">
                        <p>0.4616</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>TIS identification statistics are reported exclusively for the 1,150 maximal reading frames containing in-frame ATG sites upstream from the true TIS, under three distinct parameter utilization approaches: homogeneous, <it>a priori</it>-known cluster-specific with <it>k </it>= 3, and three-fold random split. <it>TP </it>represents the number of instances for which the method correctly identified a TIS; <it>FP </it>for which a prediction was made, though incorrect; and <it>FN </it>for which no prediction was made, but should have been (see Figure 2). <inline-formula><m:math name="1471-2105-9-381-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow><m:mi>S</m:mi><m:mi>n</m:mi><m:mo>=</m:mo><m:mfrac><m:mrow><m:mi>T</m:mi><m:mi>P</m:mi></m:mrow><m:mrow><m:mi>T</m:mi><m:mi>P</m:mi><m:mo>+</m:mo><m:mi>F</m:mi><m:mi>P</m:mi><m:mo>+</m:mo><m:mi>F</m:mi><m:mi>N</m:mi></m:mrow></m:mfrac></m:mrow></m:math></inline-formula>, and <inline-formula><m:math name="1471-2105-9-381-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow><m:mi>S</m:mi><m:mi>p</m:mi><m:mo>=</m:mo><m:mfrac><m:mrow><m:mi>T</m:mi><m:mi>P</m:mi></m:mrow><m:mrow><m:mi>T</m:mi><m:mi>P</m:mi><m:mo>+</m:mo><m:mi>F</m:mi><m:mi>P</m:mi></m:mrow></m:mfrac></m:mrow></m:math></inline-formula>.</p>
               </tblfn>
            </tbl>
            <p>PFCWLLKR is expected to be more prone to false positive predictions on these sequences, because the upstream ATGs would typically exhibit better coding potential contrast than the true TIS.</p>
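            <p>The <it>Sn </it>and <it>Sp </it>columns of Table <tblr tid="T2">2</tblr> follow directly from the tabulated counts and the definitions given in its footnote. The short sketch below recomputes two rows as a consistency check; the function names are ours and the guard against empty denominators is an added assumption.</p>

```python
def sensitivity(tp, fp, fn):
    """Sn = TP / (TP + FP + FN), as defined in the Table 2 footnote."""
    total = tp + fp + fn
    return tp / total if total else 0.0


def specificity(tp, fp):
    """Sp = TP / (TP + FP), as defined in the Table 2 footnote."""
    predicted = tp + fp
    return tp / predicted if predicted else 0.0


# (TP, FP, FN) counts copied from the homogeneous rows of Table 2.
rows = {
    "WLLKR": (531, 467, 152),
    "PFCWLLKR": (542, 551, 57),
}
for method, (tp, fp, fn) in rows.items():
    print(method, round(sensitivity(tp, fp, fn), 4), round(specificity(tp, fp), 4))
# WLLKR 0.4617 0.5321
# PFCWLLKR 0.4713 0.4959
```

            <p>Note that <it>TP </it>+ <it>FP </it>+ <it>FN </it>equals 1,150 in every row, so <it>Sn </it>is simply the fraction of these instances for which the true TIS was recovered.</p>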
         </sec>
         <sec>
            <st>
               <p>Method performance on non-TIS-containing transcript fragments</p>
            </st>
            <p>Table <tblr tid="T3">3</tblr> provides TIS prediction performance statistics for non-TIS-containing instances. 1<sup><it>st</it></sup>-ATG performs worse than every method listed under the three parameter deployment approaches. Under homogeneous parameter deployment, WLLKR dominates the remaining methods, with BAYES being second-best and PFCWLLKR third-best, their sensitivities spanning a range of less than three percentage points. Again, cluster-specific parameter usage leads to considerable performance gains, whereas random splits produce results essentially indistinguishable from those of homogeneous deployment.</p>
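            <p>The ranking stated above can be checked against the counts in Table <tblr tid="T3">3</tblr>. The sketch below assumes that the <it>Sn </it>column of that table is computed as <it>TN</it>/(<it>TN </it>+ <it>FP</it>), an assumption that reproduces the tabulated values; the function name is hypothetical.</p>

```python
def negative_call_rate(tn, fp):
    """Fraction of non-TIS instances left unpredicted: TN / (TN + FP)."""
    total = tn + fp
    return tn / total if total else 0.0


# (TN, FP) counts copied from the homogeneous rows of Table 3.
homogeneous = {
    "WLLKR": (7260, 8861),
    "BAYES": (6813, 9308),
    "PFCWLLKR": (6785, 9336),
}
rates = {m: negative_call_rate(tn, fp) for m, (tn, fp) in homogeneous.items()}
ranking = sorted(rates, key=rates.get, reverse=True)
spread = max(rates.values()) - min(rates.values())
print(ranking, round(spread, 4))
# ['WLLKR', 'BAYES', 'PFCWLLKR'] 0.0295
```

            <p>The spread of roughly 0.03 between the best and third-best method is small relative to the gains obtained under cluster-specific deployment.</p>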
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Method performances on non-TIS-containing data</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Parametrization</p>
                     </c>
                     <c ca="left">
                        <p>Method</p>
                     </c>
                     <c ca="right">
                        <p>TN</p>
                     </c>
                     <c ca="right">
                        <p>FP</p>
                     </c>
                     <c ca="right">
                        <p>Sn</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1<sup><it>st</it></sup>-ATG</p>
                     </c>
                     <c ca="right">
                        <p>688</p>
                     </c>
                     <c ca="right">
                        <p>15,433</p>
                     </c>
                     <c ca="right">
                        <p>0.0427</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>TISHunter</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>16,121</p>
                     </c>
                     <c ca="right">
                        <p>0.0000</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>ATGpr</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>16,121</p>
                     </c>
                     <c ca="right">
                        <p>0.0000</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>TIS Miner</p>
                     </c>
                     <c ca="right">
                        <p>3,142</p>
                     </c>
                     <c ca="right">
                        <p>12,979</p>
                     </c>
                     <c ca="right">
                        <p>0.1949</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>NetStart</p>
                     </c>
                     <c ca="right">
                        <p>575</p>
                     </c>
                     <c ca="right">
                        <p>15,546</p>
                     </c>
                     <c ca="right">
                        <p>0.0357</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>homogeneous</p>
                     </c>
                     <c ca="left">
                        <p>LLKR</p>
                     </c>
                     <c ca="right">
                        <p>5,179</p>
                     </c>
                     <c ca="right">
                        <p>10,942</p>
                     </c>
                     <c ca="right">
                        <p>0.3213</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>WLLKR</p>
                     </c>
                     <c ca="right">
                        <p>7,260</p>
                     </c>
                     <c ca="right">
                        <p>8,861</p>
                     </c>
                     <c ca="right">
                        <p>0.4503</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>1,688</p>
                     </c>
                     <c ca="right">
                        <p>14,433</p>
                     </c>
                     <c ca="right">
                        <p>0.1047</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>6,785</p>
                     </c>
                     <c ca="right">
                        <p>9,336</p>
                     </c>
                     <c ca="right">
                        <p>0.4209</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>BAYES</p>
                     </c>
                     <c ca="right">
                        <p>6,813</p>
                     </c>
                     <c ca="right">
                        <p>9,308</p>
                     </c>
                     <c ca="right">
                        <p>0.4226</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>cluster-specific</p>
                     </c>
                     <c ca="left">
                        <p>LLKR</p>
                     </c>
                     <c ca="right">
                        <p>6,385</p>
                     </c>
                     <c ca="right">
                        <p>9,736</p>
                     </c>
                     <c ca="right">
                        <p>0.3961</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>WLLKR</p>
                     </c>
                     <c ca="right">
                        <p>8,080</p>
                     </c>
                     <c ca="right">
                        <p>8,041</p>
                     </c>
                     <c ca="right">
                        <p>0.5012</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>1,995</p>
                     </c>
                     <c ca="right">
                        <p>14,126</p>
                     </c>
                     <c ca="right">
                        <p>0.1238</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>8,685</p>
                     </c>
                     <c ca="right">
                        <p>7,436</p>
                     </c>
                     <c ca="right">
                        <p>0.5387</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>BAYES</p>
                     </c>
                     <c ca="right">
                        <p>8,057</p>
                     </c>
                     <c ca="right">
                        <p>8,064</p>
                     </c>
                     <c ca="right">
                        <p>0.4998</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>random split</p>
                     </c>
                     <c ca="left">
                        <p>LLKR</p>
                     </c>
                     <c ca="right">
                        <p>5,155</p>
                     </c>
                     <c ca="right">
                        <p>10,966</p>
                     </c>
                     <c ca="right">
                        <p>0.3198</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>WLLKR</p>
                     </c>
                     <c ca="right">
                        <p>7,176</p>
                     </c>
                     <c ca="right">
                        <p>8,945</p>
                     </c>
                     <c ca="right">
                        <p>0.4451</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>1,687</p>
                     </c>
                     <c ca="right">
                        <p>14,434</p>
                     </c>
                     <c ca="right">
                        <p>0.1046</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PFCWLLKR</p>
                     </c>
                     <c ca="right">
                        <p>6,748</p>
                     </c>
                     <c ca="right">
                        <p>9,373</p>
                     </c>
                     <c ca="right">
                        <p>0.4186</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>BAYES</p>
                     </c>
                     <c ca="right">
                        <p>6,824</p>
                     </c>
                     <c ca="right">
                        <p>9,297</p>
                     </c>
                     <c ca="right">
                        <p>0.4233</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>16,121 non-TIS-containing instances were used in three separate five-fold cross-validation experiments. Results are shown from applying a non-stratified parameter set (homogeneous), <it>a priori</it>-known cluster-specific parameter sets for <it>k </it>= 3 (cluster-specific), and group-specific parameter sets for a random three-way split of the data (random split). <it>TN </it>represents the number of instances for which the method (correctly) refused to predict a TIS, and <it>FP </it>denotes the number for which some prediction was made, though always incorrect (see Figure 2). <inline-formula><m:math name="1471-2105-9-381-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>S</m:mi><m:mi>n</m:mi><m:mo>=</m:mo><m:mfrac><m:mrow><m:mi>T</m:mi><m:mi>N</m:mi></m:mrow><m:mrow><m:mi>T</m:mi><m:mi>N</m:mi><m:mo>+</m:mo><m:mi>F</m:mi><m:mi>P</m:mi></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4uamLaemOBa4Maeyypa0tcfa4aaSaaaeaacqWGubavcqWGobGtaeaacqWGubavcqWGobGtcqGHRaWkcqWGgbGrcqWGqbauaaaaaa@37D9@</m:annotation></m:semantics></m:math></inline-formula>.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Comparison with other TIS prediction tools</p>
            </st>
            <p>Based on results shown in Tables <tblr tid="T1">1</tblr>, <tblr tid="T2">2</tblr> and <tblr tid="T3">3</tblr>, we identified PFCWLLKR as the superior method currently implemented in MetWAMer, and therefore used it as a benchmark for comparison with other TIS prediction tools. Specifically, we consider PFCWLLKR used under the homogeneous parameter deployment approach. We compare this method with the NetStart <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, TIS Miner <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, TISHunter <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> and ATGpr <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> programs. Because NetStart is a TIS classifier, and not a TIS prediction system, we interpreted its results as follows. For all potential TISs scored by the program, we ranked each instance on the basis of its score. If the best-scoring instance was classified as a true TIS (marked "Yes"), it was selected as the program's single TIS prediction; otherwise, we interpreted the result as the system's decision to make no TIS prediction at all. We used the web interface to the program available at <url>http://www.cbs.dtu.dk/services/NetStart/</url> and used its <it>Arabidopsis</it>-specific parameters. The TIS Miner program, available at <url>http://dnafsminer.bic.nus.edu.sg/Tis.html</url>, was used with default parameters, with the number of predictions set to 1. We used a classification threshold of 0.5 for this program, such that if the score of the TIS prediction it returned was at least 0.5, it was selected as the system's prediction, while if not, this was interpreted as its decision not to return a TIS prediction. This threshold setting performed best over a range of values tried (data not shown). Finally, the TISHunter and ATGpr programs, available at <url>http://bioinfo.ucr.edu/~hli/</url> and <url>http://flj.hinv.jp/ATGpr/atgpr/index.html</url>, respectively, were used with default settings.
All raw output generated by these tools on our test data is available as supplementary information at <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
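            <p>The two interpretation rules described above can be sketched as follows. This is only an illustration under stated assumptions: the candidate record layout, the function names and the use of sequence positions as return values are hypothetical and are not taken from NetStart, TIS Miner or MetWAMer; only the "best-scoring site must be labelled a true TIS" rule and the 0.5 threshold come from the text.</p>

```python
from typing import Optional, Sequence, Tuple

# Hypothetical candidate record: (position of the ATG, score, classifier label).
Candidate = Tuple[int, float, bool]


def single_prediction_from_classifier(cands: Sequence[Candidate]) -> Optional[int]:
    """Classifier rule: keep the best-scoring site only if it is labelled a true TIS."""
    if not cands:
        return None
    pos, _score, is_tis = max(cands, key=lambda c: c[1])
    # Returning None models the decision to make no TIS prediction at all.
    return pos if is_tis else None


def single_prediction_from_score(pos: int, score: float,
                                 threshold: float = 0.5) -> Optional[int]:
    """Threshold rule: accept the single returned site if its score is at least the threshold."""
    return pos if score >= threshold else None
```

            <p>Under this sketch, a sequence whose top-ranked candidate is not labelled a true TIS, or whose single returned score falls below 0.5, is counted as yielding no prediction.</p>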
            <p>As depicted in Table <tblr tid="T1">1</tblr>, PFCWLLKR handily outperforms the NetStart system, though it is bested by the TIS Miner (albeit by a slight margin), TISHunter and ATGpr programs on these TIS-containing instances. In no case are the competing programs able to outperform 1<sup><it>st</it></sup>-ATG. Table <tblr tid="T2">2</tblr> demonstrates, however, that PFCWLLKR is considerably better than the competing methods at identifying a true TIS when an in-frame site occurs upstream from it. Finally, Table <tblr tid="T3">3</tblr> shows that PFCWLLKR is far better than any of the four competing programs at declining to predict a TIS when none is present.</p>
         </sec>
         <sec>
            <st>
               <p>Performance gains by parameter set indexing</p>
            </st>
            <p>Based on the results shown in Tables <tblr tid="T1">1</tblr>, <tblr tid="T2">2</tblr> and <tblr tid="T3">3</tblr>, we decided to focus on the PFCWLLKR method in the following. Indeed, although we assessed all the methods in the experiments described below, PFCWLLKR was superior in all cases (data not shown).</p>
            <p>As shown in Tables <tblr tid="T1">1</tblr> and <tblr tid="T3">3</tblr>, all parametric methods exhibited an increase in successful TIS identification when using stratified parameter sets, suggesting that considerable improvements in statistically based models for TIS prediction can be achieved by taking the appropriately defined class of each potential start-methionine into account. This motivated the development of a lookup method for indexing appropriate parameter sets when the class of a test sequence's true TIS is not known beforehand. Table <tblr tid="T4">4</tblr> provides results obtained using PFCWLLKR on TIS-containing data, under the six parameter indexing schemes described in the <b>Implementation </b>subsection <b>Stratified training and testing</b>. For any given value of <it>k </it>&#8712; {3, 5, 10}, static WAM-based indexing performs best overall. Additionally, under static PWM- and WAM-based indexing, increasing values of <it>k </it>resulted in increased performance on these data. All indexing approaches improved under static parameter lookup relative to modulating, for all deployment strategies. This can be explained by the observation that in-frame ATGs upstream from true TISs are relatively rare, e.g., they occur with a frequency of roughly 1,150/19,703 &#8776; 6% in <it>Arabidopsis</it>, and thus, provided the similarity measure used can recover the site's corresponding class with good fidelity, performance should closely approximate that obtained under <it>a priori</it>-known cluster-specific parameter usage.</p>
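            <p>The difference between static and modulating lookup can be sketched for the Hamming-distance ("edit") measure as follows. This is a minimal illustration, not the MetWAMer implementation: the function names, the representation of each candidate ATG as a fixed-length sequence window, and the tie-breaking rule are assumptions made here for clarity.</p>

```python
from typing import List, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_cluster(window: str, medoids: Sequence[str]) -> int:
    """Index of the medoid closest to the window (assumed: ties go to the lowest index)."""
    return min(range(len(medoids)), key=lambda i: hamming(window, medoids[i]))


def assign_clusters(windows: Sequence[str], medoids: Sequence[str],
                    static: bool) -> List[int]:
    """Choose a parameter-set index for each candidate ATG window.

    windows are ordered 5' to 3', so windows[0] surrounds the leftmost ATG.
    static=True reuses the leftmost ATG's cluster for every site;
    static=False (modulating) assigns each site separately.
    """
    if not windows:
        return []
    if static:
        return [nearest_cluster(windows[0], medoids)] * len(windows)
    return [nearest_cluster(w, medoids) for w in windows]
```

            <p>In this sketch, static lookup selects the correct parameter set for the true TIS whenever the leftmost ATG is itself the true TIS and its cluster is recovered correctly, which is consistent with the explanation given above for the advantage of static lookup.</p>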
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Effect of parameter set indexing strategy on PFCWLLKR performance using TIS-containing data</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c ca="center">
                        <p>
                           <it>k</it>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Indexing strategy</p>
                     </c>
                     <c ca="right">
                        <p>TP</p>
                     </c>
                     <c ca="right">
                        <p>FP</p>
                     </c>
                     <c ca="right">
                        <p>FN</p>
                     </c>
                     <c ca="right">
                        <p>Sn</p>
                     </c>
                     <c ca="right">
                        <p>Sp</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>modulating</p>
                     </c>
                     <c ca="center">
                        <p>edit</p>
                     </c>
                     <c ca="right">
                        <p>14,395</p>
                     </c>
                     <c ca="right">
                        <p>4,944</p>
                     </c>
                     <c ca="right">
                        <p>364</p>
                     </c>
                     <c ca="right">
                        <p>0.7306</p>
                     </c>
                     <c ca="right">
                        <p>0.7444</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PWM</p>
                     </c>
                     <c ca="right">
                        <p>14,270</p>
                     </c>
                     <c ca="right">
                        <p>5,020</p>
                     </c>
                     <c ca="right">
                        <p>413</p>
                     </c>
                     <c ca="right">
                        <p>0.7243</p>
                     </c>
                     <c ca="right">
                        <p>0.7398</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>WAM</p>
                     </c>
                     <c ca="right">
                        <p>14,388</p>
                     </c>
                     <c ca="right">
                        <p>4,949</p>
                     </c>
                     <c ca="right">
                        <p>366</p>
                     </c>
                     <c ca="right">
                        <p>0.7302</p>
                     </c>
                     <c ca="right">
                        <p>0.7441</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>static</p>
                     </c>
                     <c ca="center">
                        <p>edit</p>
                     </c>
                     <c ca="right">
                        <p>15,895</p>
                     </c>
                     <c ca="right">
                        <p>3,157</p>
                     </c>
                     <c ca="right">
                        <p>651</p>
                     </c>
                     <c ca="right">
                        <p>0.8067</p>
                     </c>
                     <c ca="right">
                        <p>0.8343</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PWM</p>
                     </c>
                     <c ca="right">
                        <p>15,757</p>
                     </c>
                     <c ca="right">
                        <p>3,226</p>
                     </c>
                     <c ca="right">
                        <p>720</p>
                     </c>
                     <c ca="right">
                        <p>0.7997</p>
                     </c>
                     <c ca="right">
                        <p>0.8301</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>WAM</p>
                     </c>
                     <c ca="right">
                        <p>15,916</p>
                     </c>
                     <c ca="right">
                        <p>3,158</p>
                     </c>
                     <c ca="right">
                        <p>629</p>
                     </c>
                     <c ca="right">
                        <p>0.8078</p>
                     </c>
                     <c ca="right">
                        <p>0.8344</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>modulating</p>
                     </c>
                     <c ca="center">
                        <p>edit</p>
                     </c>
                     <c ca="right">
                        <p>13,753</p>
                     </c>
                     <c ca="right">
                        <p>5,540</p>
                     </c>
                     <c ca="right">
                        <p>410</p>
                     </c>
                     <c ca="right">
                        <p>0.6980</p>
                     </c>
                     <c ca="right">
                        <p>0.7128</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PWM</p>
                     </c>
                     <c ca="right">
                        <p>13,856</p>
                     </c>
                     <c ca="right">
                        <p>5,501</p>
                     </c>
                     <c ca="right">
                        <p>346</p>
                     </c>
                     <c ca="right">
                        <p>0.7032</p>
                     </c>
                     <c ca="right">
                        <p>0.7158</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>WAM</p>
                     </c>
                     <c ca="right">
                        <p>14,208</p>
                     </c>
                     <c ca="right">
                        <p>5,267</p>
                     </c>
                     <c ca="right">
                        <p>228</p>
                     </c>
                     <c ca="right">
                        <p>0.7211</p>
                     </c>
                     <c ca="right">
                        <p>0.7296</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>static</p>
                     </c>
                     <c ca="center">
                        <p>edit</p>
                     </c>
                     <c ca="right">
                        <p>15,908</p>
                     </c>
                     <c ca="right">
                        <p>2,781</p>
                     </c>
                     <c ca="right">
                        <p>1,014</p>
                     </c>
                     <c ca="right">
                        <p>0.8074</p>
                     </c>
                     <c ca="right">
                        <p>0.8512</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PWM</p>
                     </c>
                     <c ca="right">
                        <p>16,080</p>
                     </c>
                     <c ca="right">
                        <p>2,704</p>
                     </c>
                     <c ca="right">
                        <p>919</p>
                     </c>
                     <c ca="right">
                        <p>0.8161</p>
                     </c>
                     <c ca="right">
                        <p>0.8560</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>WAM</p>
                     </c>
                     <c ca="right">
                        <p>16,454</p>
                     </c>
                     <c ca="right">
                        <p>2,634</p>
                     </c>
                     <c ca="right">
                        <p>615</p>
                     </c>
                     <c ca="right">
                        <p>0.8351</p>
                     </c>
                     <c ca="right">
                        <p>0.8620</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>modulating</p>
                     </c>
                     <c ca="center">
                        <p>edit</p>
                     </c>
                     <c ca="right">
                        <p>12,849</p>
                     </c>
                     <c ca="right">
                        <p>6,364</p>
                     </c>
                     <c ca="right">
                        <p>490</p>
                     </c>
                     <c ca="right">
                        <p>0.6521</p>
                     </c>
                     <c ca="right">
                        <p>0.6688</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PWM</p>
                     </c>
                     <c ca="right">
                        <p>13,861</p>
                     </c>
                     <c ca="right">
                        <p>5,647</p>
                     </c>
                     <c ca="right">
                        <p>195</p>
                     </c>
                     <c ca="right">
                        <p>0.7035</p>
                     </c>
                     <c ca="right">
                        <p>0.7105</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>WAM</p>
                     </c>
                     <c ca="right">
                        <p>14,169</p>
                     </c>
                     <c ca="right">
                        <p>5,422</p>
                     </c>
                     <c ca="right">
                        <p>112</p>
                     </c>
                     <c ca="right">
                        <p>0.7191</p>
                     </c>
                     <c ca="right">
                        <p>0.7232</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>static</p>
                     </c>
                     <c ca="center">
                        <p>edit</p>
                     </c>
                     <c ca="right">
                        <p>15,729</p>
                     </c>
                     <c ca="right">
                        <p>2,441</p>
                     </c>
                     <c ca="right">
                        <p>1,533</p>
                     </c>
                     <c ca="right">
                        <p>0.7983</p>
                     </c>
                     <c ca="right">
                        <p>0.8657</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PWM</p>
                     </c>
                     <c ca="right">
                        <p>16,755</p>
                     </c>
                     <c ca="right">
                        <p>2,135</p>
                     </c>
                     <c ca="right">
                        <p>813</p>
                     </c>
                     <c ca="right">
                        <p>0.8504</p>
                     </c>
                     <c ca="right">
                        <p>0.8870</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>WAM</p>
                     </c>
                     <c ca="right">
                        <p>17,156</p>
                     </c>
                     <c ca="right">
                        <p>2,013</p>
                     </c>
                     <c ca="right">
                        <p>534</p>
                     </c>
                     <c ca="right">
                        <p>0.8707</p>
                     </c>
                     <c ca="right">
                        <p>0.8950</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>19,703 TIS-containing instances were used in five-fold cross-validation experiments, in which parameter sets were selected for putative TIS evaluation according to best cluster fit established by either the Hamming distance relative to cached medoids (edit), position weight matrix scores (PWM), or weight array matrix scores (WAM). Parameter indexing was tested under both modulating (cluster assignment for each site separately) and static (cluster assignment based on the leftmost ATG) approaches. <it>k </it>denotes the number of clusters considered. <it>TP </it>represents the number of instances for which the method correctly identified a TIS; <it>FP </it>for which a prediction was made, though incorrect; and <it>FN </it>for which no prediction was made, but should have been (see Figure 2). <inline-formula><m:math name="1471-2105-9-381-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>S</m:mi><m:mi>n</m:mi><m:mo>=</m:mo><m:mfrac><m:mrow><m:mi>T</m:mi><m:mi>P</m:mi></m:mrow><m:mrow><m:mi>T</m:mi><m:mi>P</m:mi><m:mo>+</m:mo><m:mi>F</m:mi><m:mi>P</m:mi><m:mo>+</m:mo><m:mi>F</m:mi><m:mi>N</m:mi></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4uamLaemOBa4Maeyypa0tcfa4aaSaaaeaacqWGubavcqWGqbauaeaacqWGubavcqWGqbaucqGHRaWkcqWGgbGrcqWGqbaucqGHRaWkcqWGgbGrcqWGobGtaaaaaa@3AFD@</m:annotation></m:semantics></m:math></inline-formula>, and <inline-formula><m:math name="1471-2105-9-381-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>S</m:mi><m:mi>p</m:mi><m:mo>=</m:mo><m:mfrac><m:mrow><m:mi>T</m:mi><m:mi>P</m:mi></m:mrow><m:mrow><m:mi>T</m:mi><m:mi>P</m:mi><m:mo>+</m:mo><m:mi>F</m:mi><m:mi>P</m:mi></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4uamLaemiCaaNaeyypa0tcfa4aaSaaaeaacqWGubavcqWGqbauaeaacqWGubavcqWGqbaucqGHRaWkcqWGgbGrcqWGqbauaaaaaa@37E5@</m:annotation></m:semantics></m:math></inline-formula>.</p>
               </tblfn>
            </tbl>
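            <p>As a check on how the Sn and Sp columns of Table 4 follow from the counts, the definitions in the table footnote can be written out directly. The function names below are chosen here for illustration; the formulas are those given in the footnote, in which TP, FP and FN together account for all 19,703 TIS-containing instances.</p>

```python
def sensitivity(tp: int, fp: int, fn: int) -> float:
    """Sn = TP / (TP + FP + FN), as defined for TIS-containing data."""
    return tp / (tp + fp + fn)


def specificity(tp: int, fp: int) -> float:
    """Sp = TP / (TP + FP), as defined for TIS-containing data."""
    return tp / (tp + fp)


# Counts taken from the first data row of Table 4 (k = 3, modulating, edit).
tp, fp, fn = 14395, 4944, 364
sn = round(sensitivity(tp, fp, fn), 4)
sp = round(specificity(tp, fp), 4)
```

            <p>For this row the three counts sum to 19,703, and the rounded values agree with the tabulated Sn of 0.7306 and Sp of 0.7444.</p>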
            <p>Table <tblr tid="T5">5</tblr> presents results analogous to those in Table <tblr tid="T4">4</tblr>, for non-TIS-containing data. Again, static parameter lookup yielded results superior to those obtained under the modulating approach. Increasing <it>k </it>typically produced more false positive predictions and thus progressively lower performance.</p>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Effect of parameter set indexing strategy on PFCWLLKR performance using non-TIS-containing data</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="center">
                        <p>
                           <it>k</it>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Indexing strategy</p>
                     </c>
                     <c ca="right">
                        <p>TN</p>
                     </c>
                     <c ca="right">
                        <p>FP</p>
                     </c>
                     <c ca="right">
                        <p>Sn</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>modulating</p>
                     </c>
                     <c ca="center">
                        <p>edit</p>
                     </c>
                     <c ca="right">
                        <p>5,074</p>
                     </c>
                     <c ca="right">
                        <p>11,047</p>
                     </c>
                     <c ca="right">
                        <p>0.3147</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PWM</p>
                     </c>
                     <c ca="right">
                        <p>5,134</p>
                     </c>
                     <c ca="right">
                        <p>10,987</p>
                     </c>
                     <c ca="right">
                        <p>0.3185</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>WAM</p>
                     </c>
                     <c ca="right">
                        <p>5,069</p>
                     </c>
                     <c ca="right">
                        <p>11,052</p>
                     </c>
                     <c ca="right">
                        <p>0.3144</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>static</p>
                     </c>
                     <c ca="center">
                        <p>edit</p>
                     </c>
                     <c ca="right">
                        <p>6,170</p>
                     </c>
                     <c ca="right">
                        <p>9,951</p>
                     </c>
                     <c ca="right">
                        <p>0.3827</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PWM</p>
                     </c>
                     <c ca="right">
                        <p>6,279</p>
                     </c>
                     <c ca="right">
                        <p>9,842</p>
                     </c>
                     <c ca="right">
                        <p>0.3895</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>WAM</p>
                     </c>
                     <c ca="right">
                        <p>6,119</p>
                     </c>
                     <c ca="right">
                        <p>10,002</p>
                     </c>
                     <c ca="right">
                        <p>0.3796</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>modulating</p>
                     </c>
                     <c ca="center">
                        <p>edit</p>
                     </c>
                     <c ca="right">
                        <p>4,537</p>
                     </c>
                     <c ca="right">
                        <p>11,584</p>
                     </c>
                     <c ca="right">
                        <p>0.2814</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PWM</p>
                     </c>
                     <c ca="right">
                        <p>4,484</p>
                     </c>
                     <c ca="right">
                        <p>11,637</p>
                     </c>
                     <c ca="right">
                        <p>0.2781</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>WAM</p>
                     </c>
                     <c ca="right">
                        <p>4,262</p>
                     </c>
                     <c ca="right">
                        <p>11,859</p>
                     </c>
                     <c ca="right">
                        <p>0.2644</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>static</p>
                     </c>
                     <c ca="center">
                        <p>edit</p>
                     </c>
                     <c ca="right">
                        <p>5,993</p>
                     </c>
                     <c ca="right">
                        <p>10,128</p>
                     </c>
                     <c ca="right">
                        <p>0.3718</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PWM</p>
                     </c>
                     <c ca="right">
                        <p>6,065</p>
                     </c>
                     <c ca="right">
                        <p>10,056</p>
                     </c>
                     <c ca="right">
                        <p>0.3762</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>WAM</p>
                     </c>
                     <c ca="right">
                        <p>5,679</p>
                     </c>
                     <c ca="right">
                        <p>10,442</p>
                     </c>
                     <c ca="right">
                        <p>0.3523</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>modulating</p>
                     </c>
                     <c ca="center">
                        <p>edit</p>
                     </c>
                     <c ca="right">
                        <p>4,190</p>
                     </c>
                     <c ca="right">
                        <p>11,931</p>
                     </c>
                     <c ca="right">
                        <p>0.2599</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PWM</p>
                     </c>
                     <c ca="right">
                        <p>3,708</p>
                     </c>
                     <c ca="right">
                        <p>12,413</p>
                     </c>
                     <c ca="right">
                        <p>0.2300</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>WAM</p>
                     </c>
                     <c ca="right">
                        <p>3,533</p>
                     </c>
                     <c ca="right">
                        <p>12,588</p>
                     </c>
                     <c ca="right">
                        <p>0.2192</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>static</p>
                     </c>
                     <c ca="center">
                        <p>edit</p>
                     </c>
                     <c ca="right">
                        <p>6,345</p>
                     </c>
                     <c ca="right">
                        <p>9,776</p>
                     </c>
                     <c ca="right">
                        <p>0.3936</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PWM</p>
                     </c>
                     <c ca="right">
                        <p>5,537</p>
                     </c>
                     <c ca="right">
                        <p>10,584</p>
                     </c>
                     <c ca="right">
                        <p>0.3435</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>WAM</p>
                     </c>
                     <c ca="right">
                        <p>5,199</p>
                     </c>
                     <c ca="right">
                        <p>10,922</p>
                     </c>
                     <c ca="right">
                        <p>0.3225</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>16,121 non-TIS-containing instances were used in five-fold cross-validation experiments, in which parameter sets were selected for putative TIS evaluation according to best cluster fit established by either the Hamming distance relative to cached medoids (edit), position weight matrix scores (PWM), or weight array matrix scores (WAM). Parameter indexing was tested under both modulating (cluster assignment for each site separately) and static (cluster assignment based on the leftmost ATG) approaches. <it>k </it>denotes the number of clusters considered. <it>TN </it>represents the number of instances for which the method (correctly) refused to predict a TIS, and <it>FP </it>the number for which some prediction was made, though always incorrect (see Figure 2). <inline-formula><m:math name="1471-2105-9-381-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>S</m:mi><m:mi>n</m:mi><m:mo>=</m:mo><m:mfrac><m:mrow><m:mi>T</m:mi><m:mi>N</m:mi></m:mrow><m:mrow><m:mi>T</m:mi><m:mi>N</m:mi><m:mo>+</m:mo><m:mi>F</m:mi><m:mi>P</m:mi></m:mrow></m:mfrac></m:mrow></m:semantics></m:math></inline-formula>.</p>
               </tblfn>
            </tbl>
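            <p>The Sn column of Table <tblr tid="T5">5</tblr> follows directly from the TN and FP counts. As a minimal check, the Python sketch below recomputes it for four rows copied from the table; the names <it>sn</it> and <it>check_rows</it> are illustrative only and are not part of MetWAMer.</p>

```python
# Recompute the Sn column of Table 5 from its TN and FP counts.
# Rows are (k, strategy, indexing, TN, FP, reported Sn), copied from the table.
ROWS = [
    (3, "modulating", "edit", 5074, 11047, 0.3147),
    (3, "static", "PWM", 6279, 9842, 0.3895),
    (5, "modulating", "WAM", 4262, 11859, 0.2644),
    (10, "static", "edit", 6345, 9776, 0.3936),
]


def sn(tn, fp):
    """Sn as defined in the table footnote: TN / (TN + FP)."""
    return tn / (tn + fp)


def check_rows(rows, total=16121):
    """Return the rows whose counts or reported Sn disagree with the formula."""
    mismatches = []
    for k, strategy, indexing, tn, fp, reported in rows:
        if tn + fp != total or round(sn(tn, fp), 4) != reported:
            mismatches.append((k, strategy, indexing))
    return mismatches


if __name__ == "__main__":
    for k, strategy, indexing, tn, fp, reported in ROWS:
        print(k, strategy, indexing, f"{sn(tn, fp):.4f}", reported)
    print("mismatches:", check_rows(ROWS))
```

            <p>Each of the four rows sums to 16,121 instances and reproduces the reported value to four decimal places.</p>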
         </sec>
         <sec>
            <st>
               <p>MetWAMer as a TIS classifier</p>
            </st>
            <p>Although MetWAMer is not a TIS classifier <it>per se</it>, each TIS prediction method uses some form of discriminant technique to evaluate whether the best-scoring in-frame ATG (a putative TIS) is a true or false site. Figure <figr fid="F3">3</figr> shows receiver operating characteristic (ROC) curves for the sigmoidal perceptron element of PFCWLLKR, which was assayed on the task of labeling ATG codons as true or false TISs under distinct parameter deployment strategies. Five-fold cross-validation was used to classify 34,229 instances, 19,703 of which were known TISs from the TIS-containing gene set and 14,526 of which were the first in-frame ATG codons (false TISs) from the non-TIS-containing set (the negative instances number fewer than the 16,121 instances used in Table <tblr tid="T3">3</tblr> because 1,595 of the truncated, multiple-exon gene structures lacked any in-frame ATG). The ROC plots demonstrate that use of <it>a priori</it>-known cluster-specific parameter sets yields a classifier superior to that obtained using a single, homogeneous set. However, WAM-based indexing yielded a classifier inferior to both of the others. This appears to be due to the comparatively poor performance of the parameter set lookup strategies in general at rejecting false TISs (e.g., compare PFCWLLKR performance results in Table <tblr tid="T3">3</tblr> with those in Table <tblr tid="T5">5</tblr>).</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Receiver operating characteristic curves for the perceptron element of PFCWLLKR</p>
               </caption>
               <text>
                  <p><b>Receiver operating characteristic curves for the perceptron element of PFCWLLKR</b>. The classifier was assessed on the task of distinguishing ATG codons as true or false TISs, under distinct parameter deployment strategies: the dotted curve denotes perceptron performance obtained under <it>a priori</it>-known cluster-specific parameter usage, the solid curve that from homogeneous parameter deployment, and the dashed curve from WAM-based parameter set indexing. A true positive is defined as a true TIS labeled as such, whereas a false positive denotes a false TIS labeled by the classifier as true. These plots were generated using the ROCR package <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>.</p>
               </text>
               <graphic file="1471-2105-9-381-3"/>
            </fig>
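            <p>The curves in Figure <figr fid="F3">3</figr> were generated with ROCR. As a minimal illustration of how such a curve is constructed, the Python sketch below sweeps a decision threshold over classifier scores and integrates the resulting curve. It is not the ROCR implementation, the function names are illustrative, and the scores and labels are made-up values rather than perceptron outputs from this study.</p>

```python
# Minimal ROC construction: sweep a decision threshold over classifier scores.
# A positive is a true TIS; a false positive is a false TIS labeled as true.


def roc_points(scores, labels):
    """Return (FPR, TPR) points, one per distinct score threshold.

    scores: classifier outputs, higher meaning more TIS-like.
    labels: 1 for a true TIS, 0 for a false TIS.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last instance sharing this score.
        is_last = i == len(ranked) - 1
        if is_last or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # made-up values
    toy_labels = [1, 1, 0, 1, 0, 0]
    pts = roc_points(toy_scores, toy_labels)
    print(pts)
    print(auc(pts))
```

            <p>A curve lying closer to the upper left corner, and hence enclosing a larger area, corresponds to a better classifier; this is the sense in which the parameter deployment strategies are compared above.</p>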
         </sec>
         <sec>
            <st>
               <p>Biological interpretation of TIS classes</p>
            </st>
            <p>Given the improved performance of the methods under the <it>a priori</it>-known cluster-specific parameter deployment strategy, we wondered if any underlying biological basis for the grouping obtained by <it>k</it>-medoids, for <it>k </it>= 3, may exist. Clustering was performed using a non-redundant set of TISs, and the cluster-specific consensus sequences derived from position-specific mononucleotide distributions perfectly recovered each cluster's associated medoid (see Figure <figr fid="F4">4</figr>). While these consensus sequences were fairly weak &#8211; as evidenced by the high degree of entropy at each position in the TIS alignment &#8211; the observation nevertheless indicates that the clustering algorithm's results are meaningful, and also suggests the possibility of at least three distinct groups of TISs in <it>Arabidopsis</it>. The possibility that these could correspond to distinct gene classes was explored using a non-parametric statistical test on ontological annotations, in which the significance of cluster-specific distributions of GOslim (specifically, cellular component) terms <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> was determined by sampling same-size sets from the full population of terms. We labeled a cluster as being over- or underrepresented with respect to a particular GOslim term if its frequency in the class was in the top or bottom five values in comparison with 99 randomly-sampled sets, respectively. Clusters 1 through 3 contained 6,298; 5,039; and 3,019 instances having associated GOslim terms, respectively, with the overall population containing 14,356 terms. Our results, presented in Table <tblr tid="T6">6</tblr>, suggest that cluster 1 is largely depleted of plastid and ribosomal genes, while cluster 2 is enriched for these; cluster 3 is enriched for plastid and cytosolic genes. 
However, these observations should perhaps be deemed inconclusive, as many genes in our data set do not yet have associated GOslim terms, and for those that do, such annotations should typically be considered tenuous at present.</p>
            <tbl id="T6">
               <title>
                  <p>Table 6</p>
               </title>
               <caption>
                  <p>Cluster-specific over- and underrepresentation of GOslim terms</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>GOslim term</p>
                     </c>
                     <c ca="left">
                        <p>High</p>
                     </c>
                     <c ca="left">
                        <p>Low</p>
                     </c>
                     <c ca="left">
                        <p>Normal</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>cell wall</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1,2,3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>chloroplast</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>cytosol</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ER</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1,2,3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>extracellular</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>1,2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Golgi apparatus</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1,2,3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mitochondria</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>nucleus</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>2,3</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>other cellular components</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1,2,3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>other cytoplasmic components</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1,3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>other intracellular components</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1,2,3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>other membranes</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1,2,3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>plasma membrane</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1,2,3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>plastid</p>
                     </c>
                     <c ca="left">
                        <p>2,3</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ribosome</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>unknown cellular components</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>2,3</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><it>Arabidopsis </it>transcripts were clustered into three sequence clusters based on TIS similarity. Within these clusters, the numbers of gene models with associated GOslim terms are, respectively, 6,298, 5,039, and 3,019. Clusters denoted as being "High" for a specific term were determined to be enriched in genes labeled as such, relative to the full population of terms, by a randomization test. Those labeled "Low" were found to be relatively impoverished in genes labeled with the associated term, and "Normal" as being neither significantly over- nor underrepresented.</p>
               </tblfn>
            </tbl>
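            <p>The randomization test behind Table <tblr tid="T6">6</tblr> can be sketched as follows. This Python example is a minimal sketch under stated assumptions, not the code used in this study: it assumes sampling without replacement, counts ties against the cluster, and uses a hypothetical function name (<it>classify_term</it>) and a synthetic population of terms.</p>

```python
import random


def classify_term(population, cluster_terms, term, n_draws=99, extreme=5, seed=0):
    """Label a cluster "High", "Low" or "Normal" for one GOslim term.

    population: all GOslim term annotations (one list entry per annotation).
    cluster_terms: the annotations belonging to the cluster of interest.
    Sampling without replacement and the tie handling are assumptions.
    """
    rng = random.Random(seed)
    observed = cluster_terms.count(term)
    size = len(cluster_terms)
    # Term counts in same-size sets drawn from the full population.
    sampled = [rng.sample(population, size).count(term) for _ in range(n_draws)]
    at_or_above = sum(1 for c in sampled if c >= observed)
    at_or_below = sum(1 for c in sampled if observed >= c)
    # The observed count must rank within the top (or bottom) `extreme`
    # of the n_draws + 1 values to be called High (or Low).
    if extreme > at_or_above:
        return "High"
    if extreme > at_or_below:
        return "Low"
    return "Normal"


if __name__ == "__main__":
    # Synthetic population: 10% of annotations carry the term of interest.
    population = ["plastid"] * 100 + ["other"] * 900
    enriched = ["plastid"] * 60 + ["other"] * 40
    print(classify_term(population, enriched, "plastid"))
```

            <p>With 99 random sets, a cluster is labeled only when its observed count falls among the five most extreme of 100 values, which corresponds to a one-sided threshold of roughly 5% in each direction.</p>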
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Cluster-specific TIS mononucleotide distributions</p>
               </caption>
               <text>
                  <p><b>Cluster-specific TIS mononucleotide distributions</b>. Sequence logo plots <abbrgrp><abbr bid="B63">63</abbr></abbrgrp>, depicting site-specific nucleotide abundances, were generated for TIS sequences obtained from clusters 1 through 3 using the WebLogo utility <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>. The medoids computed by the <it>k</it>-medoids algorithm for clusters 1 through 3 are TAAAAATGGAT, AAAAAATGGCG, and CAACAATGGCT, respectively.</p>
               </text>
               <graphic file="1471-2105-9-381-4"/>
            </fig>
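            <p>To make the edit-distance indexing and the static and modulating strategies concrete, the following Python sketch assigns ATG contexts to the nearest of the three medoids listed in Figure <figr fid="F4">4</figr> by Hamming distance. It is a simplified illustration rather than the MetWAMer implementation: the window of five upstream and three downstream bases is inferred from the medoid strings, reading-frame filtering is omitted, ties are broken arbitrarily, and all function names are hypothetical.</p>

```python
# Medoids reported for k = 3 (Figure 4); the ATG occupies positions 6-8.
MEDOIDS = {1: "TAAAAATGGAT", 2: "AAAAAATGGCG", 3: "CAACAATGGCT"}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def nearest_cluster(site, medoids=MEDOIDS):
    """Return (cluster id, distance) for the medoid closest to `site`.

    Ties go to the lowest cluster id, an arbitrary choice for this sketch.
    """
    candidates = [(cid, hamming(site, m)) for cid, m in sorted(medoids.items())]
    return min(candidates, key=lambda pair: pair[1])


def atg_contexts(seq, upstream=5, downstream=3):
    """Yield (position, context) for every ATG with complete flanks."""
    for i in range(upstream, len(seq) - 3 - downstream + 1):
        if seq[i:i + 3] == "ATG":
            yield i, seq[i - upstream:i + 3 + downstream]


def modulating_clusters(seq):
    """Assign a cluster to each candidate site separately."""
    return {pos: nearest_cluster(ctx)[0] for pos, ctx in atg_contexts(seq)}


def static_clusters(seq):
    """Assign every candidate site the cluster of the leftmost ATG."""
    sites = list(atg_contexts(seq))
    if not sites:
        return {}
    leftmost = nearest_cluster(sites[0][1])[0]
    return {pos: leftmost for pos, _ in sites}


if __name__ == "__main__":
    toy = "GGCAACAATGGCTAAATAAAAATGGATCC"  # made-up sequence
    print(modulating_clusters(toy))
    print(static_clusters(toy))
```

            <p>In the made-up sequence, the two ATG contexts coincide with the medoids of clusters 3 and 1, so the modulating strategy selects a different parameter set for each site, whereas the static strategy applies the cluster 3 set to both.</p>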
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Our results on the TIS-containing data set suggest that, compared with the methods implemented in MetWAMer, a policy of labeling the first ATG in a maximal ORF as the TIS will achieve quite good (though imperfect) results. However, in practice we cannot always assert whether a maximal ORF has sufficient 5'-coverage so as to include the gene's true TIS, or whether a spurious in-frame ATG occurs upstream from it. In such cases, the 1<sup><it>st</it></sup>-ATG strategy fails, as it does in cases of leaky scanning, thus sustaining the importance of further development of statistical TIS prediction methodologies that capture the sequence features recognized by the ribosome in translation initiation. In this work, we present a number of distinct models for TIS prediction, the most successful of which mixes content- and signal-based features of putative TISs using a perceptron (PFCWLLKR). Furthermore, we demonstrate that, in the model plant <it>Arabidopsis</it>, TIS prediction can be enhanced by integration of class-specific parameter sets, regardless of the prediction method utilized.</p>
         <p>We attribute the well-balanced performance of PFCWLLKR to the biological plausibility of the features provided to it as inputs. As a signal-based feature, weighted log-likelihood ratios considerably improve the specificity of TIS prediction (e.g., contrast WLLKR and LLKR in Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr>), likely because our weighting function, <it>w</it>(<it>x</it>) = <it>x</it><sup>3</sup>, where <it>x </it>is the ratio of induced protein length to maximal ORF coverage, appears to approximate our understanding of eukaryotic translation initiation fairly well: according to the (leaky) ribosomal scanning hypothesis <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, one would expect that more upstream AUG sites &#8211; especially those occurring in a favorable signaling context &#8211; in a maximal reading frame would be more likely to function as <it>bona fide </it>translation initiation sites. Also, it is unusual for a long, uninterrupted reading frame to be maintained, yet not expressed as part of a functional protein product. Our weighting scheme has been explicitly designed to reflect these biologically informed biases.</p>
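         <p>The effect of the cubic weighting can be illustrated with a short Python sketch. It rests on assumptions that go beyond the text above: the weight is taken to multiply the log-likelihood ratio, <it>x </it>is computed from nucleotide offsets within the maximal ORF, and the candidate scores are made-up values. The function names are hypothetical.</p>

```python
def weight(x):
    """w(x) = x**3 for a coverage ratio x between 0 and 1."""
    if x > 1.0 or 0.0 > x:
        raise ValueError("x must lie between 0 and 1")
    return x ** 3


def coverage(orf_length, atg_offset):
    """Fraction of the maximal ORF retained when starting at atg_offset.

    Both arguments are in nucleotides, with atg_offset measured from the
    start of the maximal ORF (an assumption of this sketch).
    """
    return (orf_length - atg_offset) / orf_length


def weighted_score(llkr, x):
    """Scale a log-likelihood ratio by w(x); the product form is assumed."""
    return weight(x) * llkr


def best_site(candidates, orf_length):
    """Return the (offset, llkr) candidate with the highest weighted score."""
    return max(
        candidates,
        key=lambda c: weighted_score(c[1], coverage(orf_length, c[0])),
    )


if __name__ == "__main__":
    # (offset into the maximal ORF, made-up log-likelihood ratio)
    candidates = [(0, 2.0), (90, 2.5), (450, 4.0)]
    for offset, llkr in candidates:
        x = coverage(900, offset)
        print(offset, round(weight(x), 3), round(weighted_score(llkr, x), 4))
    print(best_site(candidates, 900))
```

         <p>Because <it>w</it>(0.5) = 0.125, a site halfway into the maximal ORF must score eight times higher than the most upstream site to be preferred under this scheme, which is how the weighting encodes the expectation that upstream sites are favored.</p>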
         <p>During the post-scanning phase of translation initiation, the small ribosomal subunit stalls at a TIS to recruit the large subunit, thereby forming the 80S ribosomal particle. The scanning process, as conducted by the small ribosomal subunit in concert with various eukaryotic initiation factors, does not appear to take more global nucleotide compositional features of the mRNA molecule into account, notwithstanding the possibility of secondary structures causing steric interference with scanning itself. That we might utilize contrast in coding potential of sequences flanking a TIS for modeling purposes is a consequence of the fact that sequences upstream of a TIS are non-coding, and those downstream, coding, though this plays no known role in the recognition of TISs <it>in vivo</it>. The use of Markov chains in a classification setting was shown to distinguish exons from introns with good accuracy in plant systems <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, and our expectation that these content-sensing tools could be gainfully transferred to the TIS prediction domain was borne out by the performance results shown. Similar inclusion of coding potential contrast has also been employed to increase splice site prediction accuracy <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B45">45</abbr></abbrgrp>.</p>
         <p>Our data set was developed from gene models flagged as curated in the current <it>Arabidopsis </it>annotations, though it should not be overlooked that potential errors in these structures might have distorted our results. Manual inspection of several genes whose TISs were predicted incorrectly by the PFCWLLKR routine indicates possible problems with existing annotations. For example, in gene model At4g34080.1 <url>http://www.plantgdb.org/AtGDB-cgi/getRegion.pl?dbid=2&amp;chr=4&amp;l_pos=16326388&amp;r_pos=16328548</url>, our system predicted the TIS as that from the TAIR version 6 gene annotation, rather than that of version 7, which occurs downstream. Similarly, we predict the version 6 TIS of gene model At5g35580.1 <url>http://www.plantgdb.org/AtGDB-cgi/getRegion.pl?dbid=2&amp;chr=5&amp;l_pos=13778674&amp;r_pos=13781581</url> as correct, rather than the revised TIS from the version 7 model. Partial protein sequencing using Edman degradation could potentially resolve such ambiguities in the annotations (e.g., <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>), as might consideration of homologous proteins with matching N-termini whose translation initiation sites had previously been determined; such efforts are beyond the scope of this work, however.</p>
         <p>Although we were unable to achieve the performance levels of <it>a priori</it>-known cluster-specific parameter deployment with our parameter set indexing schemes, stratified parameter deployment can nevertheless be used effectively in practice, depending on certain characteristics of the test data: if these are expected to be moderately enriched for 5'-complete sequences, then static WAM-based indexing should recover a larger fraction of true TISs than would homogeneous deployment. However, if complete 5'-coverage is expected to be quite sparse, homogeneous parameter deployment should be utilized instead. This affords a complete prescription of how to most effectively identify TISs in transcript data: 1<sup><it>st</it></sup>-ATG would be the best method for use in data sets with a high degree of 5'-completeness, static WAM-based PFCWLLKR in moderately enriched data sets, and homogeneous deployment of PFCWLLKR in data sets likely to contain few 5'-complete sequences.</p>
         <p>We have replicated our experiments using a data set based on the most recent GenBank annotations for the nematode <it>Caenorhabditis elegans </it>(dated 16 February 2006), the results of which are similar to those presented here for <it>Arabidopsis </it>(available as supplementary material at <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>), suggesting that our method is not specific to plant taxa, and can be used for eukaryotic TIS prediction in general. Also available as supplementary material are homogeneous parameter deployment-based results for a small set of TIS-containing human genes culled from the Consensus CDS project <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>; these results imply that the system can be utilized for vertebrate taxa, as well.</p>
         <p>As a demonstration of MetWAMer's applicability for post-processing gene structures predicted by separate tools, we refined maize gene annotations generated by the GeneSeqer spliced alignment program <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. A total of 11,742 full length maize cDNA sequences were obtained from the Maize Full Length cDNA project <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> and aligned via GeneSeqer to a set of 17,163 BAC sequences downloaded from PlantGDB <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. These results were post-processed with MetWAMer's PFCWLLKR routine under homogeneous parameter deployment, using parameters trained with <it>Arabidopsis </it>data. We considered only predicted protein sequences such that at least one full length cDNA supporting its annotation exhibited an overall GeneSeqer alignment score of at least 0.9 and the predicted TIS occurred in or upstream from the first exon identified by spliced alignment. The resulting set of 6,926 proteins was aligned against a collection of 36,338 annotated sorghum proteins downloaded from the Phytozome project <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> using BLASTP. BLASTP output was inspected using the MuSeqBox program <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> in order to select only those inferred maize proteins of at least 150 amino acids in length whose best hit in the sorghum data, also at least 150 amino acids long, shared high-scoring segment pairs (HSPs) of at least 20% identity apiece such that the sum of these non-overlapping HSPs was not less than 90% of the length of either sequence. Furthermore, at most five amino acids at both the N- and C-termini, for both sequences, were allowed to be disjoint from an HSP. These 2,315 proteins were then made non-redundant using BLASTClust with default settings.
In summary, the resultant set of 1,665 maize proteins on 1,463 distinct BACs identified by GeneSeqer in concert with MetWAMer represents a reliable collection of high-quality, non-redundant full length maize proteins that could not have been identified by GeneSeqer alone, thereby demonstrating the practical utility of this approach to modern genome annotation projects. Our results are available as supplementary data at <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
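         <p>The selection criteria applied to the BLASTP hits can be summarized in a short sketch. This is not the MuSeqBox implementation: the <it>Hsp</it> record, the function name, and the 1-based inclusive coordinate convention are hypothetical, and the HSPs passed in are assumed to be already non-overlapping.</p>

```python
from dataclasses import dataclass
from typing import List

MIN_LEN = 150          # minimum length of both proteins (amino acids)
MIN_IDENTITY = 20.0    # minimum percent identity of every HSP
MIN_COVERAGE = 0.90    # summed HSP length relative to either sequence
MAX_OVERHANG = 5       # residues allowed outside HSPs at each terminus


@dataclass
class Hsp:
    """One high-scoring segment pair; coordinates are 1-based, inclusive."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    identity: float  # percent identity


def passes_filter(q_len: int, s_len: int, hsps: List[Hsp]) -> bool:
    """Apply the length, identity, coverage, and terminal-overhang criteria
    to a query (maize) protein and its best subject (sorghum) hit."""
    if not hsps:
        return False
    if MIN_LEN > min(q_len, s_len):
        return False
    if any(MIN_IDENTITY > h.identity for h in hsps):
        return False
    # Coverage: summed HSP lengths must reach 90% of each sequence.
    q_cov = sum(h.q_end - h.q_start + 1 for h in hsps)
    s_cov = sum(h.s_end - h.s_start + 1 for h in hsps)
    if MIN_COVERAGE * q_len > q_cov or MIN_COVERAGE * s_len > s_cov:
        return False
    # Overhang: residues before the first HSP and after the last one.
    overhangs = [
        min(h.q_start for h in hsps) - 1,
        q_len - max(h.q_end for h in hsps),
        min(h.s_start for h in hsps) - 1,
        s_len - max(h.s_end for h in hsps),
    ]
    return MAX_OVERHANG >= max(overhangs)


if __name__ == "__main__":
    hit = [Hsp(q_start=3, q_end=198, s_start=4, s_end=202, identity=65.0)]
    print(passes_filter(200, 205, hit))
```

         <p>Requiring short terminal overhangs on both sequences is what ties this filter to TIS quality: a maize protein whose predicted N-terminus extends well beyond, or falls well short of, its sorghum counterpart is rejected.</p>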
         <p>We compared annotation results of our pipeline with those achieved by a current state-of-the-art <it>ab initio </it>gene prediction tool, AUGUSTUS <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. The BAC sequences containing our annotated maize genes were fed to the program and processed using its maize-specific parameters. We note that a fair comparison between the two approaches is essentially impossible, since the search space probed by pure <it>ab initio </it>gene finders is quite distinct from that explored by spliced alignment annotation systems such as GeneSeqer+MetWAMer, so we disregard false positive predictions generated by AUGUSTUS. In summary, of the 1,665 maize proteins we identified, AUGUSTUS correctly predicted 1,232 (&#8776;74%) TISs and 581 (&#8776;35%) complete gene structures. These results underscore the necessity that a complete and robust gene annotation pipeline should integrate evidence from multiple data sources, gene prediction software and even manual gene curation results, as is achieved by various higher-order systems including AUGUSTUS+ <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>, the Ensembl pipeline <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, EuG&#232;ne <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>, and JigSaw <abbrgrp><abbr bid="B56">56</abbr><abbr bid="B57">57</abbr></abbrgrp>. Our efforts to integrate a variety of retrained, state-of-the-art gene finding tools using such systems in the context of various plant genomes will be presented in a forthcoming report.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>MetWAMer performance results, particularly for PFCWLLKR, suggest that the method can be used with good success for the task of annotating TISs in eukaryotes. However, our data are not precisely comparable with those provided by a number of previous studies, e.g., <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B15">15</abbr><abbr bid="B19">19</abbr><abbr bid="B33">33</abbr></abbrgrp>, just as results between those papers are essentially incomparable, as well. This is due to differing experimental designs (some studies focus on the number of ATG codons correctly classified as true or false TISs, and others on the number of genes for which the TIS was correctly identified) and different data sets (some studies used human genes, some cyanobacterial, etc., and these corpora were often of very different sizes).</p>
         <p>Comparing these published methods with our own, using our data and experimental design, was often not practical: the availability of software implementing methods developed for eukaryotic TIS prediction <it>per se </it>is very limited at present. Among the papers addressing intrinsic TIS detection methods, only the ATGpr <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, StartScan <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, DIANA-TIS <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, TISHunter <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, NetStart <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, and TIS Miner <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> systems are described as "available" software. We were only able to utilize the NetStart, TIS Miner, TISHunter and ATGpr systems to compare against our software system, though we note that it is impossible to re-train any of these programs. StartScan is available via a web interface (currently trained only for human), but an important distinction from our tool is that StartScan is for TIS recognition in genomic sequences, a much different task than that addressed by MetWAMer. Although not mentioned in its reference paper, <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, we were able to locate a web interface to the DIANA-TIS system at the author's web page <url>http://diana.pcbi.upenn.edu</url>. However, documentation for the interface is unavailable, and most prohibitive is that it only allows a pictorial representation of its predictions, which is unrealistic for processing data sets of the scale used in this study. GeneHackerTL is mentioned in <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, but it is not described as being publicly available, nor were we able to locate it in any web-accessible forum.</p>
         <p>The paucity of freely available, functioning programs for TIS prediction constitutes an important gap in the software infrastructure for computational biology. Our MetWAMer package represents a well-documented, extensible, and open source software system that can be modified for differing applications and extended with existing and novel TIS prediction methods to support further research in this area; this is, to our knowledge, the first such contribution made to the eukaryotic TIS prediction community at large. There are certain limitations to the existing scope of MetWAMer, however, which may present opportunities for future work. We have explicitly ignored the possibility of non-AUG start codons, although these are known to occur in various eukaryotic organisms <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. Also, the system does not explicitly integrate extrinsic information, such as homologous proteins, which is reportedly successful <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>; however, due to evolutionary forces operating on homologous genes, it is possible that translation initiation sites differ, and the use of such information for prediction could be misleading. We have explicitly ignored the possibility of translation initiation proceeding by a re-initiation mechanism, whereby a short ORF upstream of the more significant ORF is translated, and the ribosome resumes translation at a downstream AUG <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>. For MetWAMer, however, this is not a potentially obfuscating phenomenon: because the system scans for TISs in a maximal reading frame, there is no possibility to predict a start codon upstream of the significant ORF that is succeeded by a stop codon a short distance thereafter. Another open problem is the prediction of alternative TISs in various gene structures <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>.</p>
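         <p>The remark on re-initiation can be made concrete with a simplified sketch. It assumes that the maximal reading frame is the longest stop-free run of codons over the three forward frames of a transcript, and that candidate TISs are the in-frame ATG codons inside that run; MetWAMer's own definition may differ in detail, and the function name and toy sequence are hypothetical.</p>

```python
from typing import List

STOPS = {"TAA", "TAG", "TGA"}


def maximal_frame_atgs(seq: str) -> List[int]:
    """Return 0-based positions of in-frame ATGs inside the longest
    stop-free codon run found in any of the three forward frames."""
    seq = seq.upper()
    best_start, best_end = 0, 0  # half-open nucleotide interval
    for frame in range(3):
        run_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                # A stop closes the current run; keep it if it is the longest.
                if i - run_start > best_end - best_start:
                    best_start, best_end = run_start, i
                run_start = i + 3
        # The run that extends to the last complete codon of this frame.
        end = frame + 3 * ((len(seq) - frame) // 3)
        if end - run_start > best_end - best_start:
            best_start, best_end = run_start, end
    return [i for i in range(best_start, best_end, 3) if seq[i:i + 3] == "ATG"]


if __name__ == "__main__":
    # Toy transcript: a short upstream ORF (ATG AAA TAA) followed by a
    # longer ORF that contains two in-frame ATG codons.
    toy = ("ATGAAATAA" + "ATG" + "GCTAAC" + "ATAAGC" + "GCT" * 8
           + "ATG" + "GCT" * 4 + "TGA")
    print(maximal_frame_atgs(toy))
```

         <p>In this toy transcript the ATG of the upstream ORF is separated from the longer ORF by a stop codon, so it lies outside the maximal stop-free run and is never offered as a candidate.</p>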
         <p>The ability to train TIS models in a species-specific manner is an important strength of MetWAMer, because differences in translation initiation processes among distinct taxa are known to occur <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>. To the extent that cross-specific TISs are representative of some target species, these could in principle be used as a proxy if species-specific data are not available; the performance of our system in such a scenario will be reported in a forthcoming study in which we refine gene structure annotations of a variety of cereal crop genomes. Results presented here also indicate that improvements in TIS prediction accuracy are possible when taking the class of potential start-methionines into account. Our software readily accommodates these needs, and can be integrated into other gene annotation programs and/or pipelines with straightforward modifications.</p>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>&#8226; Project name: MetWAMer</p>
         <p>&#8226; Project home page: <url>http://brendelgroup.org/SB08B/</url></p>
         <p>&#8226; Operating system(s): Platform independent</p>
         <p>&#8226; Programming language: C</p>
         <p>&#8226; Other requirements: libxml2 version 2.6.23 or later <url>http://www.xmlsoft.org</url>, and IMMpractical version 1.0 or later <url>http://sourceforge.net/projects/immpractical/</url> &#8211; see the MetWAMer manual page for details.</p>
         <p>&#8226; License: ISC license</p>
         <p>&#8226; Restrictions to use by non-academics: None</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>VB suggested the project and advised on the models, experimental design, and manuscript. MES co-designed the models with VB, implemented the software, conducted experiments, and wrote the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was supported in part by NSF Grant DBI-0606909. We thank three anonymous reviewers whose comments improved this manuscript.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>How do eucaryotic ribosomes select initiation regions in messenger RNA?</p>
            </title>
            <aug>
               <au>
                  <snm>Kozak</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1978</pubdate>
            <volume>15</volume>
            <fpage>1109</fpage>
            <lpage>1123</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">215319</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Starting the protein synthesis machine: eukaryotic translation initiation</p>
            </title>
            <aug>
               <au>
                  <snm>Preiss</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hentze</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BioEssays</source>
            <pubdate>2003</pubdate>
            <volume>25</volume>
            <fpage>1201</fpage>
            <lpage>1211</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14635255</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Kozak</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>1987</pubdate>
            <volume>15</volume>
            <fpage>8125</fpage>
            <lpage>8148</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">306349</pubid>
                  <pubid idtype="pmpid" link="fulltext">3313277</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Starting at the beginning, middle, and end: translation initiation in eukaryotes</p>
            </title>
            <aug>
               <au>
                  <snm>Sachs</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sarnow</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hentze</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1997</pubdate>
            <volume>89</volume>
            <fpage>831</fpage>
            <lpage>838</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9200601</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Oscillating kissing stem-loop interactions mediate 5' scanning-dependent translation by a viral 3'-cap-independent translation element</p>
            </title>
            <aug>
               <au>
                  <snm>Rakotondrafara</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Polacek</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2006</pubdate>
            <volume>12</volume>
            <fpage>1893</fpage>
            <lpage>1906</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1581982</pubid>
                  <pubid idtype="pmpid" link="fulltext">16921068</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Translational control of retroviruses</p>
            </title>
            <aug>
               <au>
                  <snm>Balvay</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lastra</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sargueil</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Darlix</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Ohlmann</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nature Reviews Microbiology</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <fpage>128</fpage>
            <lpage>140</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17224922</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Non-AUG translation initiation of mRNA encoding acidic ribosomal P2A protein in <it>Candida albicans</it></p>
            </title>
            <aug>
               <au>
                  <snm>Abramczyk</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tch&#243;rzewski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Grankowski</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Yeast</source>
            <pubdate>2003</pubdate>
            <volume>20</volume>
            <fpage>1045</fpage>
            <lpage>1052</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12961752</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Methionine-Independent Translation Initiation from Naturally Occurring Non-AUG Codons</p>
            </title>
            <aug>
               <au>
                  <snm>Medveczky</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>N&#233;meth</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gr&#225;f</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Szil&#225;gyi</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Current Chemical Biology</source>
            <pubdate>2007</pubdate>
            <volume>1</volume>
            <fpage>129</fpage>
            <lpage>139</lpage>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Use of the 'Perceptron' algorithm to distinguish translational initiation sites in <it>E. coli</it></p>
            </title>
            <aug>
               <au>
                  <snm>Stormo</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gold</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ehrenfeucht</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>1982</pubdate>
            <volume>10</volume>
            <fpage>2997</fpage>
            <lpage>3011</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">320670</pubid>
                  <pubid idtype="pmpid" link="fulltext">7048259</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Neural network prediction of translation initiation sites in eukaryotes: perspectives for EST and genome analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Proceedings of the International Conference on Intelligent Systems in Molecular Biology</source>
            <pubdate>1997</pubdate>
            <volume>5</volume>
            <fpage>226</fpage>
            <lpage>233</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pubmed">9322041</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Translation initiation start prediction in human cDNAs with high accuracy</p>
            </title>
            <aug>
               <au>
                  <snm>Hatzigeorgiou</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>343</fpage>
            <lpage>350</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11847092</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Assessing protein coding region integrity in cDNA sequencing projects</p>
            </title>
            <aug>
               <au>
                  <snm>Salamov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nishikawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Swindells</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>384</fpage>
            <lpage>390</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9682051</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Translation initiation sites prediction with mixture Gaussian models in human cDNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Leong</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>IEEE Transactions on Knowledge and Data Engineering</source>
            <pubdate>2005</pubdate>
            <volume>17</volume>
            <fpage>1152</fpage>
            <lpage>1160</lpage>
         </bibl>
         <bibl id="B14">
            <title>
               <p>An unsupervised classification scheme for improving predictions of prokaryotic TIS</p>
            </title>
            <aug>
               <au>
                  <snm>Tech</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Meinicke</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>121</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1434772</pubid>
                  <pubid idtype="pmpid" link="fulltext">16526950</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Engineering support vector machine kernels that recognize translation initiation sites</p>
            </title>
            <aug>
               <au>
                  <snm>Zien</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>R&#228;tsch</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mika</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sch&#246;lkopf</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lengauer</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>KR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>799</fpage>
            <lpage>807</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pubmed">11108702</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Using amino acid patterns to accurately predict translation initiation sites</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>In silico Biology</source>
            <pubdate>2004</pubdate>
            <volume>4</volume>
            <fpage>255</fpage>
            <lpage>269</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15724279</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>A class of edit kernels for SVMs to predict translation initiation sites in eukaryotic mRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Journal of Computational Biology</source>
            <pubdate>2005</pubdate>
            <volume>12</volume>
            <fpage>702</fpage>
            <lpage>718</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16108712</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Recognition of translation initiation sites of eukaryotic genes based on an EM algorithm</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Ou</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Guo</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Journal of Computational Biology</source>
            <pubdate>2003</pubdate>
            <volume>10</volume>
            <fpage>699</fpage>
            <lpage>708</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14633394</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Prediction of translation initiation sites on the genome of <it>Synechocystis </it>sp. strain PCC6803 by hidden Markov model</p>
            </title>
            <aug>
               <au>
                  <snm>Hirosawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sazuka</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yada</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>DNA Research</source>
            <pubdate>1997</pubdate>
            <volume>4</volume>
            <fpage>179</fpage>
            <lpage>184</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9330905</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Iseli</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jongeneel</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bucher</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Proceedings of the International Conference on Intelligent Systems in Molecular Biology</source>
            <pubdate>1999</pubdate>
            <fpage>138</fpage>
            <lpage>148</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pubmed">10786296</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Modeling sequencing errors by combining Hidden Markov models</p>
            </title>
            <aug>
               <au>
                  <snm>Lottaz</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Iseli</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jongeneel</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bucher</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>ii103</fpage>
            <lpage>ii112</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pubmed">14534179</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Diogenes: reliable ORF-finding in short genomic sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Crow</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Retzel</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <note>2001, unpublished</note>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Comparison of computational methods for identifying translation initiation sites in EST data</p>
            </title>
            <aug>
               <au>
                  <snm>Nadershahi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Fahrenkrug</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>14</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">375524</pubid>
                  <pubid idtype="pmpid" link="fulltext">15053846</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>TICO: a tool for postprocessing the predictions of prokaryotic translation initiation sites</p>
            </title>
            <aug>
               <au>
                  <snm>Tech</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Meinicke</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>W588</fpage>
            <lpage>W590</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1538874</pubid>
                  <pubid idtype="pmpid" link="fulltext">16845076</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Microbial gene identification using interpolated Markov models</p>
            </title>
            <aug>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Delcher</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kasif</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>1998</pubdate>
            <volume>26</volume>
            <fpage>544</fpage>
            <lpage>548</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147303</pubid>
                  <pubid idtype="pmpid" link="fulltext">9421513</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Improved microbial gene identification with GLIMMER</p>
            </title>
            <aug>
               <au>
                  <snm>Delcher</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Harmon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kasif</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>4636</fpage>
            <lpage>4641</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148753</pubid>
                  <pubid idtype="pmpid" link="fulltext">10556321</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Initiation of translation in prokaryotes and eukaryotes</p>
            </title>
            <aug>
               <au>
                  <snm>Kozak</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>1999</pubdate>
            <volume>234</volume>
            <fpage>187</fpage>
            <lpage>208</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10395892</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>gthXML-tools</p>
            </title>
            <url>http://brendelgroup.org/mespar1/gthxml/</url>
         </bibl>
         <bibl id="B29">
            <title>
               <p>MetWAMer</p>
            </title>
            <url>http://brendelgroup.org/SB08B/</url>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Engineering a software tool for gene structure prediction in higher organisms</p>
            </title>
            <aug>
               <au>
                  <snm>Gremme</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Sparks</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kurtz</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Information and Software Technology</source>
            <pubdate>2005</pubdate>
            <volume>47</volume>
            <fpage>965</fpage>
            <lpage>978</lpage>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus</p>
            </title>
            <aug>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Xing</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>1157</fpage>
            <lpage>1169</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14764557</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Markov model variants for appraisal of coding potential in plant DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Sparks</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Dorman</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Lecture Notes in Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>4463</volume>
            <fpage>394</fpage>
            <lpage>405</lpage>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Translation initiation site prediction on a genomic scale: beauty in simplicity</p>
            </title>
            <aug>
               <au>
                  <snm>Saeys</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Abeel</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Degroeve</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Peer</snm>
                  <mnm>Van de</mnm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>i418</fpage>
            <lpage>i423</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17646326</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <aug>
               <au>
                  <snm>Bishop</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Pattern Recognition and Machine Learning</source>
            <publisher>New York, NY: Springer</publisher>
            <pubdate>2006</pubdate>
         </bibl>
         <bibl id="B35">
            <aug>
               <au>
                  <snm>Mitchell</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Machine Learning</source>
            <publisher>Boston, MA: McGraw Hill</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B36">
            <aug>
               <au>
                  <snm>Russell</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Norvig</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Artificial Intelligence: A Modern Approach</source>
            <publisher>Englewood Cliffs, NJ: Prentice-Hall</publisher>
            <edition>2</edition>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B37">
            <title>
               <p>TAIR: The <it>Arabidopsis </it>Information Resource</p>
            </title>
            <url>http://www.arabidopsis.org/</url>
         </bibl>
         <bibl id="B38">
            <title>
               <p>TIGR XML Specification</p>
            </title>
            <url>ftp://ftp.tigr.org/pub/data/DTDs/tigrxml.dtd</url>
         </bibl>
         <bibl id="B39">
            <title>
               <p>TIGR: The Institute for Genomic Research</p>
            </title>
            <url>http://www.tigr.org/</url>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Journal of Molecular Biology</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2231712</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Open source clustering software</p>
            </title>
            <aug>
               <au>
                  <snm>de Hoon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Imoto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nolan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Miyano</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>1453</fpage>
            <lpage>1454</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14871861</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Current methods of gene prediction, their strengths and weaknesses</p>
            </title>
            <aug>
               <au>
                  <snm>Math&#233;</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sagot</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Schiex</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rouz&#233;</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>4103</fpage>
            <lpage>4117</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">140543</pubid>
                  <pubid idtype="pmpid" link="fulltext">12364589</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>DNAFSMiner: a web-based software toolbox to recognize two types of functional sites in DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>671</fpage>
            <lpage>673</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15284102</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Functional annotation of the <it>Arabidopsis</it> genome using controlled vocabularies</p>
            </title>
            <aug>
               <au>
                  <snm>Berardini</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>Plant Physiology</source>
            <pubdate>2004</pubdate>
            <volume>135</volume>
            <fpage>745</fpage>
            <lpage>755</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">514112</pubid>
                  <pubid idtype="pmpid" link="fulltext">15173566</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Splice site prediction in <it>Arabidopsis thaliana </it>pre-mRNA by combining local and global sequence information</p>
            </title>
            <aug>
               <au>
                  <snm>Hebsgaard</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Korning</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Tolstrup</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Engelbrecht</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rouz&#233;</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>1996</pubdate>
            <volume>24</volume>
            <fpage>3439</fpage>
            <lpage>3452</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146109</pubid>
                  <pubid idtype="pmpid" link="fulltext">8811101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>CCDS project at NCBI</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/CCDS/</url>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Incorporation of splice site probability models for non-canonical introns improves gene structure prediction in plants</p>
            </title>
            <aug>
               <au>
                  <snm>Sparks</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>iii20</fpage>
            <lpage>iii30</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16306388</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>The Maize Full Length cDNA Project</p>
            </title>
            <url>http://www.maizecdna.org</url>
         </bibl>
         <bibl id="B49">
            <title>
               <p>PlantGDB, plant genome database and analysis tools</p>
            </title>
            <aug>
               <au>
                  <snm>Dong</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Schlueter</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D354</fpage>
            <lpage>D359</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308780</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681433</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Phytozome</p>
            </title>
            <url>http://www.phytozome.net</url>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Multi-query sequence BLAST output examination with MuSeqBox</p>
            </title>
            <aug>
               <au>
                  <snm>Xing</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>744</fpage>
            <lpage>745</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11524378</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Using native and syntenically mapped cDNA alignments to improve <it>de novo </it>gene finding</p>
            </title>
            <aug>
               <au>
                  <snm>Stanke</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <fpage>637</fpage>
            <lpage>644</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18218656</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources</p>
            </title>
            <aug>
               <au>
                  <snm>Stanke</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sch&#246;ffmann</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Waack</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>62</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1409804</pubid>
                  <pubid idtype="pmpid" link="fulltext">16469098</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Ensembl 2006</p>
            </title>
            <aug>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D556</fpage>
            <lpage>D561</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347495</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381931</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>EuG&#232;ne: an eukaryotic gene finder that combines several sources of evidence</p>
            </title>
            <aug>
               <au>
                  <snm>Schiex</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Moisan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rouz&#233;</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Lecture Notes in Computer Science</source>
            <pubdate>2001</pubdate>
            <volume>2066</volume>
            <fpage>111</fpage>
            <lpage>125</lpage>
         </bibl>
         <bibl id="B56">
            <title>
               <p>JIGSAW: integration of multiple sources of evidence for gene prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Allen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>3596</fpage>
            <lpage>3603</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16076884</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Computational gene prediction using multiple sources of evidence</p>
            </title>
            <aug>
               <au>
                  <snm>Allen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pertea</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>142</fpage>
            <lpage>148</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">314291</pubid>
                  <pubid idtype="pmpid" link="fulltext">14707176</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Prediction whether a human cDNA sequence contains initiation codon by combining statistical information and similarity with protein sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Nishikawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ota</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Isogai</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>960</fpage>
            <lpage>967</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11159307</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Interpreting cDNA sequences: some insights from studies on translation</p>
            </title>
            <aug>
               <au>
                  <snm>Kozak</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mammalian Genome</source>
            <pubdate>1996</pubdate>
            <volume>7</volume>
            <fpage>563</fpage>
            <lpage>574</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8679005</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p><it>cis</it>-acting elements involved in the alternative translation initiation process of human basic fibroblast growth factor mRNA</p>
            </title>
            <aug>
               <au>
                  <snm>Prats</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vagner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Prats</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Amalric</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Molecular and Cellular Biology</source>
            <pubdate>1992</pubdate>
            <volume>12</volume>
            <fpage>4796</fpage>
            <lpage>4805</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pubmed">1406661</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>Comparison of the consensus sequence flanking translational start sites in <it>Drosophila </it>and vertebrates</p>
            </title>
            <aug>
               <au>
                  <snm>Cavener</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>1987</pubdate>
            <volume>15</volume>
            <fpage>1353</fpage>
            <lpage>1361</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">340553</pubid>
                  <pubid idtype="pmpid" link="fulltext">3822832</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>ROCR: visualizing classifier performance in R</p>
            </title>
            <aug>
               <au>
                  <snm>Sing</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Beerenwinkel</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Lengauer</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>3940</fpage>
            <lpage>3941</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16096348</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>Sequence Logos: a New Way to Display Consensus Sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Schneider</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Stephens</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>1990</pubdate>
            <volume>18</volume>
            <fpage>6097</fpage>
            <lpage>6100</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">332411</pubid>
                  <pubid idtype="pmpid" link="fulltext">2172928</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>WebLogo: A sequence logo generator</p>
            </title>
            <aug>
               <au>
                  <snm>Crooks</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hon</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chandonia</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>1188</fpage>
            <lpage>1190</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">419797</pubid>
                  <pubid idtype="pmpid" link="fulltext">15173120</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
