<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2156-8-78</ui>
   <ji>1471-2156</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>SERpredict: Detection of tissue- or tumor-specific isoforms generated through exonization of transposable elements</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Mersch</snm>
               <fnm>Britta</fnm>
               <insr iid="I1"/>
               <email>b.mersch@dkfz.de</email>
            </au>
            <au id="A2">
               <snm>Sela</snm>
               <fnm>Noa</fnm>
               <insr iid="I2"/>
               <email>noasela@post.tau.ac.il</email>
            </au>
            <au id="A3">
               <snm>Ast</snm>
               <fnm>Gil</fnm>
               <insr iid="I2"/>
               <email>gilast@post.tau.ac.il</email>
            </au>
            <au id="A4">
               <snm>Suhai</snm>
               <fnm>S&#225;ndor</fnm>
               <insr iid="I1"/>
               <email>s.suhai@dkfz.de</email>
            </au>
            <au id="A5" ca="yes">
               <snm>Hotz-Wagenblatt</snm>
               <fnm>Agnes</fnm>
               <insr iid="I1"/>
               <email>hotz-wagenblatt@dkfz.de</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Molecular Biophysics, German Cancer Research Center (DKFZ), Heidelberg, Germany</p>
            </ins>
            <ins id="I2">
               <p>Department of Human Molecular Genetics, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel</p>
            </ins>
         </insg>
         <source>BMC Genetics</source>
         <issn>1471-2156</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>78</fpage>
         <url>http://www.biomedcentral.com/1471-2156/8/78</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17986331</pubid>
               <pubid idtype="doi">10.1186/1471-2156-8-78</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>24</day>
               <month>5</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>06</day>
               <month>11</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>06</day>
               <month>11</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Mersch et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Transposed elements (TEs) are known to affect transcriptomes: either new exons are generated from intronic transposed elements (a process called <it>exonization</it>), or the element inserts into an exon, leading to a new transcript. Several examples in the literature show that isoforms generated by exonization can be specific to a certain tissue (for example the heart muscle) or can cause disease. Thus, exonizations can have negative effects on the transcriptome of an organism.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>To detect further tissue- or tumor-specific isoforms in the human and mouse genomes that were generated through exonization of a transposed element, we designed the automated analysis pipeline <it>SERpredict </it>(SER = <ul>S</ul>pecific <ul>E</ul>xonized <ul>R</ul>etroelement), which makes use of Bayesian statistics. With this pipeline, we found several genes in which a transposed element formed a tissue- or tumor-specific isoform.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our results show that <it>SERpredict </it>produces relevant results, demonstrating the importance of transposed elements in shaping both the human and the mouse transcriptomes. The effect of transposed elements on the human transcriptome is several times higher than the effect on the mouse transcriptome, due to the contribution of the primate-specific Alu elements.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Transposed elements (TEs) are sequences of DNA that can move from one position to another in the genome. There are two classes of transposed elements, the DNA transposons and the retroelements. DNA transposons usually move by cut and paste using the transposase enzyme. In contrast, retroelements are genetic elements that integrate into a genome via an RNA intermediate which is reverse-transcribed to DNA. In mammals, almost half the genome is composed of TEs: around 45% of the human genome is made up of them. This translates to millions of elements, so that on average, every gene in our genome contains about 3 transposed elements. Transposed elements comprise approximately 37% of the mouse genome.</p>
         <p>The human and mouse genome sequences show that TEs have played an important role in shaping the genomes <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. The human genome contains retroelements such as Alu, which is a short interspersed element (SINE), MIR (mammalian interspersed repeat) as well as LINE-1 (L1), LINE-2 (L2) and CR1 (L3). The last three of the given families of retroelements are termed long interspersed elements. In addition, the human genome contains LTR elements such as MaLR (mammalian apparent LTR-retrotransposon), ERVL and ERV1 (endogenous retroviruses) as well as DNA transposons where common families are MER1 and MER2. The mouse genome contains MIR elements as well as rodent-specific SINEs such as B1 (homologous to the left arm of the Alu), B2, B4 and ID as well as LINEs such as L1, L2 and CR1. Similar to the human genome, the mouse genome contains LTRs and DNA transposons. With approximately 1 million copies, Alu is the most frequently encountered TE in the human genome. In mouse, B1 and L1 are the elements with the highest number of copies (B1: 500,000 copies, L1: 800,000 copies).</p>
         <p>Through splicing processes ("exonizations"), small pieces of transposed elements can be inserted into mature mRNAs. These exonizations are caused by motifs that resemble consensus splice sites in both strands of the TEs <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. The transposed elements do not only contain these splice sites but also polyadenylation sites, promoters, enhancers and silencers. Therefore, they can add a variety of functions to their targeted genes <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>.</p>
         <p>Mutations within intronic TEs may yield active splice sites which can be used instead of the normal splice sites, leading to the partial exonization of the intronic TE. However, the other TEs of the human and mouse genomes can be exonized, too. In a previous study, Sela et al. <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> showed that 1824 TEs are exonized in the human genome, of which about 58% are Alus. In the mouse genome, 506 transposed elements are exonized, most of which are either B1 or L1 elements (26% and 20%, respectively). Thus, transposed elements can affect the transcriptome. Either new exons are generated from intronic TEs (see Figure <figr fid="F1">1a (i)</figr>), or the TE inserts into the first or last exon of a gene (Figure <figr fid="F1">1b (i)</figr>), leading to a new transcript <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. In the first case, the exonization can generate an internal cassette exon (Figure <figr fid="F1">1a (ii)</figr>), an alternative 3' splice site (Figure <figr fid="F1">1a (iii)</figr>), an alternative 5' splice site (Figure <figr fid="F1">1a (iv)</figr>) or a constitutively spliced exon (Figure <figr fid="F1">1a (v)</figr>) <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr></abbrgrp>. In the case of insertions into first or last exons, the insertions cause either an elongation of the first/last exon (Figure <figr fid="F1">1b (ii, iii)</figr>) or an activation of an alternative intron (Figure <figr fid="F1">1b (iv)</figr>). For the exact number of occurrences of the different events, please refer to <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>The effects of TE insertions</p>
            </caption>
            <text>
               <p><b>The effects of TE insertions. a) </b>(i) TE inserts into an intron of a gene. (ii-v) show the possible effects of this integration: (ii) an alternatively spliced exon is created, (iii) TE contributes alternative 5' splice site, (iv) TE contributes alternative 3' splice site, (v) TE creates a constitutively spliced exon. <b>b) </b>(i) TE inserts into the first or last exon of a gene. (ii &#8211; iv) show the possible effects of this integration: (ii, iii) enlargement of first or last exon, (iv) TE activates an alternative intron.</p>
            </text>
            <graphic file="1471-2156-8-78-1"/>
         </fig>
         <p>It has been previously reported that more than 5% of the alternatively spliced exons in the human genome are Alu-derived and that all Alu-derived exons are the result of exonization of intronic sequences <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. It can therefore be supposed that genetic diseases can occur when an intronic TE is constitutively or alternatively spliced into the mature mRNA. Searching the literature indeed uncovers evidence that Alu insertions cause genetic disorders <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>.</p>
         <p>Another effect of new exonizations is a potential tissue specificity, in which an exon shows strong tissue regulation <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. An experimental verification of this mechanism is described in a report on Alu de-novo insertion and subsequent exonization within the dystrophin gene that creates a tissue-specific exon causing cardiomyopathy <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Furthermore, since tumorous tissues have been shown to adopt aberrant splicing patterns <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, there might be TE exonizations that are tumor-specific. The survivin gene is one example in which an Alu-generated splice variant is tumor-specific <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
         <p>To detect new tissue- or tumor-specific TE-containing isoforms in human or mouse genes, we designed and implemented <it>SERpredict</it>, an analysis pipeline making use of Bayesian statistics.</p>
      </sec>
      <sec>
         <st>
            <p>Implementation</p>
         </st>
         <p><it>SERpredict </it>is based on several databases: the Ensembl database <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, the UCSC genome browser database <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, dbEST <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, and EMBL <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. How they are used and combined is described in the following section.</p>
         <sec>
            <st>
               <p>Library classification</p>
            </st>
            <p>To obtain information about the tissue and the health status of alternative splice forms of genes, the databases dbEST <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> for expressed sequence tags (ESTs) and EMBL <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> for mRNAs are used. These EST and mRNA sequence databases provide information about tissue and tumor sources. For dbEST, library information which includes an ID, the library name, the organism, the tissue and a more detailed description is provided with each entry. For the EMBL database, there are features termed "clone lib" and "tissue type". However, the names of the tissues and tumors are not standardized across the databases. For this reason, we extracted all the EST and mRNA identifiers from the two databases dbEST and EMBL, obtained the associated library information and assigned a tissue category to every given tissue according to <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> (see also Additional file <supplr sid="S1">1</supplr>). Here, we used keywords to identify 52 different cell or tissue source categories, e.g., leukocyte is mapped to the category blood, hippocampus is mapped to brain and so on. Furthermore, either "tumor" or "normal" was added to each library, again using keywords. All the information was then stored in a locally installed MySQL database which is automatically updated if one of the underlying databases is updated. The "annotated" EST and mRNA sequences obtained in this way were used to perform the statistical analysis to determine whether a certain isoform is tissue- or tumor-specific.</p>
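            <p>The keyword-based assignment can be illustrated with a minimal Python sketch. The keyword tables, the function name classify_library and the fallback labels "unclassified" and "normal" are assumptions made for illustration; only the two mappings leukocyte to blood and hippocampus to brain are taken from the text, and the keyword lists actually used are those of Additional file 1.</p>
            <p><![CDATA[
```python
# Hypothetical keyword table; only these two mappings are named in the text.
TISSUE_KEYWORDS = {
    "leukocyte": "blood",
    "hippocampus": "brain",
}
# Assumed keywords for the tumor/normal label (not taken from the article).
TUMOR_KEYWORDS = ("tumor", "tumour", "carcinoma", "cancer")


def classify_library(description):
    """Return (tissue_category, source) for a free-text library description."""
    text = description.lower()
    tissue = "unclassified"  # assumed fallback when no keyword matches
    for keyword, category in TISSUE_KEYWORDS.items():
        if keyword in text:
            tissue = category
            break
    # Assumption: a library without any tumor keyword is labeled "normal".
    source = "tumor" if any(k in text for k in TUMOR_KEYWORDS) else "normal"
    return tissue, source
```
]]></p>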
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p><b>Keywords for tissue categories</b>. Excel file with information about the keywords used to retrieve cell and tissue source information to assign a tissue category (first column) to every given tissue in the databases dbEST and EMBL.</p>
               </text>
               <file name="1471-2156-8-78-S1.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Tissue and tumor specificity</p>
            </st>
            <p>The analysis for tissue or tumor specificity of a certain alternative splice form can be done using the above "annotated" EST and mRNA sequences. To this end, all EST and mRNA sequences which map to the gene of interest have to be extracted. To determine tissue or tumor specificity, the extracted sequences have to be divided into two groups reflecting the two isoforms of the gene. This is easy in our special case of alternative isoforms because one of the isoforms was generated by the exonization of a TE. On the basis of this information, the ESTs and mRNAs are separated into the groups "holding", when the TE is present in the sequence of interest, or "skipping", when the TE is not. Using the library classification terms for the sequences, we then get four distributions: for the EST and mRNA sequences skipping the TE as well as for those holding the TE, we obtain a tissue and a source (tumor or normal) distribution.</p>
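            <p>As a minimal sketch of this grouping step, the following Python function counts tissue and source labels separately for the "holding" and "skipping" groups. The record layout (has_te, tissue, source) and the function name build_distributions are hypothetical and chosen only for illustration.</p>
            <p><![CDATA[
```python
from collections import Counter


def build_distributions(sequences):
    """Count tissue and source labels for holding and skipping sequences.

    `sequences` is an iterable of (has_te, tissue, source) tuples, a
    hypothetical record layout chosen for this sketch.
    """
    dists = {
        ("holding", "tissue"): Counter(),
        ("holding", "source"): Counter(),
        ("skipping", "tissue"): Counter(),
        ("skipping", "source"): Counter(),
    }
    for has_te, tissue, source in sequences:
        group = "holding" if has_te else "skipping"
        dists[(group, "tissue")][tissue] += 1
        dists[(group, "source")][source] += 1
    return dists


# Invented example records, for illustration only.
example = [
    (True, "brain", "tumor"),
    (True, "brain", "normal"),
    (False, "blood", "normal"),
]
dists = build_distributions(example)
```
]]></p>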
            <p>Determining tissue or tumor specificity from these distributions is not easy, because tissue and tumor source data for EST or mRNA sequences are often incomplete and inconsistent. For a certain gene there are often only a few ESTs sequenced from a particular tissue covering the exons of interest. We therefore have to cope with poor EST library coverage. Additionally, the numbers of ESTs and mRNAs differ greatly between tissues (see Figure <figr fid="F2">2</figr>), which leads to a sampling bias problem. To address these problems, statistical analysis is needed.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Pie chart of EST numbers</p>
               </caption>
               <text>
                  <p><b>Pie chart of EST numbers</b>. Number of ESTs for the different tissues. This figure indicates the extremely different numbers of ESTs and mRNAs for the different tissues.</p>
               </text>
               <graphic file="1471-2156-8-78-2"/>
            </fig>
            <p>Furthermore, when dealing with tissue or tumor specificity there is a problem with including ESTs from cell lines in the analysis. Cell lines are often immortalized, and the immortal lines obtained might not be a perfect representation of the original cells in primary culture. To estimate how many of the ESTs originated from cell lines, we checked the annotation in the dbEST database and determined that only about 10% of the human and mouse EST sequences were derived from cell lines.</p>
         </sec>
         <sec>
            <st>
               <p>Statistical analysis</p>
            </st>
            <p>To deal with the incomplete and inconsistent data, we used a previously described Bayesian statistics approach to identify tissue-specific exons <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> and to identify exons showing deregulated splicing in tumors <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>.</p>
            <p>To identify tissue-specific exons, a tissue specificity score (TS score) is computed. The confidence that a certain splice variant is preferred in tissue T is calculated as a Bayesian posterior probability:</p>
            <p>
               <display-formula id="M1">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2156-8-78-i1">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>P</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>&#952;</m:mi>
                              <m:mrow>
                                 <m:mn>1</m:mn>
                                 <m:mi>T</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>></m:mo>
                           <m:mn>50</m:mn>
                           <m:mi>%</m:mi>
                           <m:mo>|</m:mo>
                           <m:mi>o</m:mi>
                           <m:mi>b</m:mi>
                           <m:mi>s</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:mrow>
                                       <m:msubsup>
                                          <m:mo>&#8747;</m:mo>
                                          <m:mrow>
                                             <m:mn>0.5</m:mn>
                                          </m:mrow>
                                          <m:mn>1</m:mn>
                                       </m:msubsup>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>o</m:mi>
                                          <m:mi>b</m:mi>
                                          <m:mi>s</m:mi>
                                          <m:mo>|</m:mo>
                                          <m:msub>
                                             <m:mi>&#952;</m:mi>
                                             <m:mrow>
                                                <m:mn>1</m:mn>
                                                <m:mi>T</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mi>P</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mi>&#952;</m:mi>
                                             <m:mrow>
                                                <m:mn>1</m:mn>
                                                <m:mi>T</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mi>d</m:mi>
                                          <m:msub>
                                             <m:mi>&#952;</m:mi>
                                             <m:mrow>
                                                <m:mn>1</m:mn>
                                                <m:mi>T</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:mrow>
                                       <m:msubsup>
                                          <m:mo>&#8747;</m:mo>
                                          <m:mn>0</m:mn>
                                          <m:mn>1</m:mn>
                                       </m:msubsup>
                                       <m:mrow>
                                          <m:mi>P</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>o</m:mi>
                                          <m:mi>b</m:mi>
                                          <m:mi>s</m:mi>
                                          <m:mo>|</m:mo>
                                          <m:msub>
                                             <m:mi>&#952;</m:mi>
                                             <m:mrow>
                                                <m:mn>1</m:mn>
                                                <m:mi>T</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mi>P</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mi>&#952;</m:mi>
                                             <m:mrow>
                                                <m:mn>1</m:mn>
                                                <m:mi>T</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mi>d</m:mi>
                                          <m:msub>
                                             <m:mi>&#952;</m:mi>
                                             <m:mrow>
                                                <m:mn>1</m:mn>
                                                <m:mi>T</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaqcfaOaemiuaaLaeiikaGccciGae8hUde3aaSbaaeaacqaIXaqmcqWGubavaeqaaiabg6da+iabiwda1iabicdaWiabcwcaLiabcYha8jabd+gaVjabdkgaIjabdohaZjabcMcaPiabg2da9maalaaabaWaa8qmaeaacqWGqbaucqGGOaakcqWGVbWBcqWGIbGycqWGZbWCcqGG8baFcqWF4oqCdaWgaaqaaiabigdaXiabdsfaubqabaGaeiykaKIaemiuaaLaeiikaGIae8hUde3aaSbaaeaacqaIXaqmcqWGubavaeqaaiabcMcaPiabdsgaKjab=H7aXnaaBaaabaGaeGymaeJaemivaqfabeaaaeaacqaIWaamcqGGUaGlcqaI1aqnaeaacqaIXaqmaiabgUIiYdaabaWaa8qmaeaacqWGqbaucqGGOaakcqWGVbWBcqWGIbGycqWGZbWCcqGG8baFcqWF4oqCdaWgaaqaaiabigdaXiabdsfaubqabaGaeiykaKIaemiuaaLaeiikaGIae8hUde3aaSbaaeaacqaIXaqmcqWGubavaeqaaiabcMcaPiabdsgaKjab=H7aXnaaBaaabaGaeGymaeJaemivaqfabeaaaeaacqaIWaamaeaacqaIXaqmaiabgUIiYdaaaaaa@78C2@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Here, <it>&#952;</it><sub>1<it>T </it></sub>represents the hidden frequency of a splice variant in a specific tissue T and <it>obs </it>stands for the number of ESTs and mRNAs observed in tissue T. <it>P</it>(<it>obs</it>|<it>&#952;</it><sub>1<it>T</it></sub>) is calculated using a binomial distribution and <it>P</it>(<it>&#952;</it><sub>1<it>T</it></sub>) = 1 was used as an uninformative prior probability. In the same way, the posterior probability that the same splice variant is preferred in the pool of all other tissues ~ is computed. The TS score for tissue T is then defined as the difference between the posterior probability for tissue T and the posterior probability for the pool of all other tissues:</p>
            <p>
               <display-formula id="M2"><it>TS </it>= 100 [<it>P</it>(<it>&#952;</it><sub>1<it>T </it></sub>> 50%|<it>obs</it>) - <it>P </it>(<it>&#952;</it><sub>1~ </sub>> 50%|<it>obs</it>)]</display-formula>
            </p>
            <p>Here, <it>&#952;</it><sub>1~ </sub>is the frequency of a splice variant in the pool of all other tissues ~. To assess the stability of the TS score, robustness values <it>r</it><sub><it>TS </it></sub>and <it>r</it><sub><it>TS</it>~ </sub>were calculated analogously to the "jack-knife" resampling method. For more details, please refer to <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
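            <p>A minimal Python sketch of the TS score follows. It assumes that the observations can be summarized as k sequences showing the splice variant out of n sequences covering the region; this summary and the function names are assumptions for illustration. With a uniform prior and a binomial likelihood, the posterior is a Beta(k+1, n-k+1) distribution, whose probability mass above 0.5 can be written exactly as a binomial sum for integer counts. The robustness values are not covered by this sketch.</p>
            <p><![CDATA[
```python
from math import comb


def posterior_above_half(k, n):
    """P(theta > 0.5 | k of n sequences show the variant), uniform prior."""
    # Upper tail of Beta(k+1, n-k+1) at 0.5, written as a binomial sum
    # (exact for integer k and n).
    return sum(comb(n + 1, j) for j in range(k + 1)) / 2 ** (n + 1)


def ts_score(k_tissue, n_tissue, k_rest, n_rest):
    """TS = 100 * [P(theta_1T > 50% | obs) - P(theta_1~ > 50% | obs)]."""
    return 100 * (posterior_above_half(k_tissue, n_tissue)
                  - posterior_above_half(k_rest, n_rest))
```
]]></p>
            <p>Without any observations (k = n = 0) the function returns 0.5, reflecting the uninformative prior.</p>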
            <p>To identify tumor-specific exons, a log-odds score (LOD score) is calculated, giving the confidence that the frequency of a splice variant in tumor tissue (<it>&#952;</it><sub><it>T</it></sub>) is higher than the frequency of the splice variant in normal tissue (<it>&#952;</it><sub><it>N</it></sub>):</p>
            <p>
               <display-formula id="M3">
                  <m:math name="1471-2156-8-78-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>L</m:mi>
                           <m:mi>O</m:mi>
                           <m:mi>D</m:mi>
                           <m:mo>=</m:mo>
                           <m:mo>&#8722;</m:mo>
                           <m:msub>
                              <m:mrow>
                                 <m:mi>log</m:mi>
                                 <m:mo>&#8289;</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mn>10</m:mn>
                              </m:mrow>
                           </m:msub>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mi>P</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>&#952;</m:mi>
                                    <m:mi>T</m:mi>
                                 </m:msub>
                                 <m:mo>></m:mo>
                                 <m:msub>
                                    <m:mi>&#952;</m:mi>
                                    <m:mi>N</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mn>1</m:mn>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>P</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>&#952;</m:mi>
                                    <m:mi>T</m:mi>
                                 </m:msub>
                                 <m:mo>></m:mo>
                                 <m:msub>
                                    <m:mi>&#952;</m:mi>
                                    <m:mi>N</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaqcfaOaemitaWKaem4ta8KaemiraqKaeyypa0JaeyOeI0IagiiBaWMaei4Ba8Maei4zaC2aaSbaaeaacqaIXaqmcqaIWaamaeqaamaalaaabaGaemiuaaLaeiikaGccciGae8hUde3aaSbaaeaacqWGubavaeqaaiabg6da+iab=H7aXnaaBaaabaGaemOta4eabeaacqGGPaqkaeaacqaIXaqmcqGHsislcqWGqbaucqGGOaakcqWF4oqCdaWgaaqaaiabdsfaubqabaGaeyOpa4Jae8hUde3aaSbaaeaacqWGobGtaeqaaiabcMcaPaaaaaa@4DCB@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>The LOD score was calculated using direct numerical integration <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>.</p>
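            <p>One possible way to approximate the probability that the tumor frequency exceeds the normal frequency by numerical integration is sketched below in Python. The sketch assumes, as for the TS score, uniform priors and binomial likelihoods, so that both posteriors are Beta distributions; the count-based inputs, the midpoint rule and the function name prob_tumor_higher are assumptions and not necessarily the procedure of <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The LOD score then follows from this probability via the formula above.</p>
            <p><![CDATA[
```python
from math import comb


def prob_tumor_higher(k_t, n_t, k_n, n_n, steps=2000):
    """Approximate P(theta_T > theta_N) with a midpoint rule on [0, 1].

    k_t of n_t tumor sequences and k_n of n_n normal sequences show the
    splice variant (hypothetical count-based summary of the observations).
    """
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps
        # Posterior density of theta_T: Beta(k_t + 1, n_t - k_t + 1).
        density_t = ((n_t + 1) * comb(n_t, k_t)
                     * t ** k_t * (1 - t) ** (n_t - k_t))
        # Posterior probability that theta_N lies below t, as a binomial tail.
        cdf_n = sum(comb(n_n + 1, j) * t ** j * (1 - t) ** (n_n + 1 - j)
                    for j in range(k_n + 1, n_n + 2))
        total += density_t * cdf_n
    return total / steps
```
]]></p>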
            <p>The criteria used for high-confidence tissue specificity were TS <it>> </it>50, <it>r</it><sub><it>TS </it></sub>> 0.9 and <it>r</it><sub><it>TS </it>~ </sub>> 0.9 as described in <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. A necessary condition for high-confidence tissue specificity was at least three EST observations of the mRNA containing the exon in tissue T. As we wanted to retain only highly significant results, we raised the TS threshold to TS <it>> </it>85 for our pipeline. For tumor specificity, a LOD score above 2, equivalent to a p-value <it>&lt;</it>0.01, was considered significant, as in <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Work flow in SERpredict</p>
            </st>
            <p>Using the information presented in the sections above, we designed <it>SERpredict </it>to detect tissue- or tumor-specific isoforms, which were generated through the exonization of transposed elements. The work flow is displayed in Figure <figr fid="F3">3</figr>.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Work flow of SERpredict</p>
               </caption>
               <text>
                  <p><b>Work flow of SERpredict</b>. Programs and rules used for extracting tissue- or tumor-specific TE-containing exons. For details, see Section "Work flow in SERpredict".</p>
               </text>
               <graphic file="1471-2156-8-78-3"/>
            </fig>
            <p>First, the genomic information of the input sequence is determined. To this end, a Blast search <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> against Ensembl_cdna <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> is performed. Using the Ensembl Application Programming Interface (API), the extracted Ensembl gene identifier is then used to find all transcripts of the gene and thereby all of its exons. If no Blast hit matches the criteria (Identity <it>> </it>95% and E-value <it>&lt;</it>10<sup>-3</sup>), an empty output is produced.</p>
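            <p>The acceptance rule for Blast hits can be sketched as follows. This is a minimal illustration that assumes the hits have already been parsed into records; the record type, its field names and the choice of the hit with the lowest E-value among the accepted hits are assumptions made for the example and are not described in the text.</p>

```python
from typing import NamedTuple, Optional, Sequence


class BlastHit(NamedTuple):
    """Hypothetical, already parsed Blast hit against Ensembl_cdna."""
    subject_id: str           # identifier of the matched cDNA
    percent_identity: float   # for example 98.7
    evalue: float


def best_accepted_hit(hits: Sequence[BlastHit],
                      min_identity: float = 95.0,
                      max_evalue: float = 1e-3) -> Optional[BlastHit]:
    """Return the accepted hit with the lowest E-value, or None.

    A hit is accepted if its identity is above 95% and its E-value is
    below 1e-3. None corresponds to the empty-output case.
    """
    accepted = [
        h for h in hits
        if h.percent_identity > min_identity and max_evalue > h.evalue
    ]
    if not accepted:
        return None
    return min(accepted, key=lambda h: h.evalue)
```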
            <p>Subsequently, every extracted exon is screened for transposed elements. This is done using the chrN_rmsk tables of the UCSC genome browser database <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, which map all TEs that have been found by RepeatMasker <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> and the Repbase annotations <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp> to their positions in the human and mouse genomes. This lookup is much faster than running RepeatMasker directly.</p>
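            <p>Conceptually, the screening step is an interval-overlap test between exon coordinates and annotated repeat coordinates. The sketch below illustrates this idea with in-memory records; the type and function names are hypothetical, any partial overlap is counted as a hit (an assumption, since the exact overlap rule is not stated here), and coordinates are assumed to be zero-based with an exclusive end. A real implementation would query the database by coordinate range rather than scan a list.</p>

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """Genomic interval; start is inclusive and end is exclusive (assumed)."""
    chrom: str
    start: int
    end: int
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals lie on the same chromosome and share a base."""
    return a.chrom == b.chrom and a.end > b.start and b.end > a.start


def tes_in_exon(exon: Interval, repeats: List[Interval]) -> List[Interval]:
    """Return all annotated repeats that overlap the given exon."""
    return [te for te in repeats if overlaps(exon, te)]
```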
            <p>Finally, every such TE-containing exon is analyzed for tissue or tumor specificity as described in the Section "Statistical Analysis". For this analysis, <it>SERpredict </it>extracts all expressed sequence tags (ESTs) and mRNA sequences from the UCSC genome browser database. These are then divided into two groups as described in the Section "Tissue and tumor specificity" and used as input for the statistical analysis.</p>
            <p>As output, <it>SERpredict </it>returns a file with the following information:</p>
            <p>&#8226; Information about the genomic location: Ensembl gene identifier, gene description, chromosome, strand, start and end on genome, number of transcripts and corresponding number of exons</p>
            <p>&#8226; A graphical display of the alternative splice forms of the gene</p>
            <p>&#8226; Information about the repetitive elements: family, ID of the exon in which the TE is located, start and end of the TE and the IDs of the transcripts containing the TE-exon</p>
            <p>&#8226; If observed: the tissue or tumor specificity of the TE-containing exon</p>
            <p>These results are provided as HTML for visual inspection (see Figure <figr fid="F4">4</figr>) or can be downloaded as XML for easier extraction of relevant results and for storage in private databases.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Example output of SERpredict</p>
               </caption>
               <text>
                  <p><b>Example output of SERpredict</b>. The output of <it>SERpredict </it>for [EMBL:AF466401] is presented. One of the isoforms shows a tumor-specific exon which was generated through the exonization of an Alu element.</p>
               </text>
               <graphic file="1471-2156-8-78-4"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <p>All annotated genes of the human and mouse genomes were screened for TE-containing exons. The number of exonizations found for each transposed element family (counting only exons that fulfilled the condition of at least three EST observations of the mRNA containing the exon in tissue T) is shown in Table <tblr tid="T1">1</tblr> (for the human genome) and Table <tblr tid="T2">2</tblr> (for the mouse genome).</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Exonizations in the human genome</p>
            </caption>
            <tblbdy cols="8">
               <r>
                  <c ca="left">
                     <p>Human</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c cspan="8">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>Transposed element</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>L1</p>
                  </c>
                  <c ca="center">
                     <p>L2</p>
                  </c>
                  <c ca="center">
                     <p>CR1</p>
                  </c>
                  <c ca="center">
                     <p>MIR</p>
                  </c>
                  <c ca="center">
                     <p>LTR</p>
                  </c>
                  <c ca="center">
                     <p>DNA</p>
                  </c>
               </r>
               <r>
                  <c cspan="8">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>Number of exonizations</p>
                  </c>
                  <c ca="center">
                     <p>432</p>
                  </c>
                  <c ca="center">
                     <p>88</p>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>4</p>
                  </c>
                  <c ca="center">
                     <p>86</p>
                  </c>
                  <c ca="center">
                     <p>119</p>
                  </c>
                  <c ca="center">
                     <p>90</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>Number of exonized transposed elements in the human genome which have at least three EST observations of the mRNA containing the TE exon. The Alu element is exonized most frequently among TEs.</p>
            </tblfn>
         </tbl>
         <tbl id="T2">
            <title>
               <p>Table 2</p>
            </title>
            <caption>
               <p>Exonizations in the mouse genome</p>
            </caption>
            <tblbdy cols="10">
               <r>
                  <c ca="left">
                     <p>Mouse</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c cspan="10">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>Transposed element</p>
                  </c>
                  <c ca="center">
                     <p>B1</p>
                  </c>
                  <c ca="center">
                     <p>B2</p>
                  </c>
                  <c ca="center">
                     <p>B4</p>
                  </c>
                  <c ca="center">
                     <p>L1</p>
                  </c>
                  <c ca="center">
                     <p>L2</p>
                  </c>
                  <c ca="center">
                     <p>ID</p>
                  </c>
                  <c ca="center">
                     <p>MIR</p>
                  </c>
                  <c ca="center">
                     <p>LTR</p>
                  </c>
                  <c ca="center">
                     <p>DNA</p>
                  </c>
               </r>
               <r>
                  <c cspan="10">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>Number of exonizations</p>
                  </c>
                  <c ca="center">
                     <p>55</p>
                  </c>
                  <c ca="center">
                     <p>33</p>
                  </c>
                  <c ca="center">
                     <p>22</p>
                  </c>
                  <c ca="center">
                     <p>37</p>
                  </c>
                  <c ca="center">
                     <p>9</p>
                  </c>
                  <c ca="center">
                     <p>4</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>72</p>
                  </c>
                  <c ca="center">
                     <p>8</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>Number of exonized transposed elements in the mouse genome which have at least three EST observations of the mRNA containing the TE exon. The B1 element, which is homologous to the left arm of the Alu element, is exonized most frequently among mouse TEs.</p>
            </tblfn>
         </tbl>
         <p>The 859 human and 260 mouse TE-containing exons were then analyzed for tissue or tumor specificity using <it>SERpredict</it>. In the human exon list, we identified 39 tissue-specifically spliced exons (see Table <tblr tid="T3">3</tblr> for the exons with a tissue specificity (TS) score <it>> </it>90). In the mouse exon list, 11 exons showed tissue-specific splicing (see Table <tblr tid="T4">4</tblr> for the exons with a TS score <it>> </it>90). Of the human tissue-specific exons, 18 were derived from Alu, 5 from L1, 2 from L2, 1 from CR1, 5 from MIR, 4 from LTR elements and 4 from DNA transposons. The largest number of tissue-specific exonizations thus arises from Alu elements. A possible reason is that Alu is the most abundant transposed element in the human genome and contains potential splice sites, which makes it a much better-suited sequence for the exonization process than other transposed elements <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Of the mouse tissue-specific exons, 4 were derived from B1, 2 from B2 and 2 from LTR elements; B4, L2 and MIR contributed one exon each. The larger number of tissue-specific B1 exonizations is consistent with the fact that B1 and Alu share the same ancestral origin. Still, B1 does not reach the same number of tissue-specific exonizations as Alu, because the majority of Alu exonizations occur in the right arm of the Alu element, which is not present in B1: in contrast to the dimeric Alu element, B1 is a monomer.</p>
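         <p>As a consistency check, the per-family counts can be summed and compared with the totals quoted above. The short Python sketch below uses only numbers taken from Tables 1 and 2 and from the preceding paragraph; the variable and function names are illustrative.</p>

```python
# Exonized TEs per family, copied from Table 1 (human) and Table 2 (mouse).
HUMAN_EXONIZED = {"Alu": 432, "L1": 88, "L2": 40, "CR1": 4,
                  "MIR": 86, "LTR": 119, "DNA": 90}
MOUSE_EXONIZED = {"B1": 55, "B2": 33, "B4": 22, "L1": 37, "L2": 9,
                  "ID": 4, "MIR": 20, "LTR": 72, "DNA": 8}

# Tissue-specific exons per family, copied from the text.
HUMAN_SPECIFIC = {"Alu": 18, "L1": 5, "L2": 2, "CR1": 1,
                  "MIR": 5, "LTR": 4, "DNA": 4}
MOUSE_SPECIFIC = {"B1": 4, "B2": 2, "LTR": 2, "B4": 1, "L2": 1, "MIR": 1}


def total(counts):
    """Sum the per-family counts of one table or list."""
    return sum(counts.values())


if __name__ == "__main__":
    for label, counts in [("human exonized", HUMAN_EXONIZED),
                          ("mouse exonized", MOUSE_EXONIZED),
                          ("human tissue-specific", HUMAN_SPECIFIC),
                          ("mouse tissue-specific", MOUSE_SPECIFIC)]:
        print(label, total(counts))
```

         <p>The four sums are 859, 260, 39 and 11, in agreement with the totals given in the text.</p>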
         <tbl id="T3">
            <title>
               <p>Table 3</p>
            </title>
            <caption>
               <p>Human tissue-specific TE-exons</p>
            </caption>
            <tblbdy cols="5">
               <r>
                  <c ca="left">
                     <p>Gene</p>
                  </c>
                  <c ca="center">
                     <p>TE</p>
                  </c>
                  <c ca="center">
                     <p>Chr.</p>
                  </c>
                  <c ca="center">
                     <p>Tissue</p>
                  </c>
                  <c ca="center">
                     <p>TS score</p>
                  </c>
               </r>
               <r>
                  <c cspan="5">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Zinc finger protein 195</p>
                  </c>
                  <c ca="center">
                     <p>L1</p>
                  </c>
                  <c ca="center">
                     <p>11</p>
                  </c>
                  <c ca="center">
                     <p>nerve</p>
                  </c>
                  <c ca="center">
                     <p>92.41</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Zinc finger protein 33A</p>
                  </c>
                  <c ca="center">
                     <p>L1</p>
                  </c>
                  <c ca="center">
                     <p>10</p>
                  </c>
                  <c ca="center">
                     <p>trachea</p>
                  </c>
                  <c ca="center">
                     <p>96.81</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Ribonuclease P protein subunit p38</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>10</p>
                  </c>
                  <c ca="center">
                     <p>bone</p>
                  </c>
                  <c ca="center">
                     <p>92.30</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>AP-3 complex subunit mu-1</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>10</p>
                  </c>
                  <c ca="center">
                     <p>eye</p>
                  </c>
                  <c ca="center">
                     <p>99.21</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Zinc finger protein 195</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>11</p>
                  </c>
                  <c ca="center">
                     <p>nerve</p>
                  </c>
                  <c ca="center">
                     <p>92.41</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>4F2 cell-surface antigen heavy chain</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>11</p>
                  </c>
                  <c ca="center">
                     <p>bone</p>
                  </c>
                  <c ca="center">
                     <p>96.69</p>
                  </c>
               </r>
               <r>
                  <c ca="left"/>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>uterus</p>
                  </c>
                  <c ca="center">
                     <p>96.34</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Suppressor of G2 allele of SKP1 homolog</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>13</p>
                  </c>
                  <c ca="center">
                     <p>pancreas</p>
                  </c>
                  <c ca="center">
                     <p>94.39</p>
                  </c>
               </r>
               <r>
                  <c ca="left"/>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>muscle</p>
                  </c>
                  <c ca="center">
                     <p>96.65</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Centrosomal protein of 27 kDa</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>15</p>
                  </c>
                  <c ca="center">
                     <p>uterus</p>
                  </c>
                  <c ca="center">
                     <p>93.29</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Fumarylacetoacetate hydrolase domain-containing protein 1</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>16</p>
                  </c>
                  <c ca="center">
                     <p>placenta</p>
                  </c>
                  <c ca="center">
                     <p>94.01</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>T-cell activation NFKB-like protein</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>19</p>
                  </c>
                  <c ca="center">
                     <p>ovary</p>
                  </c>
                  <c ca="center">
                     <p>93.65</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Zinc finger protein 320</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>19</p>
                  </c>
                  <c ca="center">
                     <p>embryo</p>
                  </c>
                  <c ca="center">
                     <p>96.68</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Zinc finger MYM-type protein 1</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>1</p>
                  </c>
                  <c ca="center">
                     <p>brain</p>
                  </c>
                  <c ca="center">
                     <p>97.84</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Protein MANBAL</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>stomach</p>
                  </c>
                  <c ca="center">
                     <p>93.75</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Serine/threonine-protein kinase 6</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>mouth_oral</p>
                  </c>
                  <c ca="center">
                     <p>94.16</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>CDNA FLJ20699 fis, clone KAIA2372</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>22</p>
                  </c>
                  <c ca="center">
                     <p>muscle</p>
                  </c>
                  <c ca="center">
                     <p>93.75</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>brain</p>
                  </c>
                  <c ca="center">
                     <p>98.69</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>DNA directed RNA polymerase II</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>7</p>
                  </c>
                  <c ca="center">
                     <p>colon</p>
                  </c>
                  <c ca="center">
                     <p>93.45</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>ovary</p>
                  </c>
                  <c ca="center">
                     <p>93.45</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>thymus</p>
                  </c>
                  <c ca="center">
                     <p>98.29</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Putative ribosomal RNA methyltransferase 2</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>7</p>
                  </c>
                  <c ca="center">
                     <p>breast</p>
                  </c>
                  <c ca="center">
                     <p>93.74</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>GPI ethanolamine phosphate transferase 3</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>9</p>
                  </c>
                  <c ca="center">
                     <p>eye</p>
                  </c>
                  <c ca="center">
                     <p>91.82</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Soluble calcium-activated nucleotidase 1</p>
                  </c>
                  <c ca="center">
                     <p>L2</p>
                  </c>
                  <c ca="center">
                     <p>17</p>
                  </c>
                  <c ca="center">
                     <p>placenta</p>
                  </c>
                  <c ca="center">
                     <p>98.14</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Intraflagellar transport 20 homolog</p>
                  </c>
                  <c ca="center">
                     <p>L2</p>
                  </c>
                  <c ca="center">
                     <p>17</p>
                  </c>
                  <c ca="center">
                     <p>colon</p>
                  </c>
                  <c ca="center">
                     <p>98.92</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>muscle</p>
                  </c>
                  <c ca="center">
                     <p>93.73</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>CDNA FLJ32655 fis</p>
                  </c>
                  <c ca="center">
                     <p>CR1</p>
                  </c>
                  <c ca="center">
                     <p>17</p>
                  </c>
                  <c ca="center">
                     <p>testis</p>
                  </c>
                  <c ca="center">
                     <p>99.99</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Retinol dehydrogenase 13</p>
                  </c>
                  <c ca="center">
                     <p>MIR</p>
                  </c>
                  <c ca="center">
                     <p>19</p>
                  </c>
                  <c ca="center">
                     <p>testis</p>
                  </c>
                  <c ca="center">
                     <p>93.74</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>uterus</p>
                  </c>
                  <c ca="center">
                     <p>92.69</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Tripartite motif-containing protein 14</p>
                  </c>
                  <c ca="center">
                     <p>MIR</p>
                  </c>
                  <c ca="center">
                     <p>9</p>
                  </c>
                  <c ca="center">
                     <p>thymus</p>
                  </c>
                  <c ca="center">
                     <p>93.39</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Salivary alpha-amylase precursor</p>
                  </c>
                  <c ca="center">
                     <p>ERVK</p>
                  </c>
                  <c ca="center">
                     <p>1</p>
                  </c>
                  <c ca="center">
                     <p>muscle</p>
                  </c>
                  <c ca="center">
                     <p>97.67</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Thiamin pyrophosphokinase 1</p>
                  </c>
                  <c ca="center">
                     <p>ERV1</p>
                  </c>
                  <c ca="center">
                     <p>7</p>
                  </c>
                  <c ca="center">
                     <p>testis</p>
                  </c>
                  <c ca="center">
                     <p>93.75</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Hypothetical protein Q8WZ27</p>
                  </c>
                  <c ca="center">
                     <p>ERV1</p>
                  </c>
                  <c ca="center">
                     <p>4</p>
                  </c>
                  <c ca="center">
                     <p>thyroid</p>
                  </c>
                  <c ca="center">
                     <p>93.73</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Trafficking protein particle complex protein 2</p>
                  </c>
                  <c ca="center">
                     <p>MER1</p>
                  </c>
                  <c ca="center">
                     <p>X</p>
                  </c>
                  <c ca="center">
                     <p>testis</p>
                  </c>
                  <c ca="center">
                     <p>92.55</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>mTERF domain-containing protein 2</p>
                  </c>
                  <c ca="center">
                     <p>MER2</p>
                  </c>
                  <c ca="center">
                     <p>2</p>
                  </c>
                  <c ca="center">
                     <p>brain</p>
                  </c>
                  <c ca="center">
                     <p>99.09</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>Potentially tissue-specific TE-exons in the human transcriptome. From left to right: the name of the gene in which the exonization occurred, the family name of the transposed element, the chromosome, the tissue to which the exon is specific and the TS score.</p>
            </tblfn>
         </tbl>
         <tbl id="T4">
            <title>
               <p>Table 4</p>
            </title>
            <caption>
               <p>Mouse tissue-specific TE-exons</p>
            </caption>
            <tblbdy cols="5">
               <r>
                  <c ca="left">
                     <p>Gene</p>
                  </c>
                  <c ca="center">
                     <p>TE</p>
                  </c>
                  <c ca="center">
                     <p>Chr.</p>
                  </c>
                  <c ca="center">
                     <p>Tissue</p>
                  </c>
                  <c ca="center">
                     <p>TS score</p>
                  </c>
               </r>
               <r>
                  <c cspan="5">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>RIKEN cDNA 9830124H08 gene</p>
                  </c>
                  <c ca="center">
                     <p>B1</p>
                  </c>
                  <c ca="center">
                     <p>14</p>
                  </c>
                  <c ca="center">
                     <p>pancreas</p>
                  </c>
                  <c ca="center">
                     <p>99.21</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Gametogenetin binding protein 1</p>
                  </c>
                  <c ca="center">
                     <p>B1</p>
                  </c>
                  <c ca="center">
                     <p>17</p>
                  </c>
                  <c ca="center">
                     <p>pancreas</p>
                  </c>
                  <c ca="center">
                     <p>90.57</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>G protein-coupled receptor 177</p>
                  </c>
                  <c ca="center">
                     <p>B1</p>
                  </c>
                  <c ca="center">
                     <p>3</p>
                  </c>
                  <c ca="center">
                     <p>pancreas</p>
                  </c>
                  <c ca="center">
                     <p>98.05</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Hydroxysteroid dehydrogenase like 2</p>
                  </c>
                  <c ca="center">
                     <p>B1</p>
                  </c>
                  <c ca="center">
                     <p>4</p>
                  </c>
                  <c ca="center">
                     <p>pancreas</p>
                  </c>
                  <c ca="center">
                     <p>98.44</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>NFKB inhibitor interacting Ras-like protein 1</p>
                  </c>
                  <c ca="center">
                     <p>B2</p>
                  </c>
                  <c ca="center">
                     <p>14</p>
                  </c>
                  <c ca="center">
                     <p>pancreas</p>
                  </c>
                  <c ca="center">
                     <p>96.48</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>RIKEN cDNA 4930444A02 gene</p>
                  </c>
                  <c ca="center">
                     <p>B2</p>
                  </c>
                  <c ca="center">
                     <p>8</p>
                  </c>
                  <c ca="center">
                     <p>pancreas</p>
                  </c>
                  <c ca="center">
                     <p>92.81</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>ADP-ribosylation factor-like 4A</p>
                  </c>
                  <c ca="center">
                     <p>B4</p>
                  </c>
                  <c ca="center">
                     <p>12</p>
                  </c>
                  <c ca="center">
                     <p>pancreas</p>
                  </c>
                  <c ca="center">
                     <p>93.75</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Cyclin-dependent kinase 2</p>
                  </c>
                  <c ca="center">
                     <p>L2</p>
                  </c>
                  <c ca="center">
                     <p>10</p>
                  </c>
                  <c ca="center">
                     <p>pancreas</p>
                  </c>
                  <c ca="center">
                     <p>92.75</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Target of EGR1, member 1</p>
                  </c>
                  <c ca="center">
                     <p>L2</p>
                  </c>
                  <c ca="center">
                     <p>4</p>
                  </c>
                  <c ca="center">
                     <p>limb</p>
                  </c>
                  <c ca="center">
                     <p>92.75</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>ST6</p>
                  </c>
                  <c ca="center">
                     <p>MIR</p>
                  </c>
                  <c ca="center">
                     <p>2</p>
                  </c>
                  <c ca="center">
                     <p>limb</p>
                  </c>
                  <c ca="center">
                     <p>92.74</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>intestine</p>
                  </c>
                  <c ca="center">
                     <p>99.4</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>Potentially tissue-specific TE-exons in the mouse transcriptome. Columns, from left to right: the gene in which the exonization occurred, the family of the transposed element, the chromosome, the tissue to which the exon is specific, and the TS score.</p>
            </tblfn>
         </tbl>
         <p>In human, we did not observe a tendency toward specificity for any particular tissue. In mouse, by contrast, the tissue-specific exons are biased toward pancreas (Table <tblr tid="T4">4</tblr>). This bias does not reflect an overrepresentation of mouse pancreatic ESTs/mRNAs in the database, since there are as many pancreatic sequences as sequences from other tissues such as intestine or blood. We currently have no explanation for this result.</p>
         <p>As MIR SINEs were active prior to the mammalian diversification <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, it was unexpected to find 5 tissue-specific MIR exonizations in human but only 1 in mouse. We examined the orthologous loci of the 5 relevant genes (RDH13, Elmo2, MRRF, Tri14 and NP_060401.2) in the human and the mouse genome and found that in 4 of the 5 cases the mouse genome contains no MIR element. Only for MRRF is there a MIR in the mouse genome, but its exonization in mouse is not tissue-specific. Conversely, for the tissue-specific exon of the mouse gene ST6galrsc4, there is a MIR at the same position in the human genome, but the exon boundaries differ, so the MIR is not exonized in human.</p>
         <p>To assess the reliability of <it>SERpredict</it>, we checked some of the genes predicted to have a tissue-specific TE-derived exon against both the literature and the database annotations. Isoform 2 of the T-cell activation NFKB-like protein contains an Alu exon and was predicted to be ovary-specific (Table <tblr tid="T3">3</tblr>); this is confirmed by the human SwissProt <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> entry Q9BRG9. A testis-specific isoform of TPK1 (Thiamine pyrophosphokinase 1) is described in the OMIM <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> entry 606370 <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>. This isoform is 100 bp longer than the broadly expressed variant, which is consistent with our finding of an additional ERV1-derived exon of about 100 bp that makes this isoform testis-specific (Table <tblr tid="T3">3</tblr>). Additionally, the 4F2 cell-surface heavy chain protein seems to be highly expressed in the early stage of new bone formation <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Although we found an alternative isoform expressed in bone (Table <tblr tid="T3">3</tblr>), the specificity of the TE-derived exon is not described in the literature and could therefore not be verified.</p>
         <p>Our second analysis identified exons that are spliced in a tumor-specific way. We found 21 such exons in human genes and 2 in mouse genes. Of the human exons, 11 were Alu exons, 1 was an L1 exon, 1 was an L2 exon, 4 were MIR exons, 3 were LTR exons and 1 was derived from a DNA transposon (see Table <tblr tid="T5">5</tblr>). In mouse, there was 1 L1 exon and 1 MIR exon (see Table <tblr tid="T6">6</tblr>). The data were filtered for exons that are intronic in normal tissues and recognized as exons only in tumor tissues; such exons could serve as potential markers for tumor diagnostics. One such exon, which contains an Alu element, was found in the human gene YY1AP1 (YY1-associated protein 1: hepatocellular carcinoma susceptibility protein). All results with TS <it>> </it>85 and LOD <it>> </it>2 are given in Additional file <supplr sid="S2">2</supplr> for the human and the mouse genome.</p>
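         <p>The LOD cut-off used above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not part of <it>SERpredict</it>; the function name, the tuple layout and the stricter cut-off of 3 are assumptions, while the four rows are copied from Table 5.</p>

```python
# Rows copied from Table 5: (gene, TE family, LOD score).
TABLE5_ROWS = [
    ("40S ribosomal protein SA", "ERV1", 12.27),
    ("NHP2-like protein 1", "L1", 9.1),
    ("YY1-associated protein 1", "Alu", 3.15),
    ("Protein RMI1 homolog", "Alu", 2.03),
]


def filter_by_lod(rows, cutoff=2.0):
    """Keep rows whose LOD score exceeds the cutoff, highest score first."""
    kept = [row for row in rows if row[2] > cutoff]
    return sorted(kept, key=lambda row: row[2], reverse=True)


# Cut-off stated in the text (LOD above 2).
reported = filter_by_lod(TABLE5_ROWS)
# Hypothetical stricter cut-off, shown only to illustrate the parameter.
stricter = filter_by_lod(TABLE5_ROWS, 3.0)

print(len(reported), len(stricter))
```

         <p>Raising the cut-off in this sketch only shortens the candidate list; how the LOD score itself is computed is not shown here.</p>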
         <tbl id="T5">
            <title>
               <p>Table 5</p>
            </title>
            <caption>
               <p>Human tumor-specific TE-exons</p>
            </caption>
            <tblbdy cols="4">
               <r>
                  <c ca="left">
                     <p>Gene</p>
                  </c>
                  <c ca="center">
                     <p>TE</p>
                  </c>
                  <c ca="center">
                     <p>Chr.</p>
                  </c>
                  <c ca="center">
                     <p>LOD</p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Centrosomal protein of 27 kDa</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>15</p>
                  </c>
                  <c ca="center">
                     <p>2.93</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>G-protein coupled receptor 56 precursor</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>16</p>
                  </c>
                  <c ca="center">
                     <p>2.8</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>T-cell activation NFKB-like protein</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>19</p>
                  </c>
                  <c ca="center">
                     <p>2.76</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Dipeptidyl peptidase 9</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>19</p>
                  </c>
                  <c ca="center">
                     <p>3.49</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>NADH dehydrogenase</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>19</p>
                  </c>
                  <c ca="center">
                     <p>2.07</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>YY1-associated protein 1</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>1</p>
                  </c>
                  <c ca="center">
                     <p>3.15</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Centromere protein R</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>1</p>
                  </c>
                  <c ca="center">
                     <p>2.74</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Selenoprotein T precursor</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>3</p>
                  </c>
                  <c ca="center">
                     <p>2.64</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Putative ribosomal RNA methyltransferase 2</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>7</p>
                  </c>
                  <c ca="center">
                     <p>2.4</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>GPI ethanolamine phosphate transferase 3</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>9</p>
                  </c>
                  <c ca="center">
                     <p>2.07</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Protein RMI1 homolog</p>
                  </c>
                  <c ca="center">
                     <p>Alu</p>
                  </c>
                  <c ca="center">
                     <p>9</p>
                  </c>
                  <c ca="center">
                     <p>2.03</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>NHP2-like protein 1</p>
                  </c>
                  <c ca="center">
                     <p>L1</p>
                  </c>
                  <c ca="center">
                     <p>22</p>
                  </c>
                  <c ca="center">
                     <p>9.1</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Zinc finger protein DZIP1</p>
                  </c>
                  <c ca="center">
                     <p>L2</p>
                  </c>
                  <c ca="center">
                     <p>13</p>
                  </c>
                  <c ca="center">
                     <p>2.34</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Dynein light chain 2A, cytoplasmic</p>
                  </c>
                  <c ca="center">
                     <p>MIR</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>4.43</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Protein NipSnap1</p>
                  </c>
                  <c ca="center">
                     <p>MIR</p>
                  </c>
                  <c ca="center">
                     <p>22</p>
                  </c>
                  <c ca="center">
                     <p>6.55</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>ST6</p>
                  </c>
                  <c ca="center">
                     <p>MIR</p>
                  </c>
                  <c ca="center">
                     <p>9</p>
                  </c>
                  <c ca="center">
                     <p>2.69</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Tripartite motif-containing protein 14</p>
                  </c>
                  <c ca="center">
                     <p>MIR</p>
                  </c>
                  <c ca="center">
                     <p>9</p>
                  </c>
                  <c ca="center">
                     <p>2.86</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>40S ribosomal protein SA</p>
                  </c>
                  <c ca="center">
                     <p>ERV1</p>
                  </c>
                  <c ca="center">
                     <p>3</p>
                  </c>
                  <c ca="center">
                     <p>12.27</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>DNA directed RNA polymerase II</p>
                  </c>
                  <c ca="center">
                     <p>MaLR</p>
                  </c>
                  <c ca="center">
                     <p>7</p>
                  </c>
                  <c ca="center">
                     <p>2.34</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>CDNA FLJ33708 fis, clone BRAWH2007862.</p>
                  </c>
                  <c ca="center">
                     <p>MaLR</p>
                  </c>
                  <c ca="center">
                     <p>6</p>
                  </c>
                  <c ca="center">
                     <p>2.18</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Beta-microseminoprotein precursor</p>
                  </c>
                  <c ca="center">
                     <p>MER1</p>
                  </c>
                  <c ca="center">
                     <p>10</p>
                  </c>
                  <c ca="center">
                     <p>2.92</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>Potentially tumor-specific TE-exons in the human transcriptome. Columns, from left to right: the gene in which the exonization occurred, the family of the transposed element, the chromosome, and the LOD score.</p>
            </tblfn>
         </tbl>
         <tbl id="T6">
            <title>
               <p>Table 6</p>
            </title>
            <caption>
               <p>Mouse tumor-specific TE-exons</p>
            </caption>
            <tblbdy cols="4">
               <r>
                  <c ca="left">
                     <p>Gene</p>
                  </c>
                  <c ca="center">
                     <p>TE</p>
                  </c>
                  <c ca="center">
                     <p>Chr.</p>
                  </c>
                  <c ca="center">
                     <p>LOD</p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Methylmalonic aciduria</p>
                  </c>
                  <c ca="center">
                     <p>L1</p>
                  </c>
                  <c ca="center">
                     <p>8</p>
                  </c>
                  <c ca="center">
                     <p>2.76</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>ST6</p>
                  </c>
                  <c ca="center">
                     <p>MIR</p>
                  </c>
                  <c ca="center">
                     <p>2</p>
                  </c>
                  <c ca="center">
                     <p>2.56</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>Potentially tumor-specific TE-exons in the mouse transcriptome. Columns, from left to right: the gene in which the exonization occurred, the family of the transposed element, the chromosome, and the LOD score.</p>
            </tblfn>
         </tbl>
         <suppl id="S2">
            <title>
               <p>Additional File 2</p>
            </title>
            <text>
               <p><b>All tissue and tumor specificities</b>. Excel file listing all tissue- and tumor-specific TE-containing exons from the human and the mouse genome. There are 4 sheets: one for human tissue-specific TE-containing exons, one for human tumor-specific TE-containing exons, one for mouse tissue-specific TE-containing exons and one for mouse tumor-specific TE-containing exons.</p>
            </text>
            <file name="1471-2156-8-78-S2.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <p>We also found support for our predictions of genes with tumor-specific exons. For the ST6GALNAC6 gene, a 2.4 kb transcript has been described in colon carcinoma, whereas transcripts of 2.5 and 7.5 kb are found in normal colon <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>. The colon carcinoma transcript could represent the isoform that omits the first exon and contains the tumor-specific exon.</p>
         <p>Taking these results into account, <it>SERpredict </it>is a useful tool for analyzing TE insertions in genes and for determining their effects on the human and mouse transcriptomes. On the one hand, the inclusion of TE-derived sequence in mature mRNAs and the resulting change in the protein can have effects in single tissues or even cause major illnesses such as cancer, as several examples in the literature show <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B15">15</abbr><abbr bid="B17">17</abbr></abbrgrp>. On the other hand, these new exons could be raw material for the future evolution of the organisms. The new alternative TE-exons are included in only a fraction of the transcripts of a gene, while the remaining transcripts maintain their original function. The added exon is therefore free to evolve without loss of the original function. If the alternative form gains a useful function, its splice sites may be strengthened, or it may become tissue-specific if the new function has only local benefits <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Our analysis shows that <it>SERpredict </it>produces biologically relevant results and demonstrates the importance of transposed elements in shaping both the human and the mouse transcriptomes. Owing to the contribution of the primate-specific Alu elements, the effect of TEs on the human transcriptome is several times greater than that on the mouse transcriptome. Some of our predictions are supported by the literature and by database annotations; others still need biological verification. The pipeline can therefore help biologists interested in tissue- or tumor-specific isoforms decide which genes merit further research.</p>
         <p>Because the present gene databases are incomplete, our analysis remains confined to the annotated gene data. As the mRNA and EST databases, and with them our internal MySQL database, are continuously updated, the analysis can be repeated. This will make the analyses more precise and will allow <it>SERpredict </it>to report tissue- or tumor-specific splicing of previously undiscovered exons.</p>
         <p>In further studies we will include additional organisms in <it>SERpredict </it>in order to identify differences from the human and mouse genomes. Additionally, we are planning to build a database containing the TE-containing exons, their TE annotation, and their tissue and tumor specificities for different organisms. This will be an extension and an update of the AluGene database <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>Project name: <it>SERpredict</it></p>
         <p>Project home page: <url>http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar/</url></p>
         <p>Operating system: Platform independent</p>
         <p>Programming language: Perl</p>
         <p>Other requirements: Browser</p>
         <p>License: NA</p>
         <p>Any restrictions to use by non-academics: None</p>
         <sec>
            <st>
               <p>Usage</p>
            </st>
            <p><it>SERpredict </it>is part of the HUSAR open server, whose web page lists the available applications together with information about the tasks they perform. Query sequences can be entered into the input box by the usual "copy &amp; paste" procedure. If more than one sequence is to be queried, a multiple FASTA file can be used. The query is started by clicking the "submit" button and then the "run" button on the following page. Results are retrieved by selecting the tab "Go to results page". For further explanation, a flow chart, an example output, and a test sequence are given on the web page.</p>
         </sec>
         <sec>
            <st>
               <p>Input/output formats</p>
            </st>
            <p><it>SERpredict </it>accepts only nucleotide sequences as input. For output, see Section "Work flow in SERpredict".</p>
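            <p>As a minimal sketch of how such input could be prepared before upload, the following Python snippet checks that each query is a nucleotide sequence and formats several queries as one multiple FASTA text. It is not part of <it>SERpredict</it>; the function names, the accepted alphabet (A, C, G, T, N), the line width and the toy sequences are assumptions made for illustration only.</p>

```python
import re

# Assumption: only the letters A, C, G, T and N count as nucleotides here;
# the alphabet actually accepted by the server is not stated in the text.
NUCLEOTIDE_RE = re.compile(r"[ACGTN]+", re.IGNORECASE)


def is_nucleotide(seq):
    """Return True if seq is non-empty and contains only A, C, G, T or N."""
    return NUCLEOTIDE_RE.fullmatch(seq) is not None


def to_multi_fasta(records, width=60):
    """Format (name, sequence) pairs as one multiple-FASTA string."""
    lines = []
    for name, seq in records:
        if not is_nucleotide(seq):
            raise ValueError(f"{name}: not a nucleotide sequence")
        lines.append(f">{name}")
        # Wrap the sequence into lines of at most `width` characters.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


# Toy, made-up sequences for illustration only.
example = to_multi_fasta(
    [("query1", "ACGTACGTAC"), ("query2", "GGCCTTAAN")], width=4
)
print(example)
```

            <p>The resulting text can be saved to a file for upload or pasted into the input box; a protein sequence passed to this sketch raises an error instead of being written out.</p>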
         </sec>
         <sec>
            <st>
               <p>Performance</p>
            </st>
            <p>Run time depends on the length of the input sequence and on the number of exons it contains; a typical calculation takes approximately one minute.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>TE &#8211; transposed element, SINE &#8211; short interspersed element, LINE &#8211; long interspersed element, MIR &#8211; mammalian interspersed repeat, EST &#8211; expressed sequence tag, TS score &#8211; tissue specificity score, LOD score &#8211; log-odds score, HUSAR &#8211; Heidelberg Unix Sequence Analysis Resources</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>BM designed and programmed the pipeline, performed the tests and drafted the manuscript. NS participated in designing the pipeline and provided test sequences. GA conceived of the study. SS provided guidance and helped to finish the manuscript. AH supervised the whole project, participated in designing the pipeline and helped to draft the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was supported in part by the Cooperation Program in Tumor Research of the German Cancer Research Center (DKFZ) and Israel's Ministry of Science and Technology (MOST) under grant Ca 119.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Initial sequencing and analysis of the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Linton</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Birren</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Nusbaum</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zody</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Baldwin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Devon</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Dewar</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Doyle</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>FitzHugh</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <issue>6822</issue>
            <fpage>860</fpage>
            <lpage>921</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35057062</pubid>
                  <pubid idtype="pmpid" link="fulltext">11237011</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Initial sequencing and comparative analysis of the mouse genome</p>
            </title>
            <aug>
               <au>
                  <snm>Waterston</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Agarwal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Agarwala</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ainscough</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Alexandersson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>An</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>420</volume>
            <issue>6915</issue>
            <fpage>520</fpage>
            <lpage>562</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01262</pubid>
                  <pubid idtype="pmpid" link="fulltext">12466850</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Alu sequences in the coding regions of mRNA: a source of protein variability</p>
            </title>
            <aug>
               <au>
                  <snm>Makalowski</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Mitchell</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Labuda</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>1994</pubdate>
            <volume>10</volume>
            <issue>6</issue>
            <fpage>188</fpage>
            <lpage>193</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0168-9525(94)90254-2</pubid>
                  <pubid idtype="pmpid">8073532</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>On "genomenclature": a comprehensive (and respectful) taxonomy for pseudogenes and other "junk DNA"</p>
            </title>
            <aug>
               <au>
                  <snm>Brosius</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gould</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1992</pubdate>
            <volume>89</volume>
            <issue>22</issue>
            <fpage>10706</fpage>
            <lpage>10710</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">50410</pubid>
                  <pubid idtype="pmpid" link="fulltext">1279691</pubid>
                  <pubid idtype="doi">10.1073/pnas.89.22.10706</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>SINEs as a genomic scrap yard: an essay on genomic evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Makalowski</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>The Impact of Short Interspersed Elements (SINEs) on the Host Genome</source>
            <editor>Maraia R, Landes RG</editor>
            <pubdate>1995</pubdate>
            <fpage>81</fpage>
            <lpage>104</lpage>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Transposable elements as a significant source of transcription regulating signals</p>
            </title>
            <aug>
               <au>
                  <snm>Thornburg</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Gotea</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Makalowski</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2006</pubdate>
            <volume>365</volume>
            <fpage>104</fpage>
            <lpage>110</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2005.09.036</pubid>
                  <pubid idtype="pmpid" link="fulltext">16376497</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Comparative analysis of transposed element insertion within human and mouse genomes reveals Alu's unique role in shaping the human transcriptome</p>
            </title>
            <aug>
               <au>
                  <snm>Sela</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mersch</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gal-Mark</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Lev-Maor</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hotz-Wagenblatt</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ast</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>R127</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2007-8-6-r127</pubid>
                  <pubid idtype="pmpid" link="fulltext">17594509</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Alu-containing exons are alternatively spliced</p>
            </title>
            <aug>
               <au>
                  <snm>Sorek</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ast</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Graur</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>7</issue>
            <fpage>1060</fpage>
            <lpage>1067</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">186627</pubid>
                  <pubid idtype="pmpid" link="fulltext">12097342</pubid>
                  <pubid idtype="doi">10.1101/gr.229302</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Functional persistence of exonized mammalian-wide interspersed repeat elements (MIRs)</p>
            </title>
            <aug>
               <au>
                  <snm>Krull</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Petrusma</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Makalowski</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Brosius</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schmitz</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <issue>8</issue>
            <fpage>1139</fpage>
            <lpage>1145</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.6320607</pubid>
                  <pubid idtype="pmpid" link="fulltext">17623809</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Splice-mediated insertion of an Alu sequence in the COL4A3 mRNA causing autosomal recessive Alport syndrome</p>
            </title>
            <aug>
               <au>
                  <snm>Knebelmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Forestier</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Drouot</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Quinones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chuet</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Benessy</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Saus</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Antignac</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>1995</pubdate>
            <volume>4</volume>
            <issue>4</issue>
            <fpage>675</fpage>
            <lpage>679</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/4.4.675</pubid>
                  <pubid idtype="pmpid" link="fulltext">7633417</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Splice-mediated insertion of an Alu sequence inactivates ornithine delta-aminotransferase: a role for Alu elements in human mutation</p>
            </title>
            <aug>
               <au>
                  <snm>Mitchell</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Labuda</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Fontaine</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Saudubray</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Bonnefont</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Lyonnet</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brody</snm>
                  <fnm>LC</fnm>
               </au>
               <au>
                  <snm>Steel</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Obie</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Valle</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1991</pubdate>
            <volume>88</volume>
            <issue>3</issue>
            <fpage>815</fpage>
            <lpage>819</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">50904</pubid>
                  <pubid idtype="pmpid" link="fulltext">1992472</pubid>
                  <pubid idtype="doi">10.1073/pnas.88.3.815</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>A mutation (IVS8+0.6kbdelTC) creating a new donor splice site activates a cryptic exon in an Alu-element in intron 8 of the human beta-glucuronidase gene</p>
            </title>
            <aug>
               <au>
                  <snm>Vervoort</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gitzelmann</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lissens</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Liebaers</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Hum Genet</source>
            <pubdate>1998</pubdate>
            <volume>103</volume>
            <issue>6</issue>
            <fpage>686</fpage>
            <lpage>693</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9921904</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>A de novo Alu insertion results in neurofibromatosis type 1</p>
            </title>
            <aug>
               <au>
                  <snm>Wallace</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Andersen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Saulino</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gregory</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Glover</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1991</pubdate>
            <volume>353</volume>
            <fpage>864</fpage>
            <lpage>866</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/353864a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">1719426</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss</p>
            </title>
            <aug>
               <au>
                  <snm>Modrek</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>CJ</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2003</pubdate>
            <volume>34</volume>
            <issue>2</issue>
            <fpage>177</fpage>
            <lpage>180</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1159</pubid>
                  <pubid idtype="pmpid" link="fulltext">12730695</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>A novel Alu-like element rearranged in the dystrophin gene causes a splicing mutation in a family with X-linked dilated cardiomyopathy</p>
            </title>
            <aug>
               <au>
                  <snm>Ferlini</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gali&#233;</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Merlini</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sewry</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Branzi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Muntoni</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Am J Hum Genet</source>
            <pubdate>1998</pubdate>
            <volume>63</volume>
            <issue>2</issue>
            <fpage>436</fpage>
            <lpage>446</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1377294</pubid>
                  <pubid idtype="pmpid" link="fulltext">9683584</pubid>
                  <pubid idtype="doi">10.1086/301952</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Computational analysis and experimental validation of tumor-associated alternative RNA splicing in human cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Lo</snm>
                  <fnm>HS</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Gere</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Buetow</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>MP</fnm>
               </au>
            </aug>
            <source>Cancer Res</source>
            <pubdate>2003</pubdate>
            <volume>63</volume>
            <issue>3</issue>
            <fpage>655</fpage>
            <lpage>657</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12566310</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Exonization of Alu-generated splice variants in the survivin gene of human and non-human primates</p>
            </title>
            <aug>
               <au>
                  <snm>Mola</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Vela</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Fern&#225;ndez-Figueras</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Isamat</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mu&#241;oz-M&#225;rmol</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2007</pubdate>
            <volume>366</volume>
            <issue>4</issue>
            <fpage>1055</fpage>
            <lpage>1063</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2006.11.089</pubid>
                  <pubid idtype="pmpid" link="fulltext">17204284</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Ensembl 2007</p>
            </title>
            <aug>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>TJP</fnm>
               </au>
               <au>
                  <snm>Aken</snm>
                  <fnm>BL</fnm>
               </au>
               <au>
                  <snm>Beal</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ballester</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Caccamo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Clarke</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Coates</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Cunningham</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Cutts</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <issue>35 Database</issue>
            <fpage>D610</fpage>
            <lpage>D617</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1761443</pubid>
                  <pubid idtype="pmpid" link="fulltext">17148474</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl996</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>The UCSC Genome Browser Database: update 2006</p>
            </title>
            <aug>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Barber</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Clawson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Harte</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <issue>34 Database</issue>
            <fpage>D590</fpage>
            <lpage>D598</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347506</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381938</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj144</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>dbEST-database for "expressed sequence tags"</p>
            </title>
            <aug>
               <au>
                  <snm>Boguski</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Lowe</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Tolstoshev</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>1993</pubdate>
            <volume>4</volume>
            <issue>4</issue>
            <fpage>332</fpage>
            <lpage>333</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng0893-332</pubid>
                  <pubid idtype="pmpid" link="fulltext">8401577</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>EMBL Nucleotide Sequence Database in 2006</p>
            </title>
            <aug>
               <au>
                  <snm>Kulikova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Akhtar</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Aldebert</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Althorpe</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Andersson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Baldwin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bates</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Bhattacharyya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bower</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Browne</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Castro</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <issue>35 Database</issue>
            <fpage>D16</fpage>
            <lpage>D20</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1897316</pubid>
                  <pubid idtype="pmpid" link="fulltext">17148479</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl913</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Alternative splicing of conserved exons is frequently species-specific in human and mouse</p>
            </title>
            <aug>
               <au>
                  <snm>Pan</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Bakowski</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Frey</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Blencowe</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>2</issue>
            <fpage>73</fpage>
            <lpage>77</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2004.12.004</pubid>
                  <pubid idtype="pmpid" link="fulltext">15661351</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Genome-wide detection of tissue-specific alternative splicing in the human transcriptome</p>
            </title>
            <aug>
               <au>
                  <snm>Xu</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Modrek</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <issue>17</issue>
            <fpage>3754</fpage>
            <lpage>3766</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">137414</pubid>
                  <pubid idtype="pmpid" link="fulltext">12202761</pubid>
                  <pubid idtype="doi">10.1093/nar/gkf492</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Discovery of novel splice forms and functional analysis of cancer-specific alternative splicing in human expressed sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Xu</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>5635</fpage>
            <lpage>5643</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">206480</pubid>
                  <pubid idtype="pmpid" link="fulltext">14500827</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg786</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <issue>3</issue>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2231712</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>RepeatMasker Home Page</p>
            </title>
            <url>http://www.repeatmasker.org</url>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Repbase Update: a database and an electronic journal of repetitive elements</p>
            </title>
            <aug>
               <au>
                  <snm>Jurka</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <issue>9</issue>
            <fpage>418</fpage>
            <lpage>420</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)02093-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">10973072</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Repbase Update, a database of eukaryotic repetitive elements</p>
            </title>
            <aug>
               <au>
                  <snm>Jurka</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kapitonov</snm>
                  <fnm>VV</fnm>
               </au>
               <au>
                  <snm>Pavlicek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Klonowski</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kohany</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Walichiewicz</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Cytogenet Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>110</volume>
            <issue>1&#8211;4</issue>
            <fpage>462</fpage>
            <lpage>467</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1159/000084979</pubid>
                  <pubid idtype="pmpid" link="fulltext">16093699</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation</p>
            </title>
            <aug>
               <au>
                  <snm>Smit</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Riggs</snm>
                  <fnm>AD</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1995</pubdate>
            <volume>23</volume>
            <fpage>98</fpage>
            <lpage>102</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">306635</pubid>
                  <pubid idtype="pmpid" link="fulltext">7870595</pubid>
                  <pubid idtype="doi">10.1093/nar/23.1.98</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003</p>
            </title>
            <aug>
               <au>
                  <snm>Boeckmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Blatter</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Estreicher</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gasteiger</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Michoud</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>O'Donovan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Phan</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Pilbout</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>365</fpage>
            <lpage>370</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165542</pubid>
                  <pubid idtype="pmpid" link="fulltext">12520024</pubid>
    