<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-494</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>ExonMiner: Web service for analysis of GeneChip Exon array data</p>
         </title>
         <aug>
            <au id="A1" ce="yes">
               <snm>Numata</snm>
               <fnm>Kazuyuki</fnm>
               <insr iid="I1"/>
               <email>numata@ims.u-tokyo.ac.jp</email>
            </au>
            <au id="A2">
               <snm>Yoshida</snm>
               <fnm>Ryo</fnm>
               <insr iid="I2"/>
               <email>yoshidar@ism.ac.jp</email>
            </au>
            <au id="A3">
               <snm>Nagasaki</snm>
               <fnm>Masao</fnm>
               <insr iid="I1"/>
               <email>masao@ims.u-tokyo.ac.jp</email>
            </au>
            <au id="A4">
               <snm>Saito</snm>
               <fnm>Ayumu</fnm>
               <insr iid="I1"/>
               <email>s-ayumu@ims.u-tokyo.ac.jp</email>
            </au>
            <au id="A5">
               <snm>Imoto</snm>
               <fnm>Seiya</fnm>
               <insr iid="I1"/>
               <email>imoto@ims.u-tokyo.ac.jp</email>
            </au>
            <au id="A6" ca="yes">
               <snm>Miyano</snm>
               <fnm>Satoru</fnm>
               <insr iid="I1"/>
               <email>miyano@ims.u-tokyo.ac.jp</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan</p>
            </ins>
            <ins id="I2">
               <p>Institute of Statistical Mathematics, Research Organization of Information and Systems, 4-6-7 Minami-Azabu, Minato-ku, Tokyo, 106-8569, Japan</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>494</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/494</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19036125</pubid>
               <pubid idtype="doi">10.1186/1471-2105-9-494</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>18</day>
               <month>3</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>26</day>
               <month>11</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>26</day>
               <month>11</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Numata et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Some splicing isoform-specific transcriptional regulations are related to disease. Detecting disease-specific splice variations is therefore the first step toward finding disease-specific transcriptional regulations. The Affymetrix Human Exon 1.0 ST Array measures exon-level expression profiles that are suitable for finding differentially expressed exons on a genome-wide scale. However, exon arrays produce massive datasets that are too large to handle and analyze on a personal computer.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We have developed ExonMiner, the first all-in-one web service for analysis of exon array data, to detect transcripts that have significantly different splicing patterns in two cell types, e.g. normal and cancer cells. ExonMiner can perform the following analyses: (1) data normalization, (2) statistical analysis based on two-way ANOVA, (3) finding transcripts with significantly different splice patterns, (4) efficient visualization based on heatmaps and barplots, and (5) meta-analysis to detect exon-level biomarkers. We implemented ExonMiner on a supercomputer system in order to perform genome-wide analysis of the more than 300,000 transcripts in exon array data, which has the potential to reveal aberrant splice variations in cancer cells as exon-level biomarkers.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>ExonMiner is well suited for analysis of exon array data and requires no software installation other than a web browser. Users only need to access the ExonMiner URL <url>http://ae.hgc.jp/exonminer</url>. A full exon array dataset can be analyzed within hours using statistical methods with a sound theoretical basis that find aberrant splice variants as biomarkers.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>It has been reported that some splicing isoform-specific transcriptional regulations are related to disease <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Detecting disease-specific splice variations is the first step toward finding disease-specific transcriptional regulations. However, conventional microarrays, which produce gene-level information, are not suitable for this purpose. In contrast, the Affymetrix Human Exon 1.0 ST Array measures exon-level expression profiles that are suitable for finding differentially expressed exons on a genome-wide scale. The array measures the transcript levels of more than 1,000,000 exons in 300,000 transcripts using about 6,500,000 probes.</p>
         <p>We have developed a supercomputer-based web service named ExonMiner that analyzes exon array datasets to detect genes that are spliced into different isoforms in the two types of cells being compared, e.g. normal and cancer cells. Several noncommercial standalone applications exist for analyzing exon array data: IGB <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> visualizes exon array data, while ExACT <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> and Affymetrix Expression Console <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> focus mainly on normalizing exon array data. Bioconductor <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> (exonmap <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>) focuses on annotation as well as normalization, and exonmap has the advantage that users can apply other statistical tools implemented in R. These are well-organized applications, but they focus on data normalization, so other software is needed for further analysis. ExonMiner, in contrast, is an all-in-one web service on a supercomputer system: users can analyze the more than 300,000 transcripts on the exon array through data normalization, two-way ANOVA, visualization of the results, and detection of exon-level biomarkers. In our experiments with colon cancer exon array data consisting of 20 exon arrays, run under various usage conditions of our system, the shortest computation took four hours and the longest finished within one day. The average computation time for the colon cancer example was about eight hours.</p>
         <p>We have implemented ExonMiner on our Super Computer System <url>https://supcom.hgc.jp/english/</url> at the Human Genome Center, Institute of Medical Science, University of Tokyo, and created a GUI that makes all the analysis tools of ExonMiner easy to use. An illustrative example of colon cancer exon array data analysis <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> is shown on the web site. ExonMiner has five advantages: (1) a statistical analysis framework, (2) complete analysis of all transcripts, (3) effective visualization with heatmap and barplot images, (4) a sophisticated and easy-to-use web interface, and (5) useful hyperlinks to major public databases, e.g. PubMed and NetAffx.</p>
         <p>As shown in later sections, the method implemented in ExonMiner requires more computation time than other software because of the nonparametric test based on bootstrapping. For example, computing accurate <it>p</it>-values for the statistical tests that find aberrant splice variations requires repeating the bootstrap sampling more than 1,000 times, which costs more than 1,000 times the computation of the usual ANOVA test with a Gaussian error model. We therefore need high-performance parallel computing on the Super Computer System. More advanced methods implemented in ExonMiner in the future may require even more computational resources, so the Super Computer System provides a flexible computational basis that suits our purpose.</p>
         <sec>
            <st>
               <p>Data normalization</p>
            </st>
            <p>Before performing statistical analysis, we apply a normalization method to the raw exon array data. ExonMiner can remove a bias related to the GC content of each probe. The probes are categorized according to their GC content, and the GC-content-specific bias is removed from the probes in each category. ExonMiner offers two types of control for data normalization: one is the median value of each GC category, and the other is based on antigenomic background probes. The antigenomic background probes are also categorized into GC categories, and we compute their median values. The probe intensities in each GC category are transformed by subtracting the corresponding control value. If the user chooses the median values of the GC categories as the control, the median of the probe intensities in a GC category will be equal to one.</p>
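            <p>The GC-category normalization described above can be sketched as follows. This is a minimal illustration and not the ExonMiner implementation: the function names, the input layout (pairs of probe sequence and intensity) and the choice to leave probes without a matching control unchanged are all assumptions. The sketch also assumes log-scale intensities, so that subtracting the control centers each GC category at zero on the log scale, i.e. at a ratio of one on the raw scale.</p>

```python
from collections import defaultdict
from statistics import median


def gc_count(sequence):
    """Number of G or C bases in a probe sequence (the GC category)."""
    return sum(1 for base in sequence.upper() if base in "GC")


def gc_normalize(probes, background=None):
    """Subtract a GC-category control value from each probe intensity.

    probes: list of (sequence, log_intensity) pairs (hypothetical layout).
    background: optional list of (sequence, log_intensity) pairs for
        antigenomic background probes; if omitted, the control is the
        median of the probes themselves in each GC category.
    """
    source = probes if background is None else background
    by_category = defaultdict(list)
    for sequence, value in source:
        by_category[gc_count(sequence)].append(value)
    control = {gc: median(values) for gc, values in by_category.items()}
    # Assumption: a probe whose GC category has no control is left unchanged.
    return [value - control.get(gc_count(sequence), 0.0)
            for sequence, value in probes]
```

            <p>Calling <it>gc_normalize(probes)</it> uses the per-category medians of the probes themselves as the control, while passing a second list selects the background-probe control.</p>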
         </sec>
         <sec>
            <st>
               <p>Two-way Analysis of Variance</p>
            </st>
            <sec>
               <st>
                  <p>Concept and Model</p>
               </st>
               <p>To detect aberrant splice variations with ExonMiner, the user needs to prepare at least two exon array datasets from a pair of cells. In our illustrative example, one exon array measures exon profiles in colon cancer cells and the other in normal cells. In this case, we can find aberrant splice variants in colon cancer by comparison with normal cells. For this purpose, we use two-way analysis of variance (ANOVA). Suppose that a gene (transcript cluster) is composed of <it>m </it>exonic regions (exon clusters), and that <it>x</it><sub><it>ijk </it></sub>is the background corrected probe intensity for the <it>k </it>th probe (<it>k </it>= 1, &#8943;, <it>n</it><sub><it>ij</it></sub>) on the <it>i </it>th exon (<it>i </it>= 1, &#8943;, <it>m</it>) of a transcript, i.e. this transcript has <it>m </it>exonic regions and each exonic region is spanned by <it>n</it><sub><it>ij </it></sub>probes. Here the index <it>j </it>denotes the type of cell, e.g. <it>j </it>= 1 denotes a normal cell and <it>j </it>= 2 a cancer cell. If we observe <it>x</it><sub><it>ijk </it></sub>&#8776; <it>c </it>for any <it>i</it>, <it>j </it>and <it>k</it>, the transcript shows neither transcriptional changes nor splicing variations across cell types (<it>j </it>= 1, 2). If we observe that <it>x</it><sub><it>i</it>1<it>k </it></sub>&#8776; <it>c</it><sub>1 </sub>and <it>x</it><sub><it>i</it>2<it>k </it></sub>&#8776; <it>c</it><sub>2 </sub>(<it>c</it><sub>1 </sub>&#8800; <it>c</it><sub>2</sub>) for any <it>i </it>and <it>k</it>, the transcript is differentially expressed between the two cells; this information is equivalent to that of usual microarray expression data such as cDNA microarrays and GeneChip. 
On the other hand, if <it>x</it><sub><it>ijk </it></sub>&#8776; <it>c</it><sub>1 </sub>and <it>x</it><sub><it>i</it>'<it>jk </it></sub>&#8776; <it>c</it><sub>2 </sub>hold for any <it>j</it>, <it>k </it>and <it>i </it>&#8800; <it>i'</it>, where <it>c</it><sub>1 </sub>&#8800; <it>c</it><sub>2 </sub>and <it>c</it><sub>1</sub>, <it>c</it><sub>2 </sub>> 0, the transcript has splice variations, but these splice variations are common to both cell types. Finally, if the two cells show different splice patterns, we call them aberrant splice variations. We capture this information with the two-way ANOVA model. For ANOVA in exon array data analysis, see also <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>.</p>
               <p>For detecting transcripts that show aberrant splice variations, we use two-way ANOVA model defined by</p>
               <p>
                  <display-formula><it>x</it><sub><it>ijk </it></sub>= <it>&#956; </it>+ <it>&#945;</it><sub><it>i </it></sub>+ <it>&#946;</it><sub><it>j </it></sub>+ <it>&#947;</it><sub><it>ij </it></sub>+ <it>&#949;</it><sub><it>ijk</it></sub>,</display-formula>
               </p>
               <p>where <it>&#945;</it><sub><it>i</it></sub>, <it>&#946;</it><sub><it>j </it></sub>and <it>&#947;</it><sub><it>ij </it></sub>are parameters, <it>&#949;</it><sub><it>ijk </it></sub>denotes the observational noise with zero mean and variance <it>&#963;</it><sup>2</sup>, and <it>&#956; </it>represents the overall mean of the probe intensities. The parameter <it>&#945;</it><sub><it>i </it></sub>represents the baseline intensity in the <it>i </it>th exonic region (<it>i </it>= 1, &#8943;, <it>m</it>) and captures the exon effect. The parameters <it>&#946;</it><sub><it>j </it></sub>(<it>j </it>= 1, 2) capture the difference in the overall means between the two cells, which is called the overall gene effect. The <it>&#947;</it><sub><it>ij </it></sub>s represent interaction effects for each combination of the <it>m </it>exons and the cell types, which is called the effect of specific splice variations. The effects of these parameters are shown in Figure <figr fid="F1">1</figr>. Statistical evidence that one or more of the <it>&#947;</it><sub><it>ij </it></sub>s differ from the others suggests that alternative splicing is present in one cell type but absent in the other. We note that MIDAS <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> is a similar method that uses an ANOVA model to analyze exon array data, but MIDAS uses exon-level summarized data, whereas our model uses probe-level data. The nonparametric test based on the bootstrap method is a further advantage of our approach.</p>
               <fig id="F1">
                  <title>
                     <p>Figure 1</p>
                  </title>
                  <caption>
                     <p>The effects of two-way ANOVA parameters</p>
                  </caption>
                  <text>
                     <p>
                        <b>The effects of two-way ANOVA parameters.  </b>
                     </p>
                  </text>
                  <graphic file="1471-2105-9-494-1"/>
               </fig>
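               <p>As an illustration of how the three effects in the model are separated, the following sketch computes F-statistics for the exon effect, the overall gene effect and the effect of specific splice variations. It is not the ExonMiner code. It assumes a balanced design in which every exon and cell type combination has the same number of probes (at least two), whereas the model in the text allows <it>n</it><sub><it>ij </it></sub>to vary; the function name and the nested-list input layout are hypothetical.</p>

```python
def two_way_anova_f(data):
    """F-statistics of a balanced two-way ANOVA with interaction.

    data[i][j] is the list of probe intensities for exon i and cell type j.
    Assumption: every cell holds the same number n of probes, with n >= 2
    and nonzero within-cell variance.
    """
    m = len(data)                      # number of exonic regions
    n_types = len(data[0])             # number of cell types (2 in the text)
    n = len(data[0][0])                # probes per cell
    if any(len(cell) != n for row in data for cell in row):
        raise ValueError("this sketch requires a balanced design")

    cell_mean = [[sum(cell) / n for cell in row] for row in data]
    exon_mean = [sum(row) / n_types for row in cell_mean]
    type_mean = [sum(cell_mean[i][j] for i in range(m)) / m
                 for j in range(n_types)]
    grand = sum(exon_mean) / m

    # Sums of squares for the alpha, beta and gamma terms and the residual.
    ss_exon = n_types * n * sum((a - grand) ** 2 for a in exon_mean)
    ss_gene = m * n * sum((b - grand) ** 2 for b in type_mean)
    ss_sas = n * sum(
        (cell_mean[i][j] - exon_mean[i] - type_mean[j] + grand) ** 2
        for i in range(m) for j in range(n_types))
    ss_err = sum((x - cell_mean[i][j]) ** 2
                 for i in range(m) for j in range(n_types)
                 for x in data[i][j])

    ms_err = ss_err / (m * n_types * (n - 1))
    return {
        "exon": ss_exon / (m - 1) / ms_err,
        "gene": ss_gene / (n_types - 1) / ms_err,
        "sas": ss_sas / ((m - 1) * (n_types - 1)) / ms_err,
    }
```

               <p>A large value under the key "sas" relative to its null distribution points to an interaction between exon and cell type, which is the signal of interest for aberrant splice variations.</p>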
            </sec>
            <sec>
               <st>
                  <p>Statistical tests for detecting alternative splicing, differential expression, and aberrant splice variations</p>
               </st>
               <p>The estimates of the <it>&#947;</it><sub><it>ij </it></sub>s can capture the presence of aberrant splice variations. With the ANOVA model, the probe fluctuations are decomposed into three orthogonal effects, i.e. the exon effect (<it>&#945;</it><sub><it>i</it></sub>), the overall gene effect (<it>&#946;</it><sub><it>j</it></sub>) and the effect of specific splice variations (<it>&#947;</it><sub><it>ij</it></sub>). The statistical significance of each effect can be evaluated by the following three tests:</p>
               <p><b>Test 1 </b>(Detection of exon effect):</p>
               <p>H<sub>0</sub>: <it>&#945;</it><sub><it>i </it></sub>= 0 for all <it>i</it>.</p>
               <p>H<sub>a</sub>: <it>&#945;</it><sub><it>i </it></sub>&#8800; 0 for at least one <it>i</it>.</p>
               <p><b>Test 2 </b>(Detection of overall gene effect):</p>
               <p>H<sub>0</sub>: <it>&#946;</it><sub>1 </sub>= <it>&#946;</it><sub>2</sub>.</p>
               <p>H<sub>a</sub>: <it>&#946;</it><sub>1 </sub>&#8800; <it>&#946;</it><sub>2</sub>.</p>
               <p><b>Test 3 </b>(Detection of effect of specific splice variations):</p>
               <p>H<sub>0</sub>: <it>&#947;</it><sub><it>ij </it></sub>= 0 for all <it>i </it>and <it>j</it>.</p>
               <p>H<sub>a</sub>: <it>&#947;</it><sub><it>ij </it></sub>&#8800; 0 for one or more pairs of (<it>i</it>, <it>j</it>).</p>
               <p>Here H<sub>0 </sub>and H<sub>a </sub>represent the null and alternative hypotheses, respectively. Repeating these hypothesis tests for all transcript clusters, one obtains statistical evidence of aberrant splice variations, scored by the <it>p</it>-values computed from Test 3. In addition to the usual F-test of parameter significance, ExonMiner provides a nonparametric permutation method that calculates the null distribution of the F-statistics <it>F</it><sub>exon</sub>, <it>F</it><sub>gene </sub>and <it>F</it><sub>sas</sub>, which assess the significance of the exon effect, the gene effect and the effect of specific splice variations, respectively. To evaluate the null distributions, we first generate a permutation set of samples by bootstrapping <it>n </it>= &#931;<sub><it>ij</it></sub><it>n</it><sub><it>ij </it></sub>samples from the <it>x</it><sub><it>ijk </it></sub>s. Repeating this process <it>Q </it>times, we can approximately evaluate the null distribution of each <it>F</it><sub>* </sub>with the <it>Q </it>permutation statistics <inline-formula><m:math name="1471-2105-9-494-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>f</m:mi><m:mrow><m:msup><m:mn>0</m:mn><m:mo>&#8727;</m:mo></m:msup></m:mrow><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>q</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOzay2aa0baaSqaaiabicdaWmaaCaaameqabaGaey4fIOcaaaWcbaGaeiikaGIaemyCaeNaeiykaKcaaaaa@328A@</m:annotation></m:semantics></m:math></inline-formula>, <it>q </it>= 1, &#8943;, <it>Q </it>. Note that * can be replaced by exon, gene and sas. Subsequently, the <it>p</it>-value for a given test statistic <it>F</it><sub>* </sub>= <it>f</it><sub>* </sub>obtained from the original data set is calculated by</p>
               <p>
                  <display-formula>
                     <m:math name="1471-2105-9-494-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>p</m:mi>
                                 <m:mo>&#8727;</m:mo>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:mo>#</m:mo>
                                    <m:mo>{</m:mo>
                                    <m:mi>q</m:mi>
                                    <m:mo>:</m:mo>
                                    <m:msubsup>
                                       <m:mi>f</m:mi>
                                       <m:mrow>
                                          <m:msup>
                                             <m:mn>0</m:mn>
                                             <m:mo>&#8727;</m:mo>
                                          </m:msup>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>q</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:msubsup>
                                    <m:mo>&#8805;</m:mo>
                                    <m:msub>
                                       <m:mi>f</m:mi>
                                       <m:mo>&#8727;</m:mo>
                                    </m:msub>
                                    <m:mo>}</m:mo>
                                 </m:mrow>
                                 <m:mi>Q</m:mi>
                              </m:mfrac>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiCaa3aaSbaaSqaaiabgEHiQaqabaGccqGH9aqpjuaGdaWcaaqaaiabcocaJiabcUha7jabdghaXjabcQda6iabdAgaMnaaDaaabaGaeGimaaZaaWbaaeqabaGaey4fIOcaaaqaaiabcIcaOiabdghaXjabcMcaPaaacqGHLjYScqWGMbGzdaWgaaqaaiabgEHiQaqabaGaeiyFa0habaGaemyuaefaaaaa@4273@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>for each effect. ExonMiner lets users choose between the parametric and the nonparametric test for assessing the significance of each parameter.</p>
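               <p>The resampling-based <it>p</it>-value can be sketched as follows. This is an illustration under stated assumptions and not the ExonMiner implementation: the function names are hypothetical, the test statistic is passed in as an arbitrary function of the nested data, and the null samples are drawn by pooling all probe intensities of a transcript and refilling every exon and cell type combination with replacement, which is one reading of the bootstrapping step described in the text.</p>

```python
import random


def empirical_p_value(observed, null_stats):
    """Fraction of null statistics at least as large as the observed one."""
    q_total = len(null_stats)
    if q_total == 0:
        raise ValueError("need at least one null statistic")
    return sum(1 for f0 in null_stats if f0 >= observed) / q_total


def resampled_null(data, statistic, n_resamples=1000, seed=0):
    """Approximate the null distribution of a statistic by resampling.

    data[i][j] is the list of probe intensities for exon i and cell type j.
    statistic is any function mapping such a nested list to a float.
    Assumption: under the null hypothesis all probes are exchangeable, so
    each cell is refilled by drawing with replacement from the pooled data.
    """
    rng = random.Random(seed)
    pool = [x for row in data for cell in row for x in cell]
    null_stats = []
    for _ in range(n_resamples):
        resampled = [[[rng.choice(pool) for _probe in cell] for cell in row]
                     for row in data]
        null_stats.append(statistic(resampled))
    return null_stats
```

               <p>With an observed statistic <it>f</it> and a statistic function <it>stat</it>, the <it>p</it>-value is obtained as <it>empirical_p_value(f, resampled_null(data, stat))</it>. Since each transcript is processed independently, this step parallelizes naturally across transcripts.</p>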
            </sec>
         </sec>
         <sec>
            <st>
               <p>Meta Analysis</p>
            </st>
            <p>To detect aberrant splice variants as biomarkers, we need to check whether the detected aberrant splice variants are common in the targeted disease. For this purpose, we establish a statistical testing procedure based on meta-analysis <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Suppose that we have <it>G </it>pairs of exon array datasets, i.e. normal and tumor exon expression data are measured from <it>G </it>patients. By performing the whole transcript analysis based on two-way ANOVA on the <it>G </it>paired exon array datasets, one obtains a set of <it>p</it>-values for each effect, e.g. the effect of specific splice variations (<it>&#947;</it><sub><it>ij</it></sub>), <inline-formula><m:math name="1471-2105-9-494-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>p</m:mi><m:mn>1</m:mn><m:mi>g</m:mi></m:msubsup><m:mo>,</m:mo><m:mo>&#8943;</m:mo><m:mo>,</m:mo><m:msubsup><m:mi>p</m:mi><m:mi>r</m:mi><m:mi>g</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiCaa3aa0baaSqaaiabigdaXaqaaiabdEgaNbaakiabcYcaSiabl+UimjabcYcaSiabdchaWnaaDaaaleaacqWGYbGCaeaacqWGNbWzaaaaaa@37C4@</m:annotation></m:semantics></m:math></inline-formula>, across patients, <it>g </it>= 1, &#8943;, <it>G</it>. Here the total number of transcripts analyzed is denoted by <it>r</it>. Intuitively, a transcript having a small <it>p</it>-value is strongly associated with tumor formation. However, it is possible that some observed aberrant splice variants could be caused by the inter-individual differences of the analyzed samples. Our goal is to discover the "universal biomarkers", i.e. aberrant splice variations which are shared by most individuals with a particular diagnostic category.</p>
            <p>Following this direction, we develop the statistical technique within the framework of meta-analysis based on the normal inversion method.</p>
            <p>Let <inline-formula><m:math name="1471-2105-9-494-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>x</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mi>g</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiEaG3aa0baaSqaaiabdMgaPjabdQgaQjabdUgaRbqaaiabdEgaNbaaaaa@32E9@</m:annotation></m:semantics></m:math></inline-formula> denote the observed probe intensities of the <it>k </it>th probe which spans the <it>i </it>th exonic region for normal cell (<it>j </it>= 1) or target cell (<it>j </it>= 2) isolated from the <it>g </it>th individual. We assume that the probe intensities <inline-formula><m:math name="1471-2105-9-494-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>x</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mi>g</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiEaG3aa0baaSqaaiabdMgaPjabdQgaQjabdUgaRbqaaiabdEgaNbaaaaa@32E9@</m:annotation></m:semantics></m:math></inline-formula> of each individual can be modeled by the two-way ANOVA defined by</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-494-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msubsup>
                              <m:mi>x</m:mi>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mi>j</m:mi>
                                 <m:mi>k</m:mi>
                              </m:mrow>
                              <m:mi>g</m:mi>
                           </m:msubsup>
                           <m:mo>=</m:mo>
                           <m:msup>
                              <m:mi>&#956;</m:mi>
                              <m:mi>g</m:mi>
                           </m:msup>
                           <m:mo>+</m:mo>
                           <m:msubsup>
                              <m:mi>&#945;</m:mi>
                              <m:mi>i</m:mi>
                              <m:mi>g</m:mi>
                           </m:msubsup>
                           <m:mo>+</m:mo>
                           <m:msubsup>
                              <m:mi>&#946;</m:mi>
                              <m:mi>j</m:mi>
                              <m:mi>g</m:mi>
                           </m:msubsup>
                           <m:mo>+</m:mo>
                           <m:msubsup>
                              <m:mi>&#947;</m:mi>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                              <m:mi>g</m:mi>
                           </m:msubsup>
                           <m:mo>+</m:mo>
                           <m:msubsup>
                              <m:mi>&#949;</m:mi>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mi>j</m:mi>
                                 <m:mi>k</m:mi>
                              </m:mrow>
                              <m:mi>g</m:mi>
                           </m:msubsup>
                           <m:mo>,</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiEaG3aa0baaSqaaiabdMgaPjabdQgaQjabdUgaRbqaaiabdEgaNbaakiabg2da9iabeY7aTnaaCaaaleqabaGaem4zaCgaaOGaey4kaSIaeqySde2aa0baaSqaaiabdMgaPbqaaiabdEgaNbaakiabgUcaRiabek7aInaaDaaaleaacqWGQbGAaeaacqWGNbWzaaGccqGHRaWkcqaHZoWzdaqhaaWcbaGaemyAaKMaemOAaOgabaGaem4zaCgaaOGaey4kaSIaeqyTdu2aa0baaSqaaiabdMgaPjabdQgaQjabdUgaRbqaaiabdEgaNbaakiabcYcaSaaa@5240@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>for <it>g </it>= 1, &#8943;, <it>G</it>. Given these models, the statistical hypothesis test for each effect, for example the effect of specific splice variations, is formulated as</p>
            <p><b>Test 4 </b>(Detection of universal specific splice variations):</p>
            <p>H<sub>0</sub>: <inline-formula><m:math name="1471-2105-9-494-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#947;</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>g</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeq4SdC2aa0baaSqaaiabdMgaPjabdQgaQbqaaiabdEgaNbaaaaa@31B8@</m:annotation></m:semantics></m:math></inline-formula> = 0 for all <it>i</it>, <it>j </it>and <it>g</it>.</p>
            <p>H<sub>a</sub>: <inline-formula><m:math name="1471-2105-9-494-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#947;</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mi>g</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeq4SdC2aa0baaSqaaiabdMgaPjabdQgaQbqaaiabdEgaNbaaaaa@31B8@</m:annotation></m:semantics></m:math></inline-formula> &#8800; 0 for one or more tuples (<it>i</it>, <it>j</it>, <it>g</it>).</p>
            <p>In order to assess the H<sub>0</sub>, we propose use of the normal inversion metric as a test statistic. Suppose that we have a set of <it>p</it>-values, <inline-formula><m:math name="1471-2105-9-494-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>p</m:mi><m:mi>h</m:mi><m:mn>1</m:mn></m:msubsup><m:mo>,</m:mo><m:mo>&#8943;</m:mo><m:mo>,</m:mo><m:msubsup><m:mi>p</m:mi><m:mi>h</m:mi><m:mi>G</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiCaa3aa0baaSqaaiabdIgaObqaaiabigdaXaaakiabcYcaSiabl+UimjabcYcaSiabdchaWnaaDaaaleaacqWGObaAaeaacqWGhbWraaaaaa@3772@</m:annotation></m:semantics></m:math></inline-formula>, for occurrence of the aberrant splice variations in the <it>h </it>th transcript cluster. The method first converts these <it>p</it>-values into the <it>z</it>-scores as <inline-formula><m:math name="1471-2105-9-494-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>z</m:mi><m:mi>h</m:mi><m:mi>g</m:mi></m:msubsup><m:mo>=</m:mo><m:msup><m:mi mathvariant="normal">&#934;</m:mi><m:mrow><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msup><m:mo stretchy="false">(</m:mo><m:mn>1</m:mn><m:mo>&#8722;</m:mo><m:msubsup><m:mi>p</m:mi><m:mi>h</m:mi><m:mi>g</m:mi></m:msubsup><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOEaO3aa0baaSqaaiabdIgaObqaaiabdEgaNbaakiabg2da9iabfA6agnaaCaaaleqabaGaeyOeI0IaeGymaedaaOGaeiikaGIaeGymaeJaeyOeI0IaemiCaa3aa0baaSqaaiabdIgaObqaaiabdEgaNbaakiabcMcaPaaa@3CAC@</m:annotation></m:semantics></m:math></inline-formula>, where &#934;<sup>-1 </sup>is the inverse of the standard normal cumulative distribution function, and then computes the integrated <it>z</it>-score as</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-494-i9" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>z</m:mi>
                              <m:mi>h</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:msubsup>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>g</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                       <m:mi>G</m:mi>
                                    </m:msubsup>
                                    <m:mrow>
                                       <m:msubsup>
                                          <m:mi>z</m:mi>
                                          <m:mi>h</m:mi>
                                          <m:mi>g</m:mi>
                                       </m:msubsup>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                              <m:mrow>
                                 <m:msqrt>
                                    <m:mi>G</m:mi>
                                 </m:msqrt>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOEaO3aaSbaaSqaaiabdIgaObqabaGccqGH9aqpjuaGdaWcaaqaamaaqadabaGaemOEaO3aa0baaeaacqWGObaAaeaacqWGNbWzaaaabaGaem4zaCMaeyypa0JaeGymaedabaGaem4raCeacqGHris5aaqaamaakaaabaGaem4raCeabeaaaaGaeiOla4caaa@3D7B@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>The significance of H<sub>a </sub>can be assessed based on the integrated <it>p</it>-value which is computed by transforming the <it>z</it>-score with the standard normal cumulative distribution function &#934; as</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-494-i10" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msubsup>
                              <m:mi>p</m:mi>
                              <m:mi>h</m:mi>
                              <m:mrow>
                                 <m:mtext>integrated</m:mtext>
                              </m:mrow>
                           </m:msubsup>
                           <m:mo>=</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo>&#8722;</m:mo>
                           <m:mi>&#934;</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>z</m:mi>
                              <m:mi>h</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiCaa3aa0baaSqaaiabdIgaObqaaiabbMgaPjabb6gaUjabbsha0jabbwgaLjabbEgaNjabbkhaYjabbggaHjabbsha0jabbwgaLjabbsgaKbaakiabg2da9iabigdaXiabgkHiTiabfA6agjabcIcaOiabdQha6naaBaaaleaacqWGObaAaeqaaOGaeiykaKIaeiOla4caaa@46AF@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>We now show an actual example of meta-analysis. In Yoshida <it>et al</it>. <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, a colon cancer exon array dataset was analyzed with a primary version of ExonMiner. In this analysis, based on Test 3 of ANOVA, the gene <it>MUC17 </it>(Accession ID: NM_001040105) has the following <it>p</it>-values for ten patients:</p>
            <p>
               <display-formula><it>p</it><sub><it>h</it></sub><sup>1 </sup>= 0.313; <it>p</it><sub><it>h</it></sub><sup>2 </sup>= 0.0005; <it>p</it><sub><it>h</it></sub><sup>3 </sup>= 0.0005; <it>p</it><sub><it>h</it></sub><sup>4 </sup>= 0.8964; <it>p</it><sub><it>h</it></sub><sup>5 </sup>= 0.8201;</display-formula>
            </p>
            <p>
               <display-formula><it>p</it><sub><it>h</it></sub><sup>6 </sup>= 0.0002; <it>p</it><sub><it>h</it></sub><sup>7 </sup>= 0.6549; <it>p</it><sub><it>h</it></sub><sup>8 </sup>= 0.0179; <it>p</it><sub><it>h</it></sub><sup>9 </sup>= 0.0522; <it>p</it><sub><it>h</it></sub><sup>10 </sup>= 0.1664.</display-formula>
            </p>
            <p>These <it>p</it>-values are transformed into <it>z</it>-scores as:</p>
            <p>
               <display-formula><it>z</it><sub><it>h</it></sub><sup>1 </sup>= 0.487; <it>z</it><sub><it>h</it></sub><sup>2 </sup>= 3.291; <it>z</it><sub><it>h</it></sub><sup>3 </sup>= 3.291; <it>z</it><sub><it>h</it></sub><sup>4 </sup>= -1.261; <it>z</it><sub><it>h</it></sub><sup>5 </sup>= -0.916;</display-formula>
            </p>
            <p>
               <display-formula><it>z</it><sub><it>h</it></sub><sup>6 </sup>= 3.540; <it>z</it><sub><it>h</it></sub><sup>7 </sup>= -0.399; <it>z</it><sub><it>h</it></sub><sup>8 </sup>= 2.099; <it>z</it><sub><it>h</it></sub><sup>9 </sup>= 1.624; <it>z</it><sub><it>h</it></sub><sup>10 </sup>= 0.968.</display-formula>
            </p>
            <p>The integrated <it>z</it>-score is 4.023 and the integrated <it>p</it>-value is obtained as 2.86 &#215; 10<sup>-5</sup>.</p>
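            <p>As a cross-check, the <it>MUC17</it> computation above can be reproduced in a few lines. The following is a minimal sketch in Python, not the ExonMiner implementation; the function name integrate_p_values and the use of the standard library class statistics.NormalDist are our own choices for illustration. It applies the three formulas above in order: conversion to <it>z</it>-scores, summation scaled by the square root of <it>G</it>, and conversion back to an integrated <it>p</it>-value.</p>
            <p>
```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def integrate_p_values(p_values):
    """Combine per-sample p-values with the normal inversion metric.

    Returns (z_scores, integrated_z, integrated_p).
    """
    # z_h^g = Phi^{-1}(1 - p_h^g)
    z_scores = [STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    # z_h = sum over g of z_h^g, divided by sqrt(G)
    integrated_z = sum(z_scores) / sqrt(len(z_scores))
    # p_h^integrated = 1 - Phi(z_h); by symmetry this equals Phi(-z_h),
    # which keeps more precision when z_h is large.
    integrated_p = STD_NORMAL.cdf(-integrated_z)
    return z_scores, integrated_z, integrated_p


# Per-patient p-values for MUC17 as listed in the text.
muc17_p = [0.313, 0.0005, 0.0005, 0.8964, 0.8201,
           0.0002, 0.6549, 0.0179, 0.0522, 0.1664]

z_scores, z_int, p_int = integrate_p_values(muc17_p)
print([round(z, 3) for z in z_scores])
print(f"integrated z = {z_int:.3f}, integrated p = {p_int:.2e}")
```
            </p>
            <p>The printed values should agree with those given in the text up to rounding in the last digit. Note that inv_cdf requires its argument to lie strictly between 0 and 1, so <it>p</it>-values of exactly 0 or 1 would need separate handling in this sketch.</p>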
            <p>In the colon cancer example, we compute <it>q</it>-values from the integrated <it>p</it>-values of the meta-analysis. The list of genes identified as having aberrant splice variations, including exon skipping/retaining, is controlled at a 10% False Discovery Rate (FDR), which corresponds to <it>q</it>-value &lt; 0.1. For <it>MUC17</it> above, the <it>q</it>-value is 0.0345, so it is judged significant. The computation of the <it>q</it>-value is described in Yoshida <it>et al</it>. <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>.</p>
            <p>Using exon array data with ExonMiner, it is possible to detect alternative splicing events such as exon skipping/retaining and alternative usage of donor and acceptor splice sites. However, since the exon array does not have junction probes, a custom array with junction probes or a PCR-based method is needed to determine the exact patterns of splice isoforms.</p>
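            <p>The <it>q</it>-value procedure itself is given in Yoshida <it>et al</it>. and is not reproduced here. To illustrate how an FDR threshold is applied to a list of integrated <it>p</it>-values, the sketch below uses Benjamini-Hochberg adjusted <it>p</it>-values as a stand-in; this is a different, widely used FDR method and may give values that differ from the <it>q</it>-values reported above. The function name bh_adjust and the five input <it>p</it>-values are hypothetical.</p>
            <p>
```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical integrated p-values for five transcript clusters.
integrated_p = [0.041, 0.001, 0.6, 0.008, 0.039]
adjusted = bh_adjust(integrated_p)

# Indices of clusters kept at a 10% FDR threshold (adjusted value below 0.1).
selected = [i for i, q in enumerate(adjusted) if 0.1 > q]
print(adjusted)
print(selected)
```
            </p>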
         </sec>
      </sec>
      <sec>
         <st>
            <p>Implementation</p>
         </st>
         <sec>
            <st>
               <p>Data upload</p>
            </st>
            <p>Users are required to upload their exon array data. We provide an FTP service for data upload, because FTP makes it easy to upload large datasets. To increase security, we issue a one-time account and password for the FTP service. Note that this one-time account and password are different from the login account and password of ExonMiner.</p>
         </sec>
         <sec>
            <st>
               <p>Statistical analysis engine</p>
            </st>
            <p>ExonMiner performs ANOVA for each transcript. To test the significance of each effect in the ANOVA described in the previous section, we implemented two types of tests: one is based on a Gaussian noise model and performs an F-test; the other is a nonparametric approach using the bootstrap method. The nonparametric approach computes the test statistics repeatedly and therefore requires an enormous amount of computation. We therefore implemented the ANOVA program in Fortran and optimized it for the high performance computing environment described in a later section.</p>
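            <p>To make the cost of the nonparametric approach concrete, the following sketch shows a generic bootstrap test in Python. It is a simplified one-way illustration under our own assumptions, not the two-way model of the previous section and not the Fortran program: the F statistic is recomputed on data resampled from the pooled, group-centered residuals, and the <it>p</it>-value is the fraction of resampled statistics at least as large as the observed one. The function names and the example intensities are hypothetical.</p>
            <p>
```python
import random
from statistics import fmean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (lists of floats)."""
    values = [x for g in groups for x in g]
    grand = fmean(values)
    between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    if within == 0.0:
        return float("inf")
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (between / df_between) / (within / df_within)


def bootstrap_p_value(groups, n_boot=999, seed=0):
    """Bootstrap p-value for the group effect under the null of no effect."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    # Centering each group removes the group effect, so the pooled
    # residuals represent the noise distribution under the null.
    residuals = [x - fmean(g) for g in groups for x in g]
    exceed = 0
    for _ in range(n_boot):
        resampled = [rng.choices(residuals, k=len(g)) for g in groups]
        if f_statistic(resampled) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (exceed + 1) / (n_boot + 1)


# Hypothetical log-intensities for two conditions.
normal = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
tumor_shifted = [7.0, 7.2, 6.9, 7.1, 7.3, 6.8]
tumor_same = [5.0, 5.2, 4.9, 5.1, 5.3, 4.8]

p_strong = bootstrap_p_value([normal, tumor_shifted])
p_null = bootstrap_p_value([normal, tumor_same])
print(p_strong, p_null)
```
            </p>
            <p>Each call evaluates the test statistic n_boot + 1 times, so repeating this for every effect of every transcript multiplies the cost of the parametric test by roughly the number of bootstrap resamples.</p>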
         </sec>
         <sec>
            <st>
               <p>Visualization engine</p>
            </st>
            <p>The exon expression pattern of each transcript needs to be shown visually. We have developed two types of image generators, which produce heatmap and barplot images optimized for exon array data. These images are generated using R with a graphics library that we developed ourselves.</p>
         </sec>
         <sec>
            <st>
               <p>Database</p>
            </st>
            <p>We use a MySQL database server to manage user information and probe annotation information. To build a highly secure system, user login information is encrypted before it is stored in the MySQL database. Because the probe annotation information is kept in the MySQL database, users do not need to consult other databases. Thus ExonMiner is an all-in-one web service.</p>
         </sec>
         <sec>
            <st>
               <p>High performance computing on supercomputers</p>
            </st>
            <p>Since ANOVA for the full set of transcripts requires high performance computing, we run the ANOVA computations in parallel on our supercomputer system. The system has eight Sun Fire 15K servers, and up to 700 CPUs can be used for parallel statistical computation through Sun Grid Engine.</p>
         </sec>
         <sec>
            <st>
               <p>Web interface</p>
            </st>
            <p>In ExonMiner, PHP scripts handle the connection between front-end users and our supercomputer system, and dynamically generate images by executing the visualization engine described above according to user input. The PHP scripts also generate HTML pages with a uniform style, which improves usability.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussions</p>
         </st>
         <sec>
            <st>
               <p>Overview of ExonMiner</p>
            </st>
            <sec>
               <st>
                  <p>Create user account</p>
               </st>
               <p>Figure <figr fid="F2">2</figr> shows a flowchart of ExonMiner. First, a user account is created on request to ExonMiner. Figure <figr fid="F3">3</figr> shows the web page for user account registration. After the registration form is filled in, an e-mail with (1) ID (username), (2) login password and (3) confirmation URL is sent to the user. When the user accesses the confirmation URL, the user ID is activated and a personal page for the user is dynamically created.</p>
               <fig id="F2">
                  <title>
                     <p>Figure 2</p>
                  </title>
                  <caption>
                     <p>A flowchart for an analysis in ExonMiner</p>
                  </caption>
                  <text>
                     <p>
                        <b>A flowchart for an analysis in ExonMiner.  </b>
                     </p>
                  </text>
                  <graphic file="1471-2105-9-494-2"/>
               </fig>
               <fig id="F3">
                  <title>
                     <p>Figure 3</p>
                  </title>
                  <caption>
                     <p>A snapshot of the account registration screen</p>
                  </caption>
                  <text>
                     <p>
                        <b>A snapshot of the account registration screen.</b>
                     </p>
                  </text>
                  <graphic file="1471-2105-9-494-3"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>FTP for data upload</p>
               </st>
               <p>Data must be uploaded by FTP. To use the FTP service in ExonMiner, the user needs to obtain a one-time account and password for FTP.</p>
               <p>Note that the FTP account is different from the login account. Using the one-time password, the user can upload CEL files in TEXT format, archived with ZIP, via FTP. ExonMiner supports CEL files in TEXT format (recognized as version 3), whereas CEL files are usually in BINARY format (recognized as version 4). To convert BINARY CEL files (version 4) into TEXT format (version 3), the "<it>CEL File Conversion Tool</it>" provided by Affymetrix Inc. is available <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
            </sec>
            <sec>
               <st>
                  <p>Analysis options</p>
               </st>
               <p>The user should fill in all of the analysis options and then start the analysis. All options (A) to (I) in Figure <figr fid="F4">4</figr> must be set to start a statistical analysis by two-way ANOVA and meta-analysis.</p>
               <fig id="F4">
                  <title>
                     <p>Figure 4</p>
                  </title>
                  <caption>
                     <p>A snapshot of the analysis option selection screen</p>
                  </caption>
                  <text>
                     <p>
                        <b>A snapshot of the analysis option selection screen.</b>
                     </p>
                  </text>
                  <graphic file="1471-2105-9-494-4"/>
               </fig>
               <p>(A) Description: you can add a brief description of your analysis. Naming each analysis here is a convenient way to organize your analyses.</p>
               <p>(B) Select probe levels: you can select the level of expression information in the exon array. Transcript Level: there are three levels of transcripts (core, extended and full), according to the quality of their information sources. Probe Level and Exon Resolution can be chosen in the same way.</p>
               <p>(C) Select GFF: you can select chromosomes. Only transcripts on the selected chromosomes will be analyzed, which can reduce computation time.</p>
               <p>(D) Select which CEL file is a patient or a control: you assign the outcome information to each CEL file uploaded by FTP.</p>
               <p>(E) Preprocessing data (background correction): you select the type of normalization method. GC-content: the median values of probes in the same GC-content group are used as control values. Antigenomic background: the median values of antigenomic background probes with the same GC-content are used as control values.</p>
               <p>(F) Preprocessing data (GC-content threshold): probes with high GC-content may act as noise, so you can remove such probes. By default, probes with a GC-content of 20 or more are removed. If you want to use all probes for the analysis, choose 26 as the cut-off.</p>
               <p>(G) Analysis type (model): you select one of the following three analysis types. Don't analyze: ExonMiner does not perform ANOVA; only visualization and sequence information are available. Parametric analysis: a Gaussian distribution is assumed as the noise model. Nonparametric analysis: ExonMiner does not assume any distribution for the noise model; a bootstrap test is applied to compute <it>p</it>-values.</p>
               <p>(H) Analysis type (threshold for the number of probes): to stabilize the ANOVA results, ExonMiner ignores probesets or exon clusters with a small number of probes. You can choose the cut-off with this option.</p>
               <p>(I) Nonparametric analysis options: this option specifies the number of bootstrap resamples in nonparametric ANOVA.</p>
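               <p>The GC-content options (E) and (F) can be pictured with a short sketch. It is written in Python under our own assumptions and is not ExonMiner code: "GC-content" is taken to mean the number of G or C bases in a probe, probes are assumed to be 25 bases long (which is consistent with a cut-off of 26 keeping every probe), and all probe identifiers, sequences and intensities are hypothetical.</p>
               <p>
```python
from collections import defaultdict
from statistics import median


def gc_count(sequence):
    """Number of G or C bases in a probe sequence."""
    return sum(1 for base in sequence.upper() if base in "GC")


def filter_probes(probes, cutoff=20):
    """Option (F): keep probes whose GC count is below the cut-off."""
    return {pid: seq for pid, seq in probes.items() if cutoff > gc_count(seq)}


def median_by_gc(background):
    """Option (E): median background intensity for each GC count."""
    groups = defaultdict(list)
    for seq, intensity in background:
        groups[gc_count(seq)].append(intensity)
    return {gc: median(vals) for gc, vals in groups.items()}


# Hypothetical 25-base probes with GC counts 0, 12, 20 and 25.
probes = {
    "probe_a": "A" * 25,
    "probe_b": "ACGT" * 6 + "A",
    "probe_c": "G" * 20 + "A" * 5,
    "probe_d": "GC" * 12 + "G",
}

kept_default = filter_probes(probes)          # default cut-off of 20
kept_all = filter_probes(probes, cutoff=26)   # cut-off of 26 keeps everything

# Hypothetical background probes as (sequence, intensity) pairs.
background = [("ACGT" * 6 + "A", 10.0),
              ("TGCA" * 6 + "T", 14.0),
              ("G" * 20 + "A" * 5, 30.0)]
controls = median_by_gc(background)

print(sorted(kept_default), sorted(kept_all), controls)
```
               </p>
               <p>How the per-group medians are then used as control values is not shown here, because the text does not specify that step.</p>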
            </sec>
            <sec>
               <st>
                  <p>Visualization of the results</p>
               </st>
               <p>After setting all options, the user can start the analysis. When the analysis is completed, ExonMiner sends an e-mail to notify the user that the calculation has finished. The user can then view the result pages of the analysis, which contain heatmaps, barplots, sequence information, the calculated <it>p</it>-values of two-way ANOVA and the results of meta-analysis. A screenshot of ExonMiner is given in Figure <figr fid="F5">5</figr>, which shows the results for LGR5. LGR5 is one of the most significant genes in the colon cancer exon arrays reported by Yoshida <it>et al</it>. <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. The colon cancer exon array data are provided by Affymetrix. The information for each transcript can be reached by either gene symbol or transcript cluster ID. The heatmap (A) represents the exon profiles of LGR5. The user can download the heatmap image as a bitmap or PostScript file. Sequence information (B) for the transcript is shown with hyperlinks to the external web sites Entrez <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and NetAffx <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The table (C) shows the calculated ANOVA <it>p</it>-values. The user can view the barplot image of normalized exon expression for a pair of cells from the View hyperlinks. The <it>p</it>-values for parameters calculated in meta-analysis are shown in the bottom table. The user can download the results in one Excel file.</p>
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>A snapshot of the analysis result viewer</p>
                  </caption>
                  <text>
                     <p>
                        <b>A snapshot of the analysis result viewer.  </b>
                     </p>
                  </text>
                  <graphic file="1471-2105-9-494-5"/>
               </fig>
               <p>Instead of the heatmap image, ExonMiner can produce barplot images. Figure <figr fid="F6">6</figr> is a barplot image for LGR5. A barplot image has three bar graphs. The red bar graph shows probe intensities in the exon array of the colon cancer cell, and the green bar graph shows probe intensities in the exon array of the normal cell. Bars with lower intensities are shown in a dark color. If the bar on top of a dark bar is red, the dark bar belongs to the normal cell (green), and <it>vice versa</it>. Using the dark bar graph, users can easily find differences in exon expression between the two cells. For example, Figure <figr fid="F6">6</figr> shows that the exon expression levels of the colon cancer cell are higher than those of the normal cell in many exonic regions.</p>
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>A sample barplot image for each exon expression level</p>
                  </caption>
                  <text>
                     <p>A sample barplot image for each exon expression level.  </p>
                  </text>
                  <graphic file="1471-2105-9-494-6"/>
               </fig>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>&#8226; Project name: ExonMiner</p>
         <p>&#8226; Project home page: <url>http://ae.hgc.jp/exonminer/</url></p>
         <p>&#8226; Anonymous accounts (no e-mail address for registration is needed): <url>http://ae.hgc.jp/exonminer/anonymous.html</url></p>
         <p>&#8226; Operating systems: any OS (that has an internet browser application)</p>
         <p>&#8226; Programming language: PHP, R, Fortran, Perl, Ruby, MySQL</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>ExonMiner is an all-in-one web service well suited for the analysis of exon array data. Since it requires no software installation other than an internet browser, all users need to do is access the ExonMiner URL <url>http://ae.hgc.jp/exonminer</url>. ExonMiner performs not only visualization of exon array data but also data normalization and user-customized statistical analysis that is hard to run on a single computer. With the support of the supercomputers at the Human Genome Center, Institute of Medical Science, University of Tokyo, users can analyze a full exon array dataset within hours and obtain meta-analysis results that identify aberrant splice variants as biomarkers.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>KN, AS and MN designed ExonMiner and KN implemented it. KN and AS prepared the figures. RY and SI developed the statistical analysis in ExonMiner. SM supervised the project. KN wrote the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank the three reviewers for their constructive comments and suggestions, which improved the quality of the paper considerably. The authors also wish to thank Affymetrix Japan Inc. for permission to link to their web site, NetAffx, and for their helpful suggestions. ExonMiner was supported by the Human Genome Center, Institute of Medical Science, University of Tokyo.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Par4 is a coactivator for a splice isoform-specific transcriptional activation domain in WT1</p>
            </title>
            <aug>
               <au>
                  <snm>Richard</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Schumacher</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Royer-Pokora</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>SGE</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2001</pubdate>
            <volume>15</volume>
            <issue>3</issue>
            <fpage>328</fpage>
            <lpage>339</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">312625</pubid>
                  <pubid idtype="pmpid" link="fulltext">11159913</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>A novel Bcr-Abl splice isoform is associated with the L248V mutation in CML patients with acquired resistance to imatinib</p>
            </title>
            <aug>
               <au>
                  <snm>Gruber</snm>
                  <fnm>FX</fnm>
               </au>
               <au>
                  <snm>Hjorth-Hansen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Mikkola</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Stenke</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Johansen</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Leukemia</source>
            <pubdate>2006</pubdate>
            <volume>20</volume>
            <fpage>2057</fpage>
            <lpage>2060</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17008892</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>IGB</p>
            </title>
            <url>http://www.affymetrix.com/support/developer/tools/affytools.affx</url>
         </bibl>
         <bibl id="B4">
            <title>
               <p>ExACT</p>
            </title>
            <url>http://www.affymetrix.com/products/software/specific/exact.affx</url>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Affymetrix Expression Console</p>
            </title>
            <url>http://www.affymetrix.com/products/software/specific/expression_console_software.affx</url>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Bioconductor: open software development for computational biology and bioinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Gentleman</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Carey</snm>
                  <fnm>VJ</fnm>
               </au>
               <au>
                  <snm>Bates</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Bolstad</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Dettling</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dudoit</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gautier</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ge</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Gentry</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hornik</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hothorn</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Huber</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Iacus</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Irizarry</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Leisch</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Maechler</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rossini</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Sawitzki</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Smyth</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Tierney</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>JY</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>10</issue>
            <fpage>R80</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">545600</pubid>
                  <pubid idtype="pmpid" link="fulltext">15461798</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>An annotation infrastructure for the analysis and interpretation of Affymetrix exon array data</p>
            </title>
            <aug>
               <au>
                  <snm>Okoniewski</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Yates</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Dibben</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>CJ</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <issue>5</issue>
            <fpage>R79</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1929135</pubid>
                  <pubid idtype="pmpid" link="fulltext">17498294</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Computational genome-wide discovery of aberrant splice variations with exon expression profiles</p>
            </title>
            <aug>
               <au>
                  <snm>Yoshida</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Numata</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Imoto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nagasaki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Doi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ueno</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Miyano</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proc IEEE 7th International Symposium on Bioinformatics &amp; Bioengineering</source>
            <pubdate>2007</pubdate>
            <fpage>715</fpage>
            <lpage>722</lpage>
         </bibl>
         <bibl id="B9">
            <title>
               <p>A statistical framework for genome-wide discovery of biomarker splice variations with GeneChip Human Exon 1.0 ST Arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Yoshida</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Numata</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Imoto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nagasaki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Doi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ueno</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Miyano</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Informatics</source>
            <pubdate>2006</pubdate>
            <volume>17</volume>
            <issue>1</issue>
            <fpage>88</fpage>
            <lpage>99</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17503359</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Alternative splicing and differential gene expression in colon cancer detected by a whole genome exon array</p>
            </title>
            <aug>
               <au>
                  <snm>Gardina</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Shimada</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Staples</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Veitch</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schweitzer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Awad</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sugnet</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Davies</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Turpaz</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>325</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1769375</pubid>
                  <pubid idtype="pmpid" link="fulltext">17192196</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Alternative transcript analysis methods for exon arrays</p>
            </title>
            <aug>
               <au>
                  <cnm>Affymetrix</cnm>
               </au>
            </aug>
            <source>Affymetrix Whitepaper</source>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Cell File Conversion Tool</p>
            </title>
            <url>http://www.affymetrix.com/Auth/support/developer/downloads/Tools/CelFileConversion.ZIP</url>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Entrez: molecular biology database and retrieval system</p>
            </title>
            <aug>
               <au>
                  <snm>Schuler</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Epstein</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Ohkawa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kans</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Methods Enzymol</source>
            <pubdate>1996</pubdate>
            <volume>266</volume>
            <fpage>141</fpage>
            <lpage>162</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8743683</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>NetAffx: Affymetrix probesets and annotations</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Loraine</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Shigeta</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Cline</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cheng</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Valmeekam</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kulp</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Siani-Rose</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>1</issue>
            <fpage>82</fpage>
            <lpage>86</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165568</pubid>
                  <pubid idtype="pmpid" link="fulltext">12519953</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
