<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2164-13-S7-S12</ui><ji>1471-2164</ji><fm>
<dochead>Proceedings</dochead>
<bibl>
<title>
<p>Meta-analytical biomarker search of EST expression data reveals three differentially expressed candidates</p>
</title>
<aug>
<au id="A1"><snm>Wu</snm><mi>H</mi><fnm>Timothy</fnm><insr iid="I1"/><email>g39328006@ym.edu.tw</email></au>
<au id="A2"><snm>Chu</snm><mi>J</mi><fnm>Lichieh</fnm><insr iid="I2"/><email>julie.chu@mail.cgu.edu.tw</email></au>
<au id="A3"><snm>Wang</snm><fnm>Jian-Chiao</fnm><insr iid="I3"/><email>jian.chiao.wang@gmail.com</email></au>
<au id="A4"><snm>Chen</snm><fnm>Ting-Wen</fnm><insr iid="I2"/><insr iid="I4"/><email>afra@mail.cgu.edu.tw</email></au>
<au id="A5"><snm>Tien</snm><fnm>Yin-Jing</fnm><insr iid="I5"/><email>gary@stat.sinica.edu.tw</email></au>
<au id="A6"><snm>Lin</snm><fnm>Wen-Chang</fnm><insr iid="I1"/><insr iid="I6"/><email>wenlin@ibms.sinica.edu.tw</email></au>
<au ca="yes" id="A7"><snm>Ng</snm><mi>V</mi><fnm>Wailap</fnm><insr iid="I1"/><insr iid="I3"/><insr iid="I7"/><email>wvng@ym.edu.tw</email></au>
</aug>
<insg>
<ins id="I1"><p>Institute of Biomedical Informatics, National Yang Ming University, Taipei, Taiwan, R.O.C</p></ins>
<ins id="I2"><p>Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan, R.O.C</p></ins>
<ins id="I3"><p>Department of Biotechnology and Laboratory Science in Medicine and Institute of Biotechnology in Medicine, National Yang Ming University, Taipei, Taiwan, R.O.C</p></ins>
<ins id="I4"><p>Bioinformatics Center, Chang Gung University, Taoyuan, Taiwan, R.O.C</p></ins>
<ins id="I5"><p>Institute of Statistical Science, Academia Sinica, Taipei, 11529, Taiwan, R.O.C</p></ins>
<ins id="I6"><p>Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, R.O.C</p></ins>
<ins id="I7"><p>Center for Systems and Synthetic Biology, National Yang Ming University, Taipei, Taiwan, R.O.C</p></ins>
</insg>
<source>BMC Genomics</source>


<supplement><title><p>Eleventh International Conference on Bioinformatics (InCoB2012): Computational Biology</p></title><editor>Shoba Ranganathan, Christian Sch&#246;nbach, Sissades Tongsima, Jonathan Chan and Tin Wee Tan</editor><sponsor><note>The articles in this supplement were supported by funding agencies as detailed in the Acknowledgement section of each article</note></sponsor><note>Proceedings</note></supplement><conference><title><p>Asia Pacific Bioinformatics Network (APBioNet) Eleventh International Conference on Bioinformatics (InCoB2012)</p></title><location>Bangkok, Thailand</location><date-range>3-5 October 2012</date-range><url>http://www.incob2012.org/</url></conference><issn>1471-2164</issn>
<pubdate>2012</pubdate>
<volume>13</volume>
<issue>Suppl 7</issue>
<fpage>S12</fpage>
<url>http://www.biomedcentral.com/1471-2164/13/S7/S12</url>
<xrefbib><pubidlist><pubid idtype="pmpid">23282184</pubid><pubid idtype="doi">10.1186/1471-2164-13-S7-S12</pubid></pubidlist></xrefbib>
</bibl>
<history><pub><date><day>13</day><month>12</month><year>2012</year></date></pub></history>
<cpyrt><year>2012</year><collab>Wu et al.; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>For more than a decade, differentially expressed genes (DEGs) have been identified by generating and mining cDNA expressed sequence tags (ESTs). Although public databases make comprehensive mining of DEGs among ESTs from multiple tissue types possible, existing studies have usually employed statistics suitable only for two categories. Multi-class tests have been developed to find tissue-specific genes, but the subsequent search for cancer genes involves a separate two-category test on the ESTs of the tissue of interest only, which limits the amount of data used. On the other hand, simply pooling cancer and normal ESTs from multiple tissue types runs the risk of Simpson's paradox. Here we present a different approach that searches for multi-cancer DEG candidates by analyzing all pertinent ESTs in all categories, and then narrows down the cancer biomarker candidates through integrative analysis with microarray data, selection of genes encoding secretory and membrane proteins, and network analysis. Finally, the differential expression patterns of three selected cancer biomarker candidates were confirmed by real-time qPCR analysis.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>Seven hundred and twenty-three primary DEG candidates (p-value &lt; 0.05 and lower bound of the confidence interval of the odds ratio &#8807; 1.65) were selected from a curated EST database by applying the Cochran-Mantel-Haenszel statistic (CMH). GeneGO analysis indicated that this set is enriched for neoplasm-associated genes. Cross-examination with microarray data further narrowed the list down to 235 genes, of which 96 had membrane or secretory annotations. After examining the candidates in the protein interaction network, public tissue expression databases, and the literature, we selected three genes for further evaluation by real-time qPCR in normal and cancer tissues from eight major organs. The higher-than-normal expression of COL3A1, DLG3, and RNF43 in some of the cancer tissues is in agreement with our <it>in silico </it>predictions.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>Searching the digitized transcriptome using CMH enabled us to identify multi-cancer differentially expressed gene candidates. Our methodology demonstrates the simultaneous analysis of EST data from multiple tissue types for cancer biomarkers. With the renewed interest in digitizing transcriptomes by next-generation sequencing (NGS), cancer biomarkers could be detected more precisely from digital transcriptome data. The three candidates identified in this study, COL3A1, DLG3, and RNF43, are valuable targets for further evaluation with larger numbers of normal and cancer tissue or serum samples.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>One of the key aspects in the study of cancer is understanding the principles and mechanisms of gene expression variation that contribute to cancer genesis and progression. Identifying genes differentially expressed between normal and cancer cells or tissues helps not only in designing diagnostic and therapeutic procedures but also in understanding cancer biology as a whole. In this regard, DNA microarrays have been the dominant platform in the high-throughput study of cancer transcriptomes since their emergence in the mid-1990s <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
</abbrgrp>. However, microarrays have several drawbacks, including high background signals resulting from cross-hybridization <abbrgrp>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
</abbrgrp>; differences in hybridization properties among probe sequences; limited dynamic range due to background level and saturation; and difficulty in detecting splicing isoforms and unknown genes. For these reasons, with the advancement of next-generation sequencers, the high-throughput transcriptome mapping and quantification method known as RNA-Seq has begun to supersede microarrays in expression profiling. However, RNA-Seq experiments are relatively demanding in terms of time, cost, and computational resources, and experimental differences between sequencing platforms may complicate transcriptome analysis across multiple tissue sources. Because meta-analysis of traditional digital expression data such as ESTs derived from cDNAs <abbrgrp>
<abbr bid="B5">5</abbr>
<abbr bid="B6">6</abbr>
<abbr bid="B7">7</abbr>
<abbr bid="B8">8</abbr>
</abbrgrp> is more feasible, this study may serve as a precursor to more complicated experiments.</p>
<p>Although originally aimed primarily at cataloguing the transcript repertoire, ESTs from large-scale cDNA sequencing projects such as the Cancer Genome Anatomy Project (CGAP), the Human Cancer Genome Project (HCGP), and the Cancer Genome Project (CGP) also allow searching for differentially expressed genes (DEGs) in specific tissue types or in whole genomes <abbrgrp>
<abbr bid="B9">9</abbr>
<abbr bid="B10">10</abbr>
<abbr bid="B11">11</abbr>
</abbrgrp>. Several <it>in silico </it>analysis tools such as NCBI Unigene cDNA xProfiler <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp>, CGAP Digital Differential Display (DDD) <abbrgrp>
<abbr bid="B13">13</abbr>
</abbrgrp>, and CGAP Digital Gene Expression Displayer (DGED) <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp> are available online for the analysis of publicly available data. While standard statistical methods such as Fisher's exact test for two-class problems (e.g. cancer vs. normal) or Pearson's correlation are commonly used to find DEGs <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>, there are also methods developed specifically for finding DEGs in digital expression data, either for two-library problems <abbrgrp>
<abbr bid="B15">15</abbr>
<abbr bid="B16">16</abbr>
</abbrgrp> or for multiple libraries <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>. The online tools as well as the statistical methods remain useful to this day in EST or even RNA-Seq projects <abbrgrp>
<abbr bid="B18">18</abbr>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
<abbr bid="B21">21</abbr>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
</abbrgrp>. Beyond DEGs, searches for transcript isoforms specific to particular libraries have also been demonstrated, and many of these studies attribute differentially expressed isoforms to human cancers <abbrgrp>
<abbr bid="B24">24</abbr>
<abbr bid="B25">25</abbr>
<abbr bid="B26">26</abbr>
<abbr bid="B27">27</abbr>
<abbr bid="B28">28</abbr>
<abbr bid="B29">29</abbr>
<abbr bid="B30">30</abbr>
<abbr bid="B31">31</abbr>
</abbrgrp>.</p>
<p>In spite of these successful applications, the tools and methods have limitations. xProfiler reports differential expression in an all-or-none manner: it returns a list of candidates without statistical quantification. DDD allows quantification using Fisher's exact test. However, the nature of the test dictates that comparing three or more libraries requires multiple pair-wise comparisons, so library-specific genes cannot easily be identified. DGED uses a Bayesian approach to find DEGs, but it is also pair-wise, and its reported "odds ratio" is perhaps better described as a "relative risk" and may be biased under unequal sampling. Another useful Bayesian method, originally developed for EST analysis by Audic and Claverie <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>, is also popular for RNA-Seq data. It is less conservative than Fisher's exact test, but it likewise does not apply to multi-class problems. The multi-class comparison method established by Stekel <it>et al</it>. <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp> identifies genes specific to one condition among all and is useful in applications such as finding DEGs in multi-tissue libraries. However, in the search for cancer DEGs, a subsequent analysis of differential expression between the cancer and normal libraries of the tissue of interest may not yield fruitful results because ESTs sampled from that particular tissue type may be scarce. On the other hand, the na&#239;ve approach of pooling all data into a two-class, normal-versus-cancer problem when searching for differentially expressed genes or differentially spliced variants <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp> risks introducing bias. In extreme cases, one may encounter the fallacy of Simpson's paradox <abbrgrp>
<abbr bid="B32">32</abbr>
</abbrgrp>, in which genes that are actually more active in the normal condition appear more active in the cancer condition (discussed later in this paper).</p>
<p>We now report the application of a computational and integrative approach to the analysis of cancer differentially expressed genes (DEGs). The statistical method we employed is the Cochran-Mantel-Haenszel statistic (CMH) <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp>, which, to the best of our knowledge, has not been applied in this context. Instead of pooling all normal and all cancer ESTs from different tissue types into a two-class problem, as with the 2 &#215; 2 contingency &#967;<sup>2 </sup>test or Fisher's exact test, CMH preserves the stratification of libraries by tissue type while still analyzing expression between cancer and normal conditions across all tissue types. The method is an extension of the &#967;<sup>2 </sup>test that, in our application, measures the association between cancer and gene expression while adjusting for tissue as a confounding factor. This approach allows one to find genes that are differentially expressed in cancer overall, or multiple-cancer genes, irrespective of a specific tissue type. In this paper, the method is used to exhaustively analyze ESTs from the dbEST database <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp>. To the best of our knowledge, such an all-inclusive, whole-transcriptome analysis has not been repeated in recent years, even though more ESTs than ever are now available.</p>
<p>Our filtering of EST libraries was also more rigorous than that of many previous studies. Notably, we excluded the ORESTES (open reading-frame EST sequencing) libraries <abbrgrp>
<abbr bid="B35">35</abbr>
</abbrgrp>, to which a normalization procedure had been applied. Libraries from cell lines were also excluded because they poorly represent primary cancer cell transcriptomes. Our analysis pipeline further enriched the DEGs by cross-examination with expression data from a different platform, <it>i.e</it>. microarray data; by selecting genes encoding membrane and secretory proteins, since we intend to find therapeutic targets or biomarkers; and by conducting STRING (Search Tool for the Retrieval of Interacting Genes) network analysis to reveal cancer-enriched clusters <abbrgrp>
<abbr bid="B36">36</abbr>
</abbrgrp>. With real-time qPCR validation, we identified three candidates that tend to be expressed in cancer across more than one tissue type. We hope that such a meta-analytical, multiple-tissue comparison can serve as an exploratory experiment for future multi-library or multi-tissue studies of other digital sources such as RNA-Seq.</p>
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<sec>
<st>
<p>Overview</p>
</st>
<p>Our approach was to exploit the entire collection of human EST sequences from dbEST <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp> to obtain transcripts from different types of cells, tissues, and organs. The assumption was that gene activity is represented by the corresponding transcripts and is therefore reflected in the number of ESTs representing each gene in the NCBI dbEST database, given that a large number of mRNAs (cDNAs) had been sequenced. Pertinent sequences from different sources were matched to genes and tallied. From the annotation of each EST record, we obtained the tissue type and condition type (normal or cancer) from which it was derived, which gave us the entire gene transcription profile across all tissues and conditions. Next, cross-examination with other data sources, including microarray data and secretory and membrane annotations, together with analysis of protein associations with STRING <abbrgrp>
<abbr bid="B36">36</abbr>
</abbrgrp> allowed us to narrow down the list of candidate genes. The process is illustrated in Figure <figr fid="F1">1</figr>.</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>The basic steps in searching for differentially expressed genes</p></caption><text>
   <p><b>The basic steps in searching for differentially expressed genes</b>. The pipeline comprises selection of suitable EST clone libraries, EST-to-gene assignment, counting of the results, removal of tissue categories with low counts, statistical analysis with CMH, and narrowing down of differentially expressed genes (DEGs). The narrowing-down procedures include cross-referencing with public microarray data, annotating membrane and secretory proteins, analyzing the STRING network, and, for a few selected genes, validating expression in different tissues by RT-qPCR.</p>
</text><graphic file="1471-2164-13-S7-S12-1"/></fig>
</sec>
<sec>
<st>
<p>Human gene reference sequence preparation</p>
</st>
<p>The NCBI Reference Sequences (RefSeq Release 38, November 11, 2009) <abbrgrp>
<abbr bid="B37">37</abbr>
</abbrgrp> were downloaded from its ftp site <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>. <it>Homo sapiens </it>RefSeq records were selected and subjected to repeat masking via RepeatMasker <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>Human EST sequence preparation and library filtering</p>
</st>
<p>Human EST data (released on December 11, 2009) and their cDNA library information were downloaded from the NCBI dbEST database <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp> and CGAP <abbrgrp>
<abbr bid="B40">40</abbr>
</abbrgrp>. A Python program was written to flag unsuitable libraries for removal when keywords such as "enrichment", "subtract", "pcr", and "normalized" were found in the DESCR, UNIQUE_PROTOCOL, or KEYWORDS fields of the library information. An arbitrary cutoff of &gt; 400 was applied to remove highly unrepresentative libraries (as a consequence, approximately 7,000 libraries comprising approximately 650,000 ESTs were discarded). To guard against incorrect inclusions or exclusions, we finalized the process with manual curation. Libraries made from mixed tissues or cell lines were also discarded. The final libraries from CGAP were manually classified into 48 different tissue types and two conditions, normal and cancer.</p>
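<p>The keyword screen described above might look roughly like the sketch below. It is an illustration, not the authors' program: it assumes each library record is available as a Python dictionary keyed by the field names DESCR, UNIQUE_PROTOCOL, and KEYWORDS, and the function names are hypothetical. The greater-than-400 cutoff is not modeled, because the quantity it applies to is not specified here.</p>

```python
# Keywords named in the Methods; matched as case-insensitive substrings.
DISCARD_KEYWORDS = ("enrichment", "subtract", "pcr", "normalized")

# Library-information fields screened for the keywords.
SCREENED_FIELDS = ("DESCR", "UNIQUE_PROTOCOL", "KEYWORDS")


def flag_for_discard(library: dict) -> bool:
    """Return True if any screened field mentions a discard keyword.

    Missing fields are treated as empty text.
    """
    text = " ".join(library.get(field, "") for field in SCREENED_FIELDS).lower()
    return any(keyword in text for keyword in DISCARD_KEYWORDS)


def split_libraries(libraries):
    """Partition library records into (kept, flagged) lists.

    Flagged libraries are only candidates for removal; in the study the
    automatic result was finalized by manual curation.
    """
    kept, flagged = [], []
    for library in libraries:
        (flagged if flag_for_discard(library) else kept).append(library)
    return kept, flagged


if __name__ == "__main__":
    libraries = [  # hypothetical records for illustration only
        {"DESCR": "Normalized cDNA library from fetal brain"},
        {"DESCR": "Lung tumor, oligo-dT primed", "KEYWORDS": "cancer"},
    ]
    kept, flagged = split_libraries(libraries)
    print(len(kept), len(flagged))  # prints: 1 1
```

<p>Plain substring matching is deliberately crude: a description such as "non-normalized" would also be flagged, which is one reason a manual curation pass after the automatic screen is useful.</p>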
</sec>
<sec>
<st>
<p>EST to gene assignment</p>
</st>
<p>The BLAT alignment tool was used to align ESTs to RefSeqs as a means of assigning ESTs to genes <abbrgrp>
<abbr bid="B41">41</abbr>
</abbrgrp>. A match required an identity of at least 95% over a minimum length of 100 nucleotides. Each EST was assigned to the RefSeq match with the highest identity; if two RefSeq matches shared exactly the same identity, the program chose the first one encountered.</p>
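<p>The assignment rule can be sketched as follows, assuming the BLAT hits for one EST have already been parsed into records carrying a percent identity and an aligned length. The parsing of BLAT output and the exact identity formula are not shown and are assumptions; the record type, function name, and RefSeq IDs in the example are hypothetical.</p>

```python
from typing import Iterable, NamedTuple, Optional

MIN_IDENTITY = 95.0  # percent identity required for a match
MIN_LENGTH = 100     # minimum aligned length in nucleotides


class Hit(NamedTuple):
    """One EST-to-RefSeq alignment, reduced to the fields used here."""
    refseq_id: str
    identity: float      # percent identity
    aligned_length: int  # nucleotides


def assign_est(hits: Iterable[Hit]) -> Optional[str]:
    """Return the RefSeq ID of the best qualifying hit, or None.

    Hits failing either threshold are ignored. Among the rest the highest
    identity wins; on an exact tie the first hit encountered is kept,
    because only a strictly higher identity replaces the current best.
    """
    best: Optional[Hit] = None
    for hit in hits:
        if hit.identity >= MIN_IDENTITY and hit.aligned_length >= MIN_LENGTH:
            if best is None or hit.identity > best.identity:
                best = hit
    return best.refseq_id if best is not None else None


if __name__ == "__main__":
    hits = [Hit("NM_A", 97.0, 150), Hit("NM_B", 99.0, 80), Hit("NM_C", 97.0, 300)]
    # NM_B is too short; NM_C ties with NM_A, so the first one is kept.
    print(assign_est(hits))  # prints: NM_A
```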
</sec>
<sec>
<st>
<p>EST count and summarization</p>
</st>
<p>This procedure assigns to each RefSeq transcript an expression profile across tissue and condition types based on EST assignment counts. Each EST inherits the tissue type and condition type of its source clone library. For example, an EST from a lung cancer clone library that aligns to a transcript adds one count to that transcript in tissue type lung and condition type cancer. After all ESTs were counted, each transcript had an expression profile across tissues and conditions. Counts from different transcript variants of the same gene were then pooled to obtain a single expression profile per gene, and these raw count profiles were used for further statistical analysis.</p>
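<p>A minimal sketch of this tally is shown below. It assumes the alignment step yields (EST ID, RefSeq ID) pairs and that lookup tables from EST to library classification and from RefSeq to gene symbol are available; these inputs, the function name, and the example identifiers are hypothetical.</p>

```python
from collections import Counter


def count_profiles(assignments, library_of, gene_of):
    """Tally EST counts per (gene, tissue, condition).

    assignments: iterable of (est_id, refseq_id) pairs from the alignment step.
    library_of:  maps est_id to the (tissue, condition) of its clone library.
    gene_of:     maps refseq_id to a gene symbol, so that counts from
                 transcript variants of the same gene are pooled.
    """
    counts = Counter()
    for est_id, refseq_id in assignments:
        tissue, condition = library_of[est_id]
        counts[(gene_of[refseq_id], tissue, condition)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical example: two variants of GENE1 and one transcript of GENE2.
    assignments = [("e1", "NM_1"), ("e2", "NM_2"), ("e3", "NM_3")]
    library_of = {"e1": ("lung", "cancer"), "e2": ("lung", "cancer"),
                  "e3": ("lung", "normal")}
    gene_of = {"NM_1": "GENE1", "NM_2": "GENE1", "NM_3": "GENE2"}
    counts = count_profiles(assignments, library_of, gene_of)
    print(counts[("GENE1", "lung", "cancer")])  # prints: 2
```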
</sec>
<sec>
<st>
<p>Statistical evaluation of cancer candidates</p>
</st>
<p>The Cochran-Mantel-Haenszel statistic (CMH) was applied to evaluate the cancer differential expression of each gene. To evaluate a gene, all other genes were pooled as "other genes" to create a 2 &#215; 2 &#215; <it>k </it>table, in which each of the <it>k </it>strata is the 2 &#215; 2 table (gene under study versus other genes, normal versus cancer) of one tissue type, so that <it>k </it>is the number of tissue types. A contrived example of a 2 &#215; 2 &#215; <it>k </it>table with <it>k </it>= 2 is shown in Table <tblr tid="T1">1</tblr>. Gene A is the gene under study, while all other genes are pooled together as "other genes". Only the Tissue I and Tissue II columns enter the CMH calculation; the pooled columns are not part of the analysis. As with Fisher's exact test, the test assumes that "other genes" consist mostly of genes that are not differentially expressed between the normal and cancer conditions, or that DEGs favoring one condition are at least partly canceled out by DEGs favoring the other. In either case, the imbalance between cancer and normal counts in the second row is regarded as sampling bias, and it serves as the baseline against which Gene A is measured. By repeatedly isolating the counts of the gene under study while pooling all other genes into the second row, an odds ratio and a confidence interval are calculated for each gene. Genes with a p-value &lt; 0.05 and a lower bound of the confidence interval of the odds ratio &#8807; 1.65 are selected for further analyses.</p>
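<p>The per-gene computation can be sketched in Python as below. This is a minimal illustration under stated assumptions: the text does not specify the confidence-interval method, the confidence level, or whether a continuity correction was used, so the sketch assumes a continuity-corrected CMH statistic and a 95% Robins-Breslow-Greenland interval for the Mantel-Haenszel odds ratio. The function names are hypothetical, and the example uses the Table 1 counts.</p>

```python
import math


def cmh_test(strata, correction=True, z=1.96):
    """Cochran-Mantel-Haenszel test and Mantel-Haenszel odds ratio.

    strata: iterable of (a, b, c, d) tuples, one per tissue type, where
        a = gene cancer count,    b = gene normal count,
        c = other-genes cancer,   d = other-genes normal.
    An odds ratio above 1 means the gene leans toward cancer.
    Assumes at least one informative stratum (nonzero variance).
    Returns (chi2, p_value, or_mh, ci_low, ci_high).
    """
    dev = var = 0.0      # sums of (a - E[a]) and Var(a) over strata
    r_sum = s_sum = 0.0  # Mantel-Haenszel numerator and denominator
    t1 = t2 = t3 = 0.0   # Robins-Breslow-Greenland variance terms
    for a, b, c, d in strata:
        n = a + b + c + d
        if n >= 2:  # a stratum needs at least two counts for a variance
            dev += a - (a + b) * (a + c) / n
            var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
            r, s = a * d / n, b * c / n
            p_i, q_i = (a + d) / n, (b + c) / n
            r_sum += r
            s_sum += s
            t1 += p_i * r
            t2 += p_i * s + q_i * r
            t3 += q_i * s
    # CMH statistic (1 df), optionally continuity-corrected.
    stat = max(abs(dev) - (0.5 if correction else 0.0), 0.0)
    chi2 = stat * stat / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    or_mh = r_sum / s_sum
    # Robins-Breslow-Greenland variance of log(or_mh).
    var_log = t1 / (2 * r_sum ** 2) + t2 / (2 * r_sum * s_sum) + t3 / (2 * s_sum ** 2)
    half_width = z * math.sqrt(var_log)
    log_or = math.log(or_mh)
    return chi2, p_value, or_mh, math.exp(log_or - half_width), math.exp(log_or + half_width)


def is_candidate(p_value, ci_low, alpha=0.05, min_lower=1.65):
    """Selection rule from the Methods: p below alpha, CI lower bound at least 1.65."""
    return alpha > p_value and ci_low >= min_lower


if __name__ == "__main__":
    # Table 1, Gene A, per tissue: (gene cancer, gene normal, others cancer, others normal).
    table_1 = [(580, 280, 80_000, 20_000), (20, 20, 620_000, 380_000)]
    chi2, p, or_mh, low, high = cmh_test(table_1)
    print(f"OR_MH = {or_mh:.3f}, 95% CI = ({low:.3f}, {high:.3f}), p = {p:.2g}")
    print("candidate:", is_candidate(p, low))  # prints: candidate: False
```

<p>For Gene A the stratified odds ratio is well below 1, so the gene is strongly significant but in the normal direction and is not selected; the pooled counts alone would have pointed the other way.</p>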
<tbl id="T1"><title><p>Table 1</p></title><caption><p>A hypothetical EST count table demonstrating CMH analysis and also a contrived example of Simpson's paradox.</p></caption><tblbdy cols="7">
      <r>
         <c>
            <p/>
         </c>
         <c ca="center" cspan="2">
            <p>
               <b>Tissue I</b>
            </p>
         </c>
         <c ca="center" cspan="2">
            <p>
               <b>Tissue II</b>
            </p>
         </c>
         <c ca="center" cspan="2">
            <p>
               <b>Pooled</b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="2">
            <hr/>
         </c>
         <c cspan="2">
            <hr/>
         </c>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>
               <b>Normal</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Cancer</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Normal</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Cancer</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Normal</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Cancer</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Gene A</p>
         </c>
         <c ca="center">
            <p>280</p>
         </c>
         <c ca="center">
            <p>580</p>
         </c>
         <c ca="center">
            <p>20</p>
         </c>
         <c ca="center">
            <p>20</p>
         </c>
         <c ca="center">
            <p>
               <b>300</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>600</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Other genes</p>
         </c>
         <c ca="center">
            <p>
               <b>20,000</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>80,000</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>380,000</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>620,000</b>
            </p>
         </c>
         <c ca="center">
            <p>400,000</p>
         </c>
         <c ca="center">
            <p>700,000</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>This hypothetical case illustrates both how the Cochran-Mantel-Haenszel (CMH) statistic is applied and how Simpson's paradox can occur. Gene A is the gene under investigation; expression from all other genes is pooled into the "other genes" row. Bold typeface marks the entries showing the higher cancer-versus-normal propensity in each column pair. CMH is applied to the stratified tissue columns, not to the pooled data. A casual look at the pooled columns alone would suggest that Gene A has higher expression in cancer (pooled odds ratio of about 1.14). However, inspecting each tissue column pair reveals the opposite (odds ratios of about 0.52 and 0.61 in Tissues I and II, respectively). The observed difference between cancer and normal counts of the "other genes" is assumed to be mostly due to sampling bias.</p>
   </tblfn></tbl>
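<p>The reversal in Table 1 can be checked directly by computing the odds ratio for each tissue and for the pooled columns. The sketch below uses the table's counts; the helper names are hypothetical. An odds ratio above 1 means that Gene A leans toward cancer relative to the other genes.</p>

```python
# Table 1 counts per column pair:
# (Gene A cancer, Gene A normal, other genes cancer, other genes normal)
TABLE_1 = {
    "Tissue I": (580, 280, 80_000, 20_000),
    "Tissue II": (20, 20, 620_000, 380_000),
}


def odds_ratio(a, b, c, d):
    """Odds of cancer for Gene A ESTs relative to the odds for other genes."""
    return (a * d) / (b * c)


def pooled_counts(table):
    """Collapse the strata by summing each cell across tissues."""
    return tuple(sum(cells) for cells in zip(*table.values()))


if __name__ == "__main__":
    for tissue, counts in TABLE_1.items():
        print(f"{tissue}: OR = {odds_ratio(*counts):.3f}")
    print(f"Pooled: OR = {odds_ratio(*pooled_counts(TABLE_1)):.3f}")
    # prints:
    # Tissue I: OR = 0.518
    # Tissue II: OR = 0.613
    # Pooled: OR = 1.143
```

<p>Both tissue-level odds ratios are below 1, yet the pooled odds ratio is above 1, because Tissue I contributes most of Gene A's cancer ESTs while also having a far higher cancer-to-normal ratio among the other genes.</p>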
</sec>
<sec>
<st>
<p>Microarray cross reference</p>
</st>
<p>Human U133 Plus 2.0 GeneChip array CEL data were downloaded from Gene Expression Omnibus (GEO) <abbrgrp>
<abbr bid="B42">42</abbr>
</abbrgrp>. Where computing power allowed, the data were processed with AffyPLM <abbrgrp>
<abbr bid="B43">43</abbr>
</abbrgrp> using its three-step procedure: background correction with GCRMA, quantile normalization, and summarization of probe signals with median polish. For large datasets that were computationally infeasible for us to process this way, we used justRMA from the affy package <abbrgrp>
<abbr bid="B44">44</abbr>
</abbrgrp>. For datasets without raw CEL data, we obtained the pre-processed matrix files via GEOquery <abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp>. Regardless of how the array signals were processed, we analyzed the genes for differential expression with limma <abbrgrp>
<abbr bid="B46">46</abbr>
</abbrgrp>. For each array dataset, differentially expressed gene candidates with p-value &lt; 0.05 and logFC &gt; 1.0 were selected and intersected with the DEG candidates from EST profiling. The union of these intersections across datasets was selected for further evaluation.</p>
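<p>The cross-referencing step reduces to set operations, as in the sketch below. It assumes the limma results for each dataset have been exported to a mapping from gene symbol to (p-value, logFC), an intermediate format that is not described in the text; the function name and example data are hypothetical. The logFC &gt; 1.0 condition is applied literally as stated.</p>

```python
def cross_reference(array_results, est_candidates, alpha=0.05, min_logfc=1.0):
    """Union, over array datasets, of array-significant genes that are EST candidates.

    array_results: maps a dataset ID to {gene: (p_value, logFC)}, for example
        exported from a limma results table (assumed intermediate format).
    est_candidates: set of gene symbols selected by the CMH step.
    """
    selected = set()
    for table in array_results.values():
        significant = {
            gene for gene, (p_value, log_fc) in table.items()
            if alpha > p_value and log_fc > min_logfc  # p below alpha, logFC above 1.0
        }
        # Intersect per dataset, then accumulate the union across datasets.
        selected.update(significant.intersection(est_candidates))
    return selected


if __name__ == "__main__":
    arrays = {  # hypothetical datasets and genes
        "dataset_1": {"GENE1": (0.01, 1.8), "GENE2": (0.20, 2.5)},
        "dataset_2": {"GENE3": (0.001, 1.2), "GENE4": (0.01, 0.4)},
    }
    print(sorted(cross_reference(arrays, {"GENE1", "GENE2", "GENE3", "GENE4"})))
    # prints: ['GENE1', 'GENE3']
```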
</sec>
<sec>
<st>
<p>Annotation of secretory proteins</p>
</st>
<p>To identify DEGs with secretory annotation, we compiled a list of 3,975 proteins with secretory annotation from data in UniProt (1,632 unique proteins) <abbrgrp>
<abbr bid="B47">47</abbr>
</abbrgrp>, the Human Plasma Proteome Organization (HUPO) (889 proteins), and the Secreted Protein Database (SPD) (4,142 proteins) <abbrgrp>
<abbr bid="B48">48</abbr>
</abbrgrp>. This list was matched against the DEGs to assign secretory annotations.</p>
</sec>
<sec>
<st>
<p>Annotation of membrane proteins</p>
</st>
<p>Membrane protein annotations were gathered from five sources, TOPDB (283 proteins) <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>, LOCATE (2629 proteins) <abbrgrp>
<abbr bid="B50">50</abbr>
</abbrgrp>, PDB_TM (41 proteins) <abbrgrp>
<abbr bid="B51">51</abbr>
<abbr bid="B52">52</abbr>
</abbrgrp>, OPM (107 proteins) <abbrgrp>
<abbr bid="B53">53</abbr>
</abbrgrp>, and MPDB (23 proteins) <abbrgrp>
<abbr bid="B54">54</abbr>
</abbrgrp>, to generate a unique list of 2,767 membrane proteins. Any DEG on this list was given a membrane annotation.</p>
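<p>Both annotation steps amount to merging source lists into a unique set and checking DEG membership. The sketch below assumes the identifiers from every source have already been mapped to the same namespace as the DEG symbols, a step not shown here; the function name and gene names are hypothetical.</p>

```python
def annotate(degs, secretory_sources, membrane_sources):
    """Label DEGs found in the merged secretory and membrane protein lists.

    Each *_sources argument maps a source name (for example "UniProt" or
    "TOPDB") to an iterable of identifiers, assumed to be already mapped
    to the same namespace as the DEG symbols.
    """
    secretory = set().union(*secretory_sources.values())  # unique merged list
    membrane = set().union(*membrane_sources.values())
    labels = {}
    for gene in degs:
        tags = [name for name, members in (("secretory", secretory),
                                           ("membrane", membrane))
                if gene in members]
        if tags:
            labels[gene] = tags
    return labels


if __name__ == "__main__":
    secretory_sources = {"UniProt": ["GENE_S"], "HUPO": ["GENE_S", "GENE_X"],
                         "SPD": ["GENE_X"]}
    membrane_sources = {"LOCATE": ["GENE_M"], "TOPDB": ["GENE_M"]}
    print(annotate(["GENE_S", "GENE_M", "GENE_N"], secretory_sources, membrane_sources))
    # prints: {'GENE_S': ['secretory'], 'GENE_M': ['membrane']}
```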
</sec>
<sec>
<st>
<p>Validation of tissue expression profiles of candidate genes</p>
</st>
<p>The TissueScan&#8482; Cancer Survey Panel 96-I qPCR array (OriGene Technologies, Rockville, MD), which contains cDNAs from 3 normal and 9 cancer tissues for each of 8 organs (breast, colon, kidney, liver, lung, ovary, prostate, and thyroid), was used to examine the expression profiles of selected cancer differentially expressed gene candidates. Real-time qPCR analyses were performed with TaqMan<sup>&#174; </sup>Gene Expression Assay kits (Applied Biosystems, Foster City, CA), using FAM-labeled target-gene assays and VIC-labeled HPRT1 internal-control assays, according to the manufacturer's suggested procedure on an Applied Biosystems Prism 7500 system. Relative gene expression was quantified by normalization against HPRT1 with the &#916;CT method. Gene expression changes were quantified as 2 <sup>- (CT gene - CT control)</sup>.</p>
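<p>The &#916;CT formula above is a one-line computation. The sketch assumes a single CT value per gene per panel well; how replicates, if any, were combined is not described here. The function name and CT values are hypothetical.</p>

```python
def relative_expression(ct_gene, ct_control):
    """Delta-CT relative expression: 2 ** -(CT_gene - CT_control).

    ct_control is the CT of the internal control (HPRT1 in this study).
    A lower CT means more template, so a gene crossing the threshold
    three cycles after the control gives 2 ** -3 = 0.125.
    """
    return 2.0 ** -(ct_gene - ct_control)


if __name__ == "__main__":
    # Hypothetical CT values for one panel well (target gene, HPRT1).
    print(relative_expression(27.0, 24.0))  # prints: 0.125
```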
</sec>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<sec>
<st>
<p>Human ESTs selection and tissue distribution</p>
</st>
<p>The basic steps of our analysis are illustrated in Figure <figr fid="F1">1</figr>. A total of 8,296,089 human EST sequences (Dec. 11, 2009 release) were downloaded from NCBI. Not all of these ESTs are relevant for our gene expression analysis. After the 8,907 EST libraries were screened as described in Methods, 8,447 unsuitable libraries, whose preparation involved PCR amplification, normalization, subtraction, etc., or which originated from cell lines, were discarded. The remaining 460 libraries contained 2,386,536 EST sequences, roughly 29% of all downloaded human ESTs.</p>
<p>After BLAT alignment of the 2,386,536 ESTs to 44,513 RefSeq gene transcripts, 1,644,960 ESTs (68.93%) were found to match RefSeqs over at least 100 nucleotides. An examination of the sources of the matched ESTs indicated that tissue representation was skewed, with brain the most represented of all tissues. Among the 48 tissues, brain ESTs constituted 26% of all matched ESTs, followed by uterus (6.40%), testis (5.91%), placenta (4.33%), pancreas (3.99%), muscle (3.88%), kidney (3.52%), liver (3.51%), and others, each below 3% (see Additional file <supplr sid="S1">1</supplr>). The representation of condition types (normal and cancer) was similarly skewed: normal tissues had 1,251,883 ESTs combined and cancer tissues 393,077, a ratio of roughly 3 to 1. Before ESTs from cell lines were filtered out, cancer ESTs had outnumbered normal ESTs by roughly 3 to 1, which shows how much more rigorous our filtering was. Unfortunately, this also meant that we had a much smaller dataset to work with.</p>
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>Tissue and library distributions of 1,644,960 ESTs</b>. This table shows the number of ESTs assigned to each tissue type prior to matching to reference sequences.</p>
</text>
<file name="1471-2164-13-S7-S12-S1.docx">
   <p>Click here for file</p>
</file>
</suppl>
<p>The unequal distribution of the 1,644,960 matched ESTs across tissue types left some tissue types poorly represented. For example, brain EST hits dominated those of other tissue types, whereas spinal cord had the lowest count, with 430 EST hits, which is of little value for our application. Therefore, a tissue type was considered only when its total EST hit count exceeded the cut-off of 20,000. Given that the human genome has approximately 22,000 genes, this cut-off corresponds to only about one EST per gene on average and still does not allow a "deep" probe into gene expression. Nevertheless, because our method does not attempt to identify gene expression specific to one particular tissue, the problem is mitigated.</p>
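<p>The tissue cut-off can be applied to the per-gene tallies as sketched below, assuming counts keyed by (gene, tissue, condition) as produced by the counting step. The function name and the example counts are hypothetical apart from the 430 spinal cord hits reported above.</p>

```python
from collections import Counter

MIN_TISSUE_HITS = 20_000  # a tissue needs a total EST hit count above this


def tissues_passing_cutoff(counts):
    """Return tissue types whose total EST hits exceed MIN_TISSUE_HITS.

    counts: maps (gene, tissue, condition) keys to EST hit counts.
    """
    totals = Counter()
    for (_gene, tissue, _condition), hits in counts.items():
        totals[tissue] += hits
    return {tissue for tissue, total in totals.items() if total > MIN_TISSUE_HITS}


if __name__ == "__main__":
    counts = {
        ("GENE1", "brain", "normal"): 15_000,
        ("GENE2", "brain", "cancer"): 9_000,
        ("GENE1", "spinal cord", "normal"): 430,
    }
    print(tissues_passing_cutoff(counts))  # prints: {'brain'}
```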
<p>We also categorized ESTs as normal or cancer according to the classification of their clone libraries. Some tissue-condition types were so under-represented that their information was not trustworthy. For example, adipose had 10,362 normal hits but only 440 cancer hits, and heart had 22,179 normal hits but no cancer hits. Such data were kept throughout the analysis but contributed little or nothing to it; in the CMH calculation, a tissue stratum with no cancer ESTs contributes zero to both the test statistic and the Mantel-Haenszel odds ratio.</p>
<p>Because our EST assignments were made to transcripts represented by RefSeq sequences, each transcript variant had its own expression profile across all tissue-condition types once the assignment procedure was complete. Owing to insufficient EST data, differentiating between splice variants of the same gene was not feasible, so we pooled the expression of different splice variants into a single expression profile representing the gene.</p>
</sec>
<sec>
<st>
<p>Analysis of differentially expressed genes</p>
</st>
<p>Because of the small sample size (EST counts), it was only realistic to evaluate gene expression using the ESTs of all tissues together. However, tissue type is a confounder: if all counts for each gene were pooled as "normal" or "cancer" regardless of the tissue of origin, the comparison would be biased by the differing tissue composition of the normal and cancer libraries. To address both the sample size and the tissue confounder problems, we used the Cochran-Mantel-Haenszel (CMH) method, as described in Methods, to identify differentially expressed genes. We applied the arbitrary cut-offs of p-value &lt; 0.05 and odds ratio &#8807; 1.65 to obtain a primary set of candidates, which yielded 723 candidate cancer differentially expressed genes (DEGs). The 1.65 cut-off was chosen because it gave good coverage of a list of well-known biomarkers and genes known to be associated with cancer (Table <tblr tid="T2">2</tblr>).</p>
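<p>A minimal sketch of the stratified test follows; the authors' implementation is described in Methods and may differ in detail. For each tissue stratum it forms a 2 x 2 table (this gene versus all other ESTs, cancer versus normal libraries), accumulates the Mantel-Haenszel odds ratio and the CMH chi-square statistic, and then applies the cut-offs quoted above. The tuple layout, the function names cmh_test and is_candidate, and the counts are hypothetical, and the 0.5 continuity correction mirrors the default of R's mantelhaen.test, which we assume rather than know the authors used.</p>

```python
import math

# Hypothetical layout for one gene in one tissue stratum:
#   (gene_cancer, gene_normal, total_cancer, total_normal)
# i.e. the gene's EST counts and the tissue's total EST counts in cancer and
# normal libraries. The odds ratio compares the gene's odds among cancer ESTs
# with its odds among normal ESTs, so values above 1 favour cancer.


def cmh_test(strata, correct=True):
    """Return (Mantel-Haenszel odds ratio, CMH chi-square, 1-df p-value)."""
    or_num = or_den = 0.0        # numerator and denominator of the MH odds ratio
    deviation = variance = 0.0   # sums of a - E[a] and Var[a] over strata
    for gene_c, gene_n, total_c, total_n in strata:
        a, b = gene_c, gene_n                      # gene ESTs: cancer, normal
        c, d = total_c - gene_c, total_n - gene_n  # other ESTs: cancer, normal
        n = a + b + c + d
        if n > 1:
            or_num += a * d / n
            or_den += b * c / n
            row_gene, col_cancer = a + b, a + c
            deviation += a - row_gene * col_cancer / n
            variance += row_gene * (c + d) * col_cancer * (b + d) / (n * n * (n - 1))
    if variance == 0:
        return math.nan, 0.0, 1.0                  # no informative stratum
    # Continuity correction as in R's mantelhaen.test default (assumed here).
    yates = 0.5 if correct and abs(deviation) >= 0.5 else 0.0
    stat = (abs(deviation) - yates) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2))       # chi-square upper tail, 1 df
    odds_ratio = or_num / or_den if or_den > 0 else math.inf
    return odds_ratio, stat, p_value


def is_candidate(strata, p_cut=0.05, or_cut=1.65):
    """Cut-offs from the text: p-value below p_cut and odds ratio of at least or_cut."""
    odds_ratio, _, p_value = cmh_test(strata)
    return p_cut > p_value and odds_ratio >= or_cut


# Invented counts: two tissues, plus a tissue with no cancer libraries
# (like heart in this study), which adds exactly zero to every sum.
gene_strata = [
    (60, 20, 30_000, 30_000),
    (45, 15, 25_000, 25_000),
    (0, 12, 0, 22_179),
]
print(cmh_test(gene_strata), is_candidate(gene_strata))
```

<p>Because a stratum with no cancer (or no normal) ESTs has zero expected variance, it leaves both the statistic and the odds ratio unchanged, which is why tissues such as heart contribute nothing to the CMH analysis.</p>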
<tbl id="T2"><title><p>Table 2</p></title><caption><p>EST counts and odds ratios of 11 well-known cancer-related genes present in our list of DEGs.</p></caption><tblbdy cols="6">
      <r>
         <c ca="center">
            <p>
               <b>Gene symbol</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Description</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Total</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Normal</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Cancer</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Odds ratio</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>BCAN</p>
         </c>
         <c ca="left">
            <p>Homo sapiens brevican</p>
         </c>
         <c ca="center">
            <p>391</p>
         </c>
         <c ca="center">
            <p>79</p>
         </c>
         <c ca="center">
            <p>312</p>
         </c>
         <c ca="center">
            <p>10.4</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>KRT14</p>
         </c>
         <c ca="left">
            <p>Homo sapiens keratin 14</p>
         </c>
         <c ca="center">
            <p>205</p>
         </c>
         <c ca="center">
            <p>40</p>
         </c>
         <c ca="center">
            <p>165</p>
         </c>
         <c ca="center">
            <p>9.1</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>KRT16</p>
         </c>
         <c ca="left">
            <p>Homo sapiens keratin 16</p>
         </c>
         <c ca="center">
            <p>41</p>
         </c>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="center">
            <p>34</p>
         </c>
         <c ca="center">
            <p>7.8</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>MMP11</p>
         </c>
         <c ca="left">
            <p>Homo sapiens matrix metallopeptidase 11 (stromelysin 3)</p>
         </c>
         <c ca="center">
            <p>68</p>
         </c>
         <c ca="center">
            <p>20</p>
         </c>
         <c ca="center">
            <p>48</p>
         </c>
         <c ca="center">
            <p>5.3</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>MUC1</p>
         </c>
         <c ca="left">
            <p>Homo sapiens mucin 1, cell surface associated</p>
         </c>
         <c ca="center">
            <p>69</p>
         </c>
         <c ca="center">
            <p>30</p>
         </c>
         <c ca="center">
            <p>39</p>
         </c>
         <c ca="center">
            <p>4.2</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>VEGFA</p>
         </c>
         <c ca="left">
            <p>Homo sapiens vascular endothelial growth factor A</p>
         </c>
         <c ca="center">
            <p>82</p>
         </c>
         <c ca="center">
            <p>33</p>
         </c>
         <c ca="center">
            <p>49</p>
         </c>
         <c ca="center">
            <p>3.7</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>AGRN</p>
         </c>
         <c ca="left">
            <p>Homo sapiens agrin</p>
         </c>
         <c ca="center">
            <p>503</p>
         </c>
         <c ca="center">
            <p>143</p>
         </c>
         <c ca="center">
            <p>360</p>
         </c>
         <c ca="center">
            <p>3.5</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>COL3A1</p>
         </c>
         <c ca="left">
            <p>Homo sapiens collagen, type III, alpha 1</p>
         </c>
         <c ca="center">
            <p>145</p>
         </c>
         <c ca="center">
            <p>90</p>
         </c>
         <c ca="center">
            <p>55</p>
         </c>
         <c ca="center">
            <p>3.5</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>MMP1</p>
         </c>
         <c ca="left">
            <p>Homo sapiens matrix metallopeptidase 1 (interstitial collagenase)</p>
         </c>
         <c ca="center">
            <p>70</p>
         </c>
         <c ca="center">
            <p>29</p>
         </c>
         <c ca="center">
            <p>41</p>
         </c>
         <c ca="center">
            <p>3.3</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>EGFR</p>
         </c>
         <c ca="left">
            <p>Homo sapiens epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian)</p>
         </c>
         <c ca="center">
            <p>49</p>
         </c>
         <c ca="center">
            <p>79</p>
         </c>
         <c ca="center">
            <p>312</p>
         </c>
         <c ca="center">
            <p>10.4</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>AFP</p>
         </c>
         <c ca="left">
            <p>Homo sapiens alpha-fetoprotein</p>
         </c>
         <c ca="center">
            <p>391</p>
         </c>
         <c ca="center">
            <p>40</p>
         </c>
         <c ca="center">
            <p>165</p>
         </c>
         <c ca="center">
            <p>9.1</p>
         </c>
      </r>
   </tblbdy></tbl>
<p>To test whether this list of 723 genes was enriched for cancer-related genes, and thus to lend credibility to our methodology, we looked for cancer-related pathways associated with them among the GeneGO <abbrgrp>
<abbr bid="B55">55</abbr>
</abbrgrp> pathways, which cover 650 signaling and metabolic networks (Figure <figr fid="F2">2A</figr>). Of the 10 most significantly matched pathways, eight are cancer related: pathways 1, 3, and 4 involve immune response; pathways 2 and 5 involve cytoskeleton remodeling; pathway 6 is transition and termination of DNA replication; and pathways 8 and 9 are adhesion related. In addition, GeneGO disease enrichment analysis (Figure <figr fid="F2">2B</figr>) indicates that our set of genes is enriched for neoplasms: seven of the 10 most associated diseases are related to cancer. The highest-ranked disease category is neoplasms, followed by neoplasms by site and digestive system neoplasms. This list suggests that our 723 DEGs cover general neoplasm-related functions rather than being specific to any particular neoplasm, since digestive, urogenital, and breast neoplasms are all represented.</p>
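<p>Pathway and disease enrichment of this kind is usually scored with a one-sided hypergeometric (over-representation) test. The sketch below is a generic version of that test, not GeneGO's own implementation; the background size, pathway size, and overlap in the example are invented. It also converts the p-value into the bar length plotted in Figure 2, where a bar of length 15 corresponds to p = 1e-15, i.e. bar length = -log10(p).</p>

```python
import math

# Generic over-representation test (not GeneGO's implementation):
#   background_size  genes that could be annotated (hypothetical value)
#   pathway_size     genes in the pathway
#   list_size        genes in the query list (723 in this study)
#   overlap          query genes that fall in the pathway


def hypergeom_sf(overlap, background_size, pathway_size, list_size):
    """P(X >= overlap) when list_size genes are drawn without replacement."""
    total = math.comb(background_size, list_size)
    top = min(pathway_size, list_size)
    hits = sum(
        math.comb(pathway_size, i) * math.comb(background_size - pathway_size, list_size - i)
        for i in range(overlap, top + 1)
    )
    return hits / total  # exact integer arithmetic, one final division


def bar_length(p_value):
    """Figure 2 bar length: -log10(p), so p = 1e-15 gives a bar of length 15."""
    return -math.log10(p_value)


p = hypergeom_sf(overlap=25, background_size=20_000, pathway_size=150, list_size=723)
print(p, bar_length(p))
```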
<fig id="F2"><title><p>Figure 2</p></title><caption><p>GeneGo pathway (A) and disease (B) analyses indicated cancer-related genes were enriched</p></caption><text>
   <p><b>GeneGo pathway (A) and disease (B) analyses indicated cancer-related genes were enriched</b>. A. Eight of the 10 enriched pathways are related to cancer. Among these, pathways 1, 3, and 4 are also related to immune responses; pathways 2 and 5 involve cytoskeleton remodeling; pathway 6 is transition and termination of DNA replication; and pathways 8 and 9 are adhesion related. B. Of the 10 most enriched GeneGO disease categories, 7 are cancer related (1-3, 5, 6, 8, and 10). Bar length is -log10(p-value), so the more significant the enrichment, the longer the orange bar; a bar of length 15 corresponds to a p-value of 1e-15.</p>
</text><graphic file="1471-2164-13-S7-S12-2"/></fig>
<p>To narrow down this list of candidate biomarkers, we cross-examined their expression profiles against the differentially expressed genes in six microarray comparisons drawn from five datasets: two each for ovary and uterus (both uterus comparisons come from GSE764) and one each for pancreas and colon (Table <tblr tid="T3">3</tblr>). These tissue types were selected for two reasons. First, many of our candidate genes had their highest expression in ovary tissue (after normalization). Second, because our candidate genes were derived from EST sampling of various tissue types, they were influenced more heavily by tissue types that were sampled more deeply and therefore had more EST representation; the remaining tissue types were therefore selected for their representativeness in the EST data. Of the 723 DEGs, 235 were also differentially expressed in our microarray analysis.</p>
<tbl id="T3"><title><p>Table 3</p></title><caption><p>Five microarray projects cross-referenced with our set of 723 DEGs</p></caption><tblbdy cols="5">
      <r>
         <c ca="left">
            <p>
               <b>GEO</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Tissue type</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Test sample size (n vs. c)</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Sig genes DN</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Reference</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>GSE18520</p>
         </c>
         <c ca="center">
            <p>Ovary</p>
         </c>
         <c ca="center">
            <p>10 vs. 53</p>
         </c>
         <c ca="center">
            <p>79</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B66">66</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>GSE14407</p>
         </c>
         <c ca="center">
            <p>Ovary</p>
         </c>
         <c ca="center">
            <p>12 vs. 12</p>
         </c>
         <c ca="center">
            <p>109</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B67">67</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>GSE764</p>
         </c>
         <c ca="center">
            <p>Uterus</p>
         </c>
         <c ca="center">
            <p>4 vs. 7 benign</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>Unpublished</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>GSE764</p>
         </c>
         <c ca="center">
            <p>Uterus</p>
         </c>
         <c ca="center">
            <p>4 vs. 8 malignant</p>
         </c>
         <c ca="center">
            <p>2</p>
         </c>
         <c ca="center">
            <p>Unpublished</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>GSE15471</p>
         </c>
         <c ca="center">
            <p>Pancreas</p>
         </c>
         <c ca="center">
            <p>39 vs. 39</p>
         </c>
         <c ca="center">
            <p>120</p>
         </c>
         <c ca="center">
            <p>
               <abbrgrp>
                  <abbr bid="B68">68</abbr>
               </abbrgrp>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>GSE23878</p>
         </c>
         <c ca="center">
            <p>Colon</p>
         </c>
         <c ca="center">
            <p>24 vs. 35</p>
         </c>
         <c ca="center">
            <p>74</p>
         </c>
         <c ca="center">
            <p>Unpublished</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>n: normal, c: cancer</p>
      <p>GSE764 has two entries because we made two pairwise comparisons: normal vs. benign and normal vs. malignant.</p>
   </tblfn></tbl>
<p>Because membrane and secretory proteins are potential therapeutic targets or serum biomarkers, we checked the subcellular locations of the 235 DEGs against secretory and membrane protein lists consolidated from public databases. Of these, 96 DEGs were putative membrane or secretory proteins: 57 had only a secretory annotation, 27 had only a membrane annotation, and 12 had both.</p>
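<p>The two filtering steps, intersection with the microarray DEGs and partitioning by localization annotation, reduce to set operations. In this hypothetical sketch the microarray list is taken to be the union of DEGs across all six comparisons (an assumption), and the gene symbols are placeholders; with the study's lists, the partition sizes would add up as in the text (57 + 27 + 12 = 96).</p>

```python
def partition_candidates(est_degs, array_degs, secretory, membrane):
    """Intersect EST-based DEGs with microarray DEGs, then split by localization."""
    confirmed = set(est_degs).intersection(array_degs)
    sec = confirmed.intersection(secretory)
    mem = confirmed.intersection(membrane)
    return {
        "confirmed": confirmed,                 # 235 genes in the text
        "secretory_only": sec.difference(mem),  # 57 in the text
        "membrane_only": mem.difference(sec),   # 27 in the text
        "both": sec.intersection(mem),          # 12 in the text
        "candidates": sec.union(mem),           # 96 in the text
    }


# Placeholder gene symbols, not the study's lists.
est = {"G1", "G2", "G3", "G4", "G5"}
array = {"G1", "G2", "G3", "G4", "G9"}  # union of DEGs over all comparisons
parts = partition_candidates(est, array, secretory={"G1", "G2"}, membrane={"G2", "G3"})
print({name: sorted(genes) for name, genes in parts.items()})
```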
</sec>
<sec>
<st>
<p>Literature search and STRING analysis of the 96 DEGs</p>
</st>
<p>To further examine whether the 96 membrane/secretory DEGs identified by our EST database mining were enriched for cancer-related genes, we searched the literature for known associations with cancers. In addition, we analyzed them with STRING for interactions, which STRING derives from experimental evidence or from prediction, such as conserved genomic neighborhood, gene fusion, co-occurrence across genomes, pathways, protein complexes, co-regulation, or other literature sources such as co-mentioning. The network of STRING interactions among the 96 DEGs, together with the literature search results, was plotted based on the combined STRING score with Cytoscape <abbrgrp>
<abbr bid="B56">56</abbr>
</abbrgrp> (Figure <figr fid="F3">3</figr>). Approximately 68 of the 96 proteins (71%) formed one large cluster of interacting proteins, and a large proportion of the DEGs (88%) had published cancer associations supported by clinical or non-clinical experimental evidence. This ample literature support for the candidates demonstrates the value of our integration strategy.</p>
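<p>STRING's combined score treats its evidence channels roughly as independent probabilities of a functional link, and a large interacting cluster can be found as the largest connected component of the thresholded network. The sketch below illustrates both ideas under stated assumptions: the prior-correction value (0.041) is our assumption about STRING's scoring rather than something reported in this paper, the 0.4 edge threshold (STRING's medium-confidence level) is assumed because the cut-off used for Figure 3 is not stated, and the proteins and scores are invented. For the study's network, 68 of 96 proteins corresponds to about 71%.</p>

```python
from collections import defaultdict, deque

PRIOR = 0.041  # assumed prior link probability used to correct channel scores


def combined_score(channel_scores, prior=PRIOR):
    """Combine per-channel scores as independent evidence, correcting for a prior."""
    p_no_link = 1.0
    for score in channel_scores:
        corrected = max(0.0, (score - prior) / (1 - prior))  # strip the prior
        p_no_link *= 1 - corrected
    combined = 1 - p_no_link
    return combined + prior * (1 - combined)                 # add the prior back once


def largest_cluster(edges, min_score=0.4):
    """Largest connected component among edges scoring at least min_score."""
    graph = defaultdict(set)
    for u, v, score in edges:
        if score >= min_score:
            graph[u].add(v)
            graph[v].add(u)
    seen, best = set(), set()
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:                        # breadth-first search from start
            for neighbour in graph[queue.popleft()]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen.update(component)
        if len(component) > len(best):
            best = component
    return best


# Invented proteins and scores.
edges = [("A", "B", 0.9), ("B", "C", combined_score([0.5, 0.5])), ("D", "E", 0.2)]
cluster = largest_cluster(edges)
print(sorted(cluster), len(cluster) / 5)  # fraction of the 5 toy proteins
```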
<fig id="F3"><title><p>Figure 3</p></title><caption><p>STRING analysis and literature reviews indicated most candidate genes are highly connected and have published cancer associations</p></caption><text>
   <p><b>STRING analysis and literature reviews indicated most candidate genes are highly connected and have published cancer associations</b>. STRING scores the strength of the relationship between each pair of proteins by combining several lines of evidence. The STRING output was imported into Cytoscape to depict the network. Darker edges denote stronger relationships. Red, apricot, and white nodes denote genes with clinical cancer reports in the literature, genes with cancer reports not derived from clinical samples, and genes with no supporting literature found, respectively. A lighter node border denotes a greater log fold change (logFC) of cancer over normal expression in the microarray data.</p>
</text><graphic file="1471-2164-13-S7-S12-3"/></fig>
<p>The 96 DEGs were selected for their general cancer propensity, without reference to any particular tissue type. However, their general tissue distributions can still be assessed (Figure <figr fid="F4">4</figr>). A gene is considered represented in a tissue if at least one EST from a clone library of that tissue type is matched to it. Some genes are observed across many tissue types, as expected when a gene is expressed in many tissues and its expression level is relatively high. Separately, Woolf's test for heterogeneity can hint at whether a gene is pan-cancer. Genes that were significant in this test were considered to have unequal cancer-versus-normal representation in different tissues, although whether they are pan-cancer requires further evaluation.</p>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Distribution of the 96 membrane and secretory DEGs across tissue types</p></caption><text>
   <p><b>Distribution of the 96 membrane and secretory DEGs across tissue types</b>. The 96 DEGs were selected after cross-examination against microarray data and membrane and secretory protein lists. A gene is marked as observed in a tissue when at least one EST is from that tissue type. CMH odds ratios are shown to indicate each gene's tendency toward differential expression, and total EST counts are color-coded to give a sense of confidence.</p>
</text><graphic file="1471-2164-13-S7-S12-4"/></fig>
</sec>
<sec>
<st>
<p>Three candidates had higher expression in several cancer tissues</p>
</st>
<p>Three candidate cancer DEGs encoding secreted or membrane proteins, COL3A1 (Collagen alpha-1(III) chain), DLG3 (Discs large homolog 3), and RNF43 (Ring finger protein 43), with CMH odds ratios of 3.55, 7.97, and 4.03, respectively, and limited or no clinical support, were selected for real-time qPCR analysis using TaqMan<sup>&#174; </sup>Gene Expression Assay kits (Applied Biosystems, Foster City, CA) (Figure <figr fid="F5">5</figr>). With HPRT1 as the reference gene, higher expression of these genes was observed in at least some of the cancer tissues. The average relative expression levels of COL3A1 in breast, liver, and thyroid cancer samples were higher than in their normal counterparts. The average expression levels of DLG3 in breast, kidney, liver, lung, and ovarian cancers, and of RNF43 in colon, liver, lung, ovarian, and prostate cancers, were also higher than in the corresponding normal tissues. At the level of individual samples, COL3A1 in approximately 5 of the liver cancers, DLG3 in 5 liver, 7 lung, and 5 ovarian cancers, and RNF43 in 7 colon, 8 ovarian, and 5 prostate cancers appeared to be expressed at higher levels than in the normal tissues. Although the sample size is limited, the three candidates appear to have overall higher expression in cancer tissues.</p>
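<p>Normalization against HPRT1 is commonly done with the 2^(-delta Ct) method. The section does not state the exact formula, so the sketch below assumes 2^(-delta Ct) with roughly equal amplification efficiencies for target and reference. It computes per-specimen relative expression, the group means, and the number of cancer specimens exceeding every normal specimen of the same tissue, which is one way to read statements such as "RNF43 in 7 colon cancers". The Ct values and function names are invented for illustration.</p>

```python
import statistics

# Assumes the 2^(-delta Ct) method with HPRT1 as the reference gene and
# similar amplification efficiencies; all Ct values below are invented.


def relative_expression(ct_target, ct_reference):
    """Target transcript level relative to the reference gene in one specimen."""
    return 2.0 ** -(ct_target - ct_reference)


def summarize(normal, cancer):
    """Group means and the number of cancers above the highest normal value."""
    threshold = max(normal)
    return {
        "mean_normal": statistics.mean(normal),
        "mean_cancer": statistics.mean(cancer),
        "cancers_above_all_normals": sum(value > threshold for value in cancer),
    }


# (Ct of target, Ct of HPRT1) for 3 normal and 9 cancer specimens of one tissue.
normal_ct = [(30.0, 25.0), (29.5, 24.8), (30.2, 25.1)]
cancer_ct = [(28.0, 25.0), (27.4, 24.6), (30.0, 25.0), (27.8, 24.9), (28.5, 25.3),
             (30.4, 25.0), (27.1, 24.7), (29.0, 24.9), (28.2, 25.1)]
normal = [relative_expression(t, r) for t, r in normal_ct]
cancer = [relative_expression(t, r) for t, r in cancer_ct]
print(summarize(normal, cancer))
```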
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Relative transcript levels of COL3A1, DLG3, RNF43 in normal and cancer tissues detected by RT-qPCR analysis</p></caption><text>
   <p><b>Relative transcript levels of COL3A1, DLG3, RNF43 in normal and cancer tissues detected by RT-qPCR analysis</b>. Each dot represents the relative gene expression level normalized against the HPRT1 level of the same tissue specimen. For each of 8 different tissues or organs, 3 normal (N) and 9 cancer (T) tissue samples were analyzed.</p>
</text><graphic file="1471-2164-13-S7-S12-5"/></fig>
</sec>
</sec>
<sec>
<st>
<p>Discussion</p>
</st>
<p>Reported here is an integrative, meta-analytic approach for the discovery of pan-cancer differentially expressed gene candidates. Our primary enrichment yielded a set of 723 DEGs whose cancer associations were supported by GeneGO disease and pathway analyses. Further integrative evaluation against cancer differentially expressed genes from microarray data narrowed the list down to 235 genes, of which 96 likely encode secretory or membrane proteins. STRING protein network analysis and literature review further indicated that 71% of the 96 DEGs were highly connected and that most of them (88%) had been associated with cancers in previous publications.</p>
<sec>
<st>
<p>Simpson's paradox</p>
</st>
<p>The meta-analytic nature of our study brings both opportunities and challenges for studying the digital signatures of various transcriptomes from a new perspective. Compared with experimental methods that focus on a single tissue type or a few tissue types, our approach allows us to find genes inclined to be expressed in cancer in a pan-tissue manner. An important challenge of our approach is to avoid Simpson's paradox, which can occur in meta-analysis studies <abbrgrp>
<abbr bid="B32">32</abbr>
</abbrgrp>. Simpson's paradox occurs when the association between two variables in pooled data is reversed in direction relative to the association observed within stratified subgroups. Table <tblr tid="T1">1</tblr> shows a contrived example in which gene A appears to have higher cancer expression when the data are pooled, but not within the individual stratified sub-tables. This is an extreme case, in which the direction of the ratios actually differs between the sub-tables and the pooled table. Even without such a reversal, however, the tissue confounder introduces bias, large or small, that may distort our judgment. In this study, we used CMH to analyze the data as stratified sub-tables to avoid this paradox. One could instead analyze one tissue at a time for differential expression, but each analysis would then work with a smaller dataset. CMH avoids this problem because it combines the EST counts from all tissues into a single test while still comparing normal and cancer counts within each tissue type.</p>
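<p>The reversal behind Simpson's paradox is easy to reproduce with invented counts (the numbers of Table 1 are not reused here). In the sketch below, a hypothetical gene is less frequent among cancer ESTs than among normal ESTs within each of two tissues, yet appears cancer-enriched once the tissues are pooled, because the normal libraries come mostly from the tissue in which the gene is weakly expressed. The Mantel-Haenszel odds ratio, which combines the within-tissue comparisons, stays below 1.</p>

```python
# Invented counts for one gene in two tissues, in the layout
# (gene_cancer, gene_normal, total_cancer, total_normal).
strata = {
    "tissue X": (10, 20, 10_000, 10_000),   # gene weakly expressed
    "tissue Y": (400, 50, 10_000, 1_000),   # gene strongly expressed
}


def odds_ratio(gene_c, gene_n, total_c, total_n):
    """Odds of the gene among cancer ESTs over its odds among normal ESTs."""
    return (gene_c * (total_n - gene_n)) / (gene_n * (total_c - gene_c))


def mantel_haenszel_or(tables):
    """Stratified odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    num = sum(gc * (tn - gn) / (tc + tn) for gc, gn, tc, tn in tables)
    den = sum(gn * (tc - gc) / (tc + tn) for gc, gn, tc, tn in tables)
    return num / den


pooled = tuple(map(sum, zip(*strata.values())))  # ignore tissue of origin
for name, table in strata.items():
    print(name, round(odds_ratio(*table), 2))
print("pooled", round(odds_ratio(*pooled), 2))
print("Mantel-Haenszel", round(mantel_haenszel_or(strata.values()), 2))
```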
<p>In our actual data, the odds ratio of the pooled table also differs from that of the stratified tables. For example, the gene PTRF (polymerase I and transcript release factor) has a pooled odds ratio of 0.40 and a CMH odds ratio of 0.16 calculated from the stratified sub-tables. In this case, both odds ratios indicate higher normal expression and both are statistically significant, although to different degrees (the pooled table has a p-value of 5.269e-15 under a 2x2 &#967;<sup>2 </sup>test <abbrgrp>
<abbr bid="B57">57</abbr>
</abbrgrp>, versus 7.16e-77 for CMH). For the gene VCAN (versican), the pooled odds ratio is 1.86 and the &#967;<sup>2 </sup>test yields a significant p-value of 1.83e-4, whereas CMH gives a non-significant p-value of 0.25. In the most extreme case, GBP6 (guanylate binding protein family, member 6) has a pooled odds ratio of 6.69 with a &#967;<sup>2 </sup>test p-value smaller than 2.2e-16 (approaching 0), whereas its CMH odds ratio is 0.73, indicating higher normal counts, although the CMH p-value of 0.15 is not significant. This is Simpson's paradox in action. Careful inspection showed that all cancer counts and most normal counts of GBP6 came from tongue libraries: of its 50 cancer and 21 normal EST counts, tongue accounts for 50 and 17, respectively. Within the tongue, however, these 50 cancer ESTs are out of 29,479 tongue cancer ESTs, whereas the 17 normal ESTs are out of only 7,486 tongue normal ESTs, so GBP6 is in fact somewhat less frequent in tongue cancer than in normal tongue. Pooling loses this information and, by summing the cancer counts from all tissues, gives the false impression that GBP6 expression is much higher in cancer. Stratifying by tissue type guards against this bias.</p>
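<p>The GBP6 tongue numbers can be checked directly. Reading 29,479 and 7,486 as the tongue's total cancer and normal EST counts (our interpretation of the text, which gives a within-tongue odds ratio close to the reported CMH value of 0.73), the sketch below computes the tongue-only odds ratio and a 2x2 Pearson chi-square test with Yates' continuity correction, the default of R's chisq.test for 2x2 tables; whether the authors' test [57] used that correction is not stated here. The within-tongue odds ratio is below 1 and not significant, in contrast to the pooled result.</p>

```python
import math

# GBP6 in tongue, as we read the text: 50 of 29,479 tongue cancer ESTs and
# 17 of 7,486 tongue normal ESTs (an interpretation, see the lead-in).
a, b = 50, 17                        # GBP6 ESTs: cancer, normal
c, d = 29_479 - 50, 7_486 - 17       # other tongue ESTs: cancer, normal

tongue_or = (a * d) / (b * c)        # odds in cancer over odds in normal


def chi2_2x2_yates(a, b, c, d):
    """Pearson chi-square with Yates' correction for a 2x2 table; returns (stat, p)."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += max(0.0, abs(observed[i][j] - expected) - 0.5) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))   # 1 degree of freedom


stat, p = chi2_2x2_yates(a, b, c, d)
print(round(tongue_or, 3), round(stat, 2), round(p, 2))
```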
</sec>
<sec>
<st>
<p>Heterogeneity of odds ratios</p>
</st>
<p>Strictly applied, the Cochran-Mantel-Haenszel method requires the odds ratios of the sub-tables to be homogeneous. In our context, this means that the ratio of a gene's expression between cancer and normal tissues is the same in all tissue types under study, and that any observed variability is due to sampling variation. Under this assumption, the calculated odds ratio estimates the common odds ratio across the tissue strata. In our case, however, not all genes had similar ratios in each tissue (according to Woolf's test for homogeneity; see the "Woolf" column of Additional file <supplr sid="S2">2</supplr>), which was expected. Nevertheless, we were interested in the overall expression patterns of the genes in cancer conditions, not in an estimate of a common odds ratio across the strata, which often does not exist. We were interested in hypothesis testing, to obtain leads to genes with generally higher cancer expression, and for this purpose the test can still be applied <abbrgrp>
<abbr bid="B58">58</abbr>
<abbr bid="B59">59</abbr>
</abbrgrp>. The CMH odds ratio is a weighted average of the odds ratios of the individual tissue strata and thus provides a summary measure <abbrgrp>
<abbr bid="B60">60</abbr>
</abbrgrp>, which we used to prioritize genes for subsequent biological analyses. In other words, an odds ratio in our data is merely a value "averaged" across all tissue types. These ratios nonetheless revealed preferential cancer expression: the list covered a number of important known biomarkers, and the enrichment of cancer-related genes was supported by knowledge-based GeneGO analyses and previous publications.</p>
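<p>Woolf's test compares each stratum's log odds ratio with their inverse-variance weighted mean; the weighted sum of squared deviations is referred to a chi-square distribution with K - 1 degrees of freedom for K informative strata. The sketch below assumes a hypothetical (gene_cancer, gene_normal, total_cancer, total_normal) layout per tissue and adds 0.5 to every cell of a stratum containing a zero, a common correction that the authors may or may not have used. It also writes the Mantel-Haenszel odds ratio explicitly as a weighted average of stratum odds ratios with weights b*c/n, as described above.</p>

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with positive integer df."""
    if df % 2 == 0:                          # even df: finite Poisson-type series
        term = total = math.exp(-x / 2)
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2))      # odd df: erfc plus a finite series
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def stratum_cells(gene_c, gene_n, total_c, total_n):
    """2x2 cells: gene/other ESTs in cancer (a, c) and normal (b, d) libraries."""
    return gene_c, gene_n, total_c - gene_c, total_n - gene_n


def woolf_test(strata):
    """Woolf's test of homogeneity of odds ratios; needs two or more informative strata."""
    log_ors, weights = [], []
    for gene_c, gene_n, total_c, total_n in strata:
        if total_c == 0 or total_n == 0:
            continue                         # a tissue lacking one condition is uninformative
        a, b, c, d = stratum_cells(gene_c, gene_n, total_c, total_n)
        if 0 in (a, b, c, d):                # add 0.5 to all cells (assumed correction)
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        log_ors.append(math.log(a * d / (b * c)))
        weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))   # inverse variance
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    stat = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    return stat, df, chi2_sf(stat, df)


def mh_weighted_average(strata):
    """MH odds ratio as a weighted average of stratum odds ratios, weights b*c/n."""
    weights, ors = [], []
    for stratum in strata:
        a, b, c, d = stratum_cells(*stratum)
        n = a + b + c + d
        weights.append(b * c / n)
        ors.append(a * d / (b * c))
    return sum(w * o for w, o in zip(weights, ors)) / sum(weights)


same = [(30, 10, 10_000, 10_000), (60, 20, 20_000, 20_000)]      # equal odds ratios
opposite = [(30, 10, 10_000, 10_000), (10, 30, 10_000, 10_000)]  # reversed odds ratios
print(woolf_test(same), woolf_test(opposite), mh_weighted_average(opposite))
```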
<suppl id="S2">
<title>
<p>Additional file 2</p>
</title>
<text>
<p>
<b>EST pipeline raw data</b>. Raw EST counts from the EST pipeline, imported into Excel. The columns are the condition type, tissue, and condition-tissue type stratifications; the rows give the EST counts assigned to each gene.</p>
</text>
<file name="1471-2164-13-S7-S12-S2.xls">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Lower bound of confidence interval</p>
</st>
<p>Another distinctive tactic we used is the selection of DEGs among the statistically significant genes (p-value &lt; 0.05) based on the lower bounds of the confidence intervals of their odds ratio estimates. The popular approach to searching for DEGs is to select genes based on p-value first, and then to select a subset based on parameter estimates such as odds ratios or fold changes. The p-value criterion selects the statistically significant genes (those unlikely to be the result of random fluctuation); the subsequent criterion is based on prior domain knowledge. However, among genes with statistically significant p-values and similar parameter estimates, the precision of the estimates can vary widely. In our dataset, for example, the genes TUBA1B and FAM60A both have odds ratios of 2.38 (Additional file <supplr sid="S2">2</supplr>). However, the 95% confidence interval of the odds ratio is 2.26 to 2.50 for TUBA1B but 1.59 to 3.54 for FAM60A. If, based on our background knowledge and for future applications, we must select genes with odds ratios greater than 2.0, then using the point estimate of the odds ratio as the cutoff would not serve this purpose, since the true (population) odds ratio of FAM60A could well be below 2.0. Selecting genes based on the lower bounds of their confidence intervals gives greater assurance that the chosen genes meet the threshold, but this practice has not been widely appreciated.</p>
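<p>A lower-bound selection rule needs a confidence interval for the stratified odds ratio. One standard choice, assumed here because this section does not name the method, is the Robins-Breslow-Greenland variance of the log Mantel-Haenszel odds ratio, which R's mantelhaen.test uses for 2 x 2 x K tables. The sketch computes that interval from hypothetical (gene_cancer, gene_normal, total_cancer, total_normal) strata and then applies the lower-bound rule to the two intervals reported above; the function names are hypothetical.</p>

```python
import math
from statistics import NormalDist


def mh_or_ci(strata, level=0.95):
    """Mantel-Haenszel odds ratio with a Robins-Breslow-Greenland confidence interval."""
    r_sum = s_sum = pr = ps_qr = qs = 0.0
    for gene_c, gene_n, total_c, total_n in strata:
        a, b = gene_c, gene_n
        c, d = total_c - gene_c, total_n - gene_n
        n = a + b + c + d
        if n == 0:
            continue
        r, s = a * d / n, b * c / n          # MH numerator and denominator terms
        p, q = (a + d) / n, (b + c) / n
        r_sum += r
        s_sum += s
        pr += p * r
        ps_qr += p * s + q * r
        qs += q * s
    or_mh = r_sum / s_sum
    var_log = pr / (2 * r_sum ** 2) + ps_qr / (2 * r_sum * s_sum) + qs / (2 * s_sum ** 2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(var_log)
    return or_mh, or_mh * math.exp(-half_width), or_mh * math.exp(half_width)


def select_by_lower_bound(intervals, threshold=2.0):
    """Keep genes whose lower confidence bound exceeds the threshold."""
    return sorted(gene for gene, (lower, _upper) in intervals.items() if lower > threshold)


# 95% intervals reported in the text for two genes with the same odds ratio (2.38).
reported = {"TUBA1B": (2.26, 2.50), "FAM60A": (1.59, 3.54)}
print(select_by_lower_bound(reported))
```

<p>With a single stratum, the Robins-Breslow-Greenland variance reduces to the familiar 1/a + 1/b + 1/c + 1/d of the log odds ratio, which is a convenient sanity check for any implementation.</p>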
</sec>
<sec>
<st>
<p>Multi-cancer biomarkers</p>
</st>
<p>The multi-cancer approach identifies genes that are differentially expressed overall across multiple cancer types relative to their respective normal tissue types. Although many biomarker studies focus on genes differentially expressed in a particular tissue type, Wu <it>et al</it>. found 8 proteins in the conditioned media of 23 cell lines that showed negative or weak tissue staining in the Human Protein Atlas, suggesting that they are potential pan-cancer markers <abbrgrp>
<abbr bid="B61">61</abbr>
</abbrgrp>. Sahin <it>et al</it>. found that claudin-18 splice variant 2 was ectopically activated in pancreatic, esophageal, ovarian, and lung tumors, whereas its expression in normal tissue was confined to differentiated epithelial cells of the gastric mucosa, as confirmed by RT-PCR <abbrgrp>
<abbr bid="B62">62</abbr>
</abbrgrp>. These studies suggest that multi-cancer genes and multi-cancer splice variants exist. The three candidates COL3A1 (Collagen alpha-1(III) chain), DLG3 (Discs large homolog 3; plasma membrane), and RNF43 (Ring finger protein 43) are putative secreted or plasma membrane proteins with potential for the development of serum diagnostic reagents. In previous studies, a hint of pan-cancer potential surfaced for COL3A1: expression of this extracellular matrix protein gene was elevated in brain cancer <abbrgrp>
<abbr bid="B63">63</abbr>
</abbrgrp> and angiofibroma <abbrgrp>
<abbr bid="B64">64</abbr>
</abbrgrp>. RNF43, which encodes a secreted, membrane-bound protein, is known to be up-regulated in colorectal cancer <abbrgrp>
<abbr bid="B65">65</abbr>
</abbrgrp>. Real-time qPCR analysis of the three candidates identified in this study, COL3A1, DLG3, and RNF43, showed higher expression of these genes in multiple cancer types. This not only indicates the usefulness of our computational approach and filtering procedure, but also encourages us to devote further resources to assessing the clinical utility of these three candidates.</p>
</sec>
<sec>
<st>
<p>Pooling of gene expression</p>
</st>
<p>Earlier in this discussion, we mentioned that na&#239;ve pooling of data may introduce bias and, at worst, may produce Simpson's paradox, and that we tackled this problem with CMH. Nevertheless, pooling did take place on two other occasions. We pooled the expression of different splice variants of the same gene into one gene expression profile, and we pooled different libraries of the same tissue into one tissue classification. In both cases, we may have introduced expression bias, since different splice variants and different tissue libraries (i.e., tissues from different patients) may differ in their expression patterns. This is an unfortunate limitation of this and similar studies, because dbEST data come from many different sources, and because the library selection criteria we used were much more stringent than those of previous studies (most importantly, the exclusion of ESTs from cell lines and from PCR amplification, subtraction, and cDNA normalization protocols), which left relatively little data. We opted for pooling because we had a comparatively limited number of sequences to work with (1,644,960 out of 8,296,089 downloaded, or 19.83%). In the future, digital expression profiling can be improved with RNA-Seq, which offers a much greater depth of coverage than ESTs obtained by traditional cDNA sequencing. Its much larger sample sizes make differentiation among isoforms more realistic and make pooling of different libraries of the same tissue less necessary. For the discovery of pan-cancer genes or isoforms across multiple tissue types, the approach outlined in this study would be equally applicable.</p>
</sec>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>We have demonstrated that the use of the Cochran-Mantel-Haenszel statistic in an integrative approach allowed us to identify potential biomarkers or therapeutic targets through an exhaustive search of various EST libraries from dbEST. As shown in a previous study, splice variants can be useful targets of antibody therapy <abbrgrp>
<abbr bid="B62">62</abbr>
</abbrgrp>. Given enough data, the method could readily be extended to searching for cancer differential splice variants. The issues involved in the analysis, such as Simpson's paradox and pan-cancer markers, may also be encountered in other multi-class digital analyses. The three targets confirmed by real-time qPCR, COL3A1, DLG3, and RNF43, are worthy of further evaluation for clinical applications.</p>
</sec>
<sec>
<st>
<p>Competing interests</p>
</st>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>TW and LC selected and filtered EST libraries and performed the STRING analysis. TW conceived the deployment of the statistical method and implemented and ran the EST analysis pipeline. Microarray slides were selected by LC and analyzed by TW. TW, LC, and TC were involved in the literature search. LC selected the genes for expression validation and performed the STRING analysis. JW performed the real-time qPCR analysis. YT was involved in the statistical interpretation of the results. WL and WN provided direction and guidance. All authors read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>This work was supported by grant NSC 99-3112-B-010-003 (W.V. Ng) from the National Science Council and by an intramural grant derived from the Aim for the Top University Grant awarded to National Yang Ming University by the Ministry of Education, Taiwan, the Republic of China.</p>
<p>This article has been published as part of <it>BMC Genomics </it>Volume 13 Supplement 7, 2012: Eleventh International Conference on Bioinformatics (InCoB2012): Computational Biology. The full contents of the supplement are available online at <url>http://www.biomedcentral.com/bmcgenomics/supplements/13/S7</url>.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Light-generated oligonucleotide arrays for rapid DNA sequence analysis</p></title><aug><au><snm>Pease</snm><fnm>AC</fnm></au><au><snm>Solas</snm><fnm>D</fnm></au><au><snm>Sullivan</snm><fnm>EJ</fnm></au><au><snm>Cronin</snm><fnm>MT</fnm></au><au><snm>Holmes</snm><fnm>CP</fnm></au><au><snm>Fodor</snm><fnm>SP</fnm></au></aug><source>Proceedings of the National Academy of Sciences of the United States of America</source><pubdate>1994</pubdate><volume>91</volume><issue>11</issue><fpage>5022</fpage><lpage>5026</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.91.11.5022</pubid><pubid idtype="pmcid">43922</pubid><pubid idtype="pmpid" link="fulltext">8197176</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Quantitative monitoring of gene expression patterns with a complementary DNA microarray</p></title><aug><au><snm>Schena</snm><fnm>M</fnm></au><au><snm>Shalon</snm><fnm>D</fnm></au><au><snm>Davis</snm><fnm>RW</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au></aug><source>Science (New York, NY)</source><pubdate>1995</pubdate><volume>270</volume><issue>5235</issue><fpage>467</fpage><lpage>470</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.270.5235.467</pubid><pubid idtype="pmpid" link="fulltext">7569999</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>In situ analysis of cross-hybridisation on microarrays and the inference of expression correlation</p></title><aug><au><snm>Casneuf</snm><fnm>T</fnm></au><au><snm>Van de Peer</snm><fnm>Y</fnm></au><au><snm>Huber</snm><fnm>W</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2007</pubdate><volume>8</volume><fpage>461</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-8-461</pubid><pubid idtype="pmcid">2213692</pubid><pubid idtype="pmpid" link="fulltext">18039370</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Hybridization interactions between probesets in short oligo microarrays lead to spurious 
correlations</p></title><aug><au><snm>Okoniewski</snm><fnm>MJ</fnm></au><au><snm>Miller</snm><fnm>CJ</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2006</pubdate><volume>7</volume><fpage>276</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-7-276</pubid><pubid idtype="pmcid">1513401</pubid><pubid idtype="pmpid" link="fulltext">16749918</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Complementary DNA sequencing: expressed sequence tags and human genome project</p></title><aug><au><snm>Adams</snm><fnm>MD</fnm></au><au><snm>Kelley</snm><fnm>JM</fnm></au><au><snm>Gocayne</snm><fnm>JD</fnm></au><au><snm>Dubnick</snm><fnm>M</fnm></au><au><snm>Polymeropoulos</snm><fnm>MH</fnm></au><au><snm>Xiao</snm><fnm>H</fnm></au><au><snm>Merril</snm><fnm>CR</fnm></au><au><snm>Wu</snm><fnm>A</fnm></au><au><snm>Olde</snm><fnm>B</fnm></au><au><snm>Moreno</snm><fnm>RF</fnm></au><etal/></aug><source>Science (New York, NY)</source><pubdate>1991</pubdate><volume>252</volume><issue>5013</issue><fpage>1651</fpage><lpage>1656</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.2047873</pubid><pubid idtype="pmpid" link="fulltext">2047873</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>cDNA analyses in the human genome project</p></title><aug><au><snm>Matsubara</snm><fnm>K</fnm></au><au><snm>Okubo</snm><fnm>K</fnm></au></aug><source>Gene</source><pubdate>1993</pubdate><volume>135</volume><issue>1-2</issue><fpage>265</fpage><lpage>274</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/0378-1119(93)90076-F</pubid><pubid idtype="pmpid">8276268</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>cDNA sequencing: a means of understanding cellular physiology</p></title><aug><au><snm>Weinstock</snm><fnm>KG</fnm></au><au><snm>Kirkness</snm><fnm>EF</fnm></au><au><snm>Lee</snm><fnm>NH</fnm></au><au><snm>Earle-Hughes</snm><fnm>JA</fnm></au><au><snm>Venter</snm><fnm>JC</fnm></au></aug><source>Curr Opin 
Biotechnol</source><pubdate>1994</pubdate><volume>5</volume><issue>6</issue><fpage>599</fpage><lpage>603</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/0958-1669(94)90081-7</pubid><pubid idtype="pmpid">7765742</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence</p></title><aug><au><snm>Adams</snm><fnm>MD</fnm></au><au><snm>Kerlavage</snm><fnm>AR</fnm></au><au><snm>Fleischmann</snm><fnm>RD</fnm></au><au><snm>Fuldner</snm><fnm>RA</fnm></au><au><snm>Bult</snm><fnm>CJ</fnm></au><au><snm>Lee</snm><fnm>NH</fnm></au><au><snm>Kirkness</snm><fnm>EF</fnm></au><au><snm>Weinstock</snm><fnm>KG</fnm></au><au><snm>Gocayne</snm><fnm>JD</fnm></au><au><snm>White</snm><fnm>O</fnm></au><etal/></aug><source>Nature</source><pubdate>1995</pubdate><volume>377</volume><issue>6547 Suppl</issue><fpage>3</fpage><lpage>174</lpage><xrefbib><pubid idtype="pmpid">7566098</pubid></xrefbib></bibl><bibl id="B9"><title><p>Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression</p></title><aug><au><snm>Ewing</snm><fnm>RM</fnm></au><au><snm>Ben Kahla</snm><fnm>A</fnm></au><au><snm>Poirot</snm><fnm>O</fnm></au><au><snm>Lopez</snm><fnm>F</fnm></au><au><snm>Audic</snm><fnm>S</fnm></au><au><snm>Claverie</snm><fnm>JM</fnm></au></aug><source>Genome research</source><pubdate>1999</pubdate><volume>9</volume><issue>10</issue><fpage>950</fpage><lpage>959</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.9.10.950</pubid><pubid idtype="pmcid">310820</pubid><pubid idtype="pmpid" link="fulltext">10523523</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Discovery of three genes specifically expressed in human prostate by expressed sequence tag database 
analysis</p></title><aug><au><snm>Vasmatzis</snm><fnm>G</fnm></au><au><snm>Essand</snm><fnm>M</fnm></au><au><snm>Brinkmann</snm><fnm>U</fnm></au><au><snm>Lee</snm><fnm>B</fnm></au><au><snm>Pastan</snm><fnm>I</fnm></au></aug><source>Proceedings of the National Academy of Sciences of the United States of America</source><pubdate>1998</pubdate><volume>95</volume><issue>1</issue><fpage>300</fpage><lpage>304</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.95.1.300</pubid><pubid idtype="pmcid">18207</pubid><pubid idtype="pmpid" link="fulltext">9419370</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Exhaustive mining of EST libraries for genes differentially expressed in normal and tumour tissues</p></title><aug><au><snm>Schmitt</snm><fnm>AO</fnm></au><au><snm>Specht</snm><fnm>T</fnm></au><au><snm>Beckmann</snm><fnm>G</fnm></au><au><snm>Dahl</snm><fnm>E</fnm></au><au><snm>Pilarsky</snm><fnm>CP</fnm></au><au><snm>Hinzmann</snm><fnm>B</fnm></au><au><snm>Rosenthal</snm><fnm>A</fnm></au></aug><source>Nucleic acids research</source><pubdate>1999</pubdate><volume>27</volume><issue>21</issue><fpage>4251</fpage><lpage>4260</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/27.21.4251</pubid><pubid idtype="pmcid">148701</pubid><pubid idtype="pmpid" link="fulltext">10518618</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>The Cancer Genome Anatomy Project cDNA xProfiler</p></title><url>http://cgap.nci.nih.gov/Tissues/xProfiler</url></bibl><bibl id="B13"><title><p>NCBI Unigene Digital Differential Display</p></title><url>http://www.ncbi.nlm.nih.gov/UniGene/ddd.cgi</url></bibl><bibl id="B14"><title><p>The Cancer Genome Anatomy Project Digital Gene Expression Displayer</p></title></bibl><bibl id="B15"><title><p>The significance of digital gene expression profiles</p></title><aug><au><snm>Audic</snm><fnm>S</fnm></au><au><snm>Claverie</snm><fnm>JM</fnm></au></aug><source>Genome 
research</source><pubdate>1997</pubdate><volume>7</volume><issue>10</issue><fpage>986</fpage><lpage>995</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">9331369</pubid></xrefbib></bibl><bibl id="B16"><title><p>A public database for gene expression in human cancers</p></title><aug><au><snm>Lal</snm><fnm>A</fnm></au><au><snm>Lash</snm><fnm>AE</fnm></au><au><snm>Altschul</snm><fnm>SF</fnm></au><au><snm>Velculescu</snm><fnm>V</fnm></au><au><snm>Zhang</snm><fnm>L</fnm></au><au><snm>McLendon</snm><fnm>RE</fnm></au><au><snm>Marra</snm><fnm>MA</fnm></au><au><snm>Prange</snm><fnm>C</fnm></au><au><snm>Morin</snm><fnm>PJ</fnm></au><au><snm>Polyak</snm><fnm>K</fnm></au><etal/></aug><source>Cancer Res</source><pubdate>1999</pubdate><volume>59</volume><issue>21</issue><fpage>5403</fpage><lpage>5407</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">10554005</pubid></xrefbib></bibl><bibl id="B17"><title><p>The comparison of gene expression from multiple cDNA libraries</p></title><aug><au><snm>Stekel</snm><fnm>DJ</fnm></au><au><snm>Git</snm><fnm>Y</fnm></au><au><snm>Falciani</snm><fnm>F</fnm></au></aug><source>Genome research</source><pubdate>2000</pubdate><volume>10</volume><issue>12</issue><fpage>2055</fpage><lpage>2061</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.GR-1325RR</pubid><pubid idtype="pmcid">313085</pubid><pubid idtype="pmpid" link="fulltext">11116099</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>A transcriptome anatomy of human colorectal cancers</p></title><aug><au><snm>Lu</snm><fnm>B</fnm></au><au><snm>Xu</snm><fnm>J</fnm></au><au><snm>Lai</snm><fnm>M</fnm></au><au><snm>Zhang</snm><fnm>H</fnm></au><au><snm>Chen</snm><fnm>J</fnm></au></aug><source>BMC cancer</source><pubdate>2006</pubdate><volume>6</volume><fpage>40</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2407-6-40</pubid><pubid idtype="pmcid">1402307</pubid><pubid idtype="pmpid" link="fulltext">16504081</pubid></pubidlist></xrefbib></bibl><bibl 
id="B19"><title><p>Molecular cloning and characterization of a novel human testis-specific gene by use of digital differential display</p></title><aug><au><snm>Nie</snm><fnm>D</fnm></au><au><snm>Xiang</snm><fnm>Y</fnm></au></aug><source>Journal of genetics</source><pubdate>2006</pubdate><volume>85</volume><issue>1</issue><fpage>57</fpage><lpage>62</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/BF02728971</pubid><pubid idtype="pmpid" link="fulltext">16809841</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon</p></title><aug><au><snm>Clepet</snm><fnm>C</fnm></au><au><snm>Joobeur</snm><fnm>T</fnm></au><au><snm>Zheng</snm><fnm>Y</fnm></au><au><snm>Jublot</snm><fnm>D</fnm></au><au><snm>Huang</snm><fnm>M</fnm></au><au><snm>Truniger</snm><fnm>V</fnm></au><au><snm>Boualem</snm><fnm>A</fnm></au><au><snm>Hernandez-Gonzalez</snm><fnm>ME</fnm></au><au><snm>Dolcet-Sanjuan</snm><fnm>R</fnm></au><au><snm>Portnoy</snm><fnm>V</fnm></au><etal/></aug><source>BMC genomics</source><pubdate>2011</pubdate><volume>12</volume><fpage>252</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-12-252</pubid><pubid idtype="pmcid">3118787</pubid><pubid idtype="pmpid" link="fulltext">21599934</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>An efficient approach to finding Siraitia grosvenorii triterpene biosynthetic genes by RNA-seq and digital gene expression analysis</p></title><aug><au><snm>Tang</snm><fnm>Q</fnm></au><au><snm>Ma</snm><fnm>XJ</fnm></au><au><snm>Mo</snm><fnm>CM</fnm></au><au><snm>Wilson</snm><fnm>IW</fnm></au><au><snm>Song</snm><fnm>C</fnm></au><au><snm>Zhao</snm><fnm>H</fnm></au><au><snm>Yang</snm><fnm>YF</fnm></au><au><snm>Fu</snm><fnm>W</fnm></au><au><snm>Qiu</snm><fnm>DY</fnm></au></aug><source>BMC genomics</source><pubdate>2011</pubdate><volume>12</volume><issue>1</issue><fpage>343</fpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1186/1471-2164-12-343</pubid><pubid idtype="pmcid">3161973</pubid><pubid idtype="pmpid" link="fulltext">21729270</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing</p></title><aug><au><snm>Jager</snm><fnm>M</fnm></au><au><snm>Ott</snm><fnm>CE</fnm></au><au><snm>Grunhagen</snm><fnm>J</fnm></au><au><snm>Hecht</snm><fnm>J</fnm></au><au><snm>Schell</snm><fnm>H</fnm></au><au><snm>Mundlos</snm><fnm>S</fnm></au><au><snm>Duda</snm><fnm>GN</fnm></au><au><snm>Robinson</snm><fnm>PN</fnm></au><au><snm>Lienau</snm><fnm>J</fnm></au></aug><source>BMC genomics</source><pubdate>2011</pubdate><volume>12</volume><fpage>158</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-12-158</pubid><pubid idtype="pmcid">3074554</pubid><pubid idtype="pmpid" link="fulltext">21435219</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Identification of tissue-specific, abiotic stress-responsive gene expression patterns in wine grape (Vitis vinifera L.) 
based on curation and mining of large-scale EST data sets</p></title><aug><au><snm>Tillett</snm><fnm>RL</fnm></au><au><snm>Ergul</snm><fnm>A</fnm></au><au><snm>Albion</snm><fnm>RL</fnm></au><au><snm>Schlauch</snm><fnm>KA</fnm></au><au><snm>Cramer</snm><fnm>GR</fnm></au><au><snm>Cushman</snm><fnm>JC</fnm></au></aug><source>BMC plant biology</source><pubdate>2011</pubdate><volume>11</volume><fpage>86</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2229-11-86</pubid><pubid idtype="pmcid">3224124</pubid><pubid idtype="pmpid" link="fulltext">21592389</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Selecting for functional alternative splices in ESTs</p></title><aug><au><snm>Kan</snm><fnm>Z</fnm></au><au><snm>States</snm><fnm>D</fnm></au><au><snm>Gish</snm><fnm>W</fnm></au></aug><source>Genome research</source><pubdate>2002</pubdate><volume>12</volume><issue>12</issue><fpage>1837</fpage><lpage>1845</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.764102</pubid><pubid idtype="pmcid">187565</pubid><pubid idtype="pmpid" link="fulltext">12466287</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Genome-wide detection of tissue-specific alternative splicing in the human transcriptome</p></title><aug><au><snm>Xu</snm><fnm>Q</fnm></au><au><snm>Modrek</snm><fnm>B</fnm></au><au><snm>Lee</snm><fnm>C</fnm></au></aug><source>Nucleic acids research</source><pubdate>2002</pubdate><volume>30</volume><issue>17</issue><fpage>3754</fpage><lpage>3766</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkf492</pubid><pubid idtype="pmcid">137414</pubid><pubid idtype="pmpid" link="fulltext">12202761</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Computational analysis and experimental validation of tumor-associated alternative RNA splicing in human 
cancer</p></title><aug><au><snm>Wang</snm><fnm>Z</fnm></au><au><snm>Lo</snm><fnm>HS</fnm></au><au><snm>Yang</snm><fnm>H</fnm></au><au><snm>Gere</snm><fnm>S</fnm></au><au><snm>Hu</snm><fnm>Y</fnm></au><au><snm>Buetow</snm><fnm>KH</fnm></au><au><snm>Lee</snm><fnm>MP</fnm></au></aug><source>Cancer Res</source><pubdate>2003</pubdate><volume>63</volume><issue>3</issue><fpage>655</fpage><lpage>657</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">12566310</pubid></xrefbib></bibl><bibl id="B27"><title><p>Discovery of novel splice forms and functional analysis of cancer-specific alternative splicing in human expressed sequences</p></title><aug><au><snm>Xu</snm><fnm>Q</fnm></au><au><snm>Lee</snm><fnm>C</fnm></au></aug><source>Nucleic acids research</source><pubdate>2003</pubdate><volume>31</volume><issue>19</issue><fpage>5635</fpage><lpage>5643</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg786</pubid><pubid idtype="pmcid">206480</pubid><pubid idtype="pmpid" link="fulltext">14500827</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Identification of alternatively spliced mRNA variants related to cancers by genome-wide ESTs alignment</p></title><aug><au><snm>Hui</snm><fnm>L</fnm></au><au><snm>Zhang</snm><fnm>X</fnm></au><au><snm>Wu</snm><fnm>X</fnm></au><au><snm>Lin</snm><fnm>Z</fnm></au><au><snm>Wang</snm><fnm>Q</fnm></au><au><snm>Li</snm><fnm>Y</fnm></au><au><snm>Hu</snm><fnm>G</fnm></au></aug><source>Oncogene</source><pubdate>2004</pubdate><volume>23</volume><issue>17</issue><fpage>3013</fpage><lpage>3023</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/sj.onc.1207362</pubid><pubid idtype="pmpid" link="fulltext">15048092</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Identification of human exons overexpressed in tumors through the use of genome and expressed sequence 
data</p></title><aug><au><snm>Kirschbaum-Slager</snm><fnm>N</fnm></au><au><snm>Parmigiani</snm><fnm>RB</fnm></au><au><snm>Camargo</snm><fnm>AA</fnm></au><au><snm>de Souza</snm><fnm>SJ</fnm></au></aug><source>Physiol Genomics</source><pubdate>2005</pubdate><volume>21</volume><issue>3</issue><fpage>423</fpage><lpage>432</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1152/physiolgenomics.00237.2004</pubid><pubid idtype="pmpid" link="fulltext">15784694</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>A global view of cancer-specific transcript variants by subtractive transcriptome-wide analysis</p></title><aug><au><snm>He</snm><fnm>C</fnm></au><au><snm>Zhou</snm><fnm>F</fnm></au><au><snm>Zuo</snm><fnm>Z</fnm></au><au><snm>Cheng</snm><fnm>H</fnm></au><au><snm>Zhou</snm><fnm>R</fnm></au></aug><source>PLoS One</source><pubdate>2009</pubdate><volume>4</volume><issue>3</issue><fpage>e4732</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0004732</pubid><pubid idtype="pmcid">2648985</pubid><pubid idtype="pmpid" link="fulltext">19266097</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Identification of tumor-associated cassette exons in human cancer through EST-based computational prediction and experimental validation</p></title><aug><au><snm>Valletti</snm><fnm>A</fnm></au><au><snm>Anselmo</snm><fnm>A</fnm></au><au><snm>Mangiulli</snm><fnm>M</fnm></au><au><snm>Boria</snm><fnm>I</fnm></au><au><snm>Mignone</snm><fnm>F</fnm></au><au><snm>Merla</snm><fnm>G</fnm></au><au><snm>D'Angelo</snm><fnm>V</fnm></au><au><snm>Tullo</snm><fnm>A</fnm></au><au><snm>Sbisa</snm><fnm>E</fnm></au><au><snm>D'Erchia</snm><fnm>AM</fnm></au><etal/></aug><source>Mol Cancer</source><pubdate>2010</pubdate><volume>9</volume><fpage>230</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1476-4598-9-230</pubid><pubid idtype="pmcid">2941758</pubid><pubid idtype="pmpid" link="fulltext">20813049</pubid></pubidlist></xrefbib></bibl><bibl 
id="B32"><title><p>Simpson's paradox visualized: the example of the rosiglitazone meta-analysis</p></title><aug><au><snm>Rucker</snm><fnm>G</fnm></au><au><snm>Schumacher</snm><fnm>M</fnm></au></aug><source>BMC Med Res Methodol</source><pubdate>2008</pubdate><volume>8</volume><fpage>34</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2288-8-34</pubid><pubid idtype="pmcid">2438436</pubid><pubid idtype="pmpid" link="fulltext">18513392</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>Statistical aspects of the analysis of data from retrospective studies of disease</p></title><aug><au><snm>Mantel</snm><fnm>N</fnm></au><au><snm>Haenszel</snm><fnm>W</fnm></au></aug><source>Journal of the National Cancer Institute</source><pubdate>1959</pubdate><volume>22</volume><issue>4</issue><fpage>719</fpage><lpage>748</lpage><xrefbib><pubid idtype="pmpid">13655060</pubid></xrefbib></bibl><bibl id="B34"><title><p>dbEST--database for "expressed sequence tags"</p></title><aug><au><snm>Boguski</snm><fnm>MS</fnm></au><au><snm>Lowe</snm><fnm>TM</fnm></au><au><snm>Tolstoshev</snm><fnm>CM</fnm></au></aug><source>Nature genetics</source><pubdate>1993</pubdate><volume>4</volume><issue>4</issue><fpage>332</fpage><lpage>333</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng0893-332</pubid><pubid idtype="pmpid" link="fulltext">8401577</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>Shotgun sequencing of the human transcriptome with ORF expressed sequence tags</p></title><aug><au><snm>Dias Neto</snm><fnm>E</fnm></au><au><snm>Correa</snm><fnm>RG</fnm></au><au><snm>Verjovski-Almeida</snm><fnm>S</fnm></au><au><snm>Briones</snm><fnm>MR</fnm></au><au><snm>Nagai</snm><fnm>MA</fnm></au><au><snm>da Silva</snm><fnm>W</fnm><suf>Jr</suf></au><au><snm>Zago</snm><fnm>MA</fnm></au><au><snm>Bordin</snm><fnm>S</fnm></au><au><snm>Costa</snm><fnm>FF</fnm></au><au><snm>Goldman</snm><fnm>GH</fnm></au><etal/></aug><source>Proceedings of the National Academy of Sciences of the 
United States of America</source><pubdate>2000</pubdate><volume>97</volume><issue>7</issue><fpage>3491</fpage><lpage>3496</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.97.7.3491</pubid><pubid idtype="pmcid">16267</pubid><pubid idtype="pmpid" link="fulltext">10737800</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>STRING 8--a global view on proteins and their functional interactions in 630 organisms</p></title><aug><au><snm>Jensen</snm><fnm>LJ</fnm></au><au><snm>Kuhn</snm><fnm>M</fnm></au><au><snm>Stark</snm><fnm>M</fnm></au><au><snm>Chaffron</snm><fnm>S</fnm></au><au><snm>Creevey</snm><fnm>C</fnm></au><au><snm>Muller</snm><fnm>J</fnm></au><au><snm>Doerks</snm><fnm>T</fnm></au><au><snm>Julien</snm><fnm>P</fnm></au><au><snm>Roth</snm><fnm>A</fnm></au><au><snm>Simonovic</snm><fnm>M</fnm></au><etal/></aug><source>Nucleic acids research</source><pubdate>2009</pubdate><issue>37 Database</issue><fpage>D412</fpage><lpage>416</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2686466</pubid><pubid idtype="pmpid" link="fulltext">18940858</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>NCBI Reference Sequences: current status, policy and new initiatives</p></title><aug><au><snm>Pruitt</snm><fnm>KD</fnm></au><au><snm>Tatusova</snm><fnm>T</fnm></au><au><snm>Klimke</snm><fnm>W</fnm></au><au><snm>Maglott</snm><fnm>DR</fnm></au></aug><source>Nucleic acids research</source><pubdate>2009</pubdate><issue>37 Database</issue><fpage>D32</fpage><lpage>36</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2686572</pubid><pubid idtype="pmpid" link="fulltext">18927115</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>NCBI RefSeq FTP</p></title><url>ftp://ftp.ncbi.nih.gov/refseq/release/vertebrate_mammalian</url></bibl><bibl id="B39"><title><p>RepeatMasker Open-3.0.1996-2010</p></title><url>http://www.repeatmasker.org</url></bibl><bibl id="B40"><title><p>CGAP download 
site</p></title><url>http://cgap.nci.nih.gov/Info/CGAPDownload</url></bibl><bibl id="B41"><title><p>BLAT--the BLAST-like alignment tool</p></title><aug><au><snm>Kent</snm><fnm>WJ</fnm></au></aug><source>Genome research</source><pubdate>2002</pubdate><volume>12</volume><issue>4</issue><fpage>656</fpage><lpage>664</lpage><xrefbib><pubidlist><pubid idtype="pmcid">187518</pubid><pubid idtype="pmpid" link="fulltext">11932250</pubid></pubidlist></xrefbib></bibl><bibl id="B42"><title><p>Gene Expression Omnibus: NCBI gene expression and hybridization array data repository</p></title><aug><au><snm>Edgar</snm><fnm>R</fnm></au><au><snm>Domrachev</snm><fnm>M</fnm></au><au><snm>Lash</snm><fnm>AE</fnm></au></aug><source>Nucleic acids research</source><pubdate>2002</pubdate><volume>30</volume><issue>1</issue><fpage>207</fpage><lpage>210</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/30.1.207</pubid><pubid idtype="pmcid">99122</pubid><pubid idtype="pmpid" link="fulltext">11752295</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>Low Level Analysis of High-density Oligonucleotide Array Data: Background, Normalization and Summarization</p></title><aug><au><snm>Bolstad</snm><fnm>B</fnm></au></aug><publisher>University of California, Berkeley</publisher><pubdate>2004</pubdate></bibl><bibl id="B44"><title><p>affy--analysis of Affymetrix GeneChip data at the probe level</p></title><aug><au><snm>Gautier</snm><fnm>L</fnm></au><au><snm>Cope</snm><fnm>L</fnm></au><au><snm>Bolstad</snm><fnm>BM</fnm></au><au><snm>Irizarry</snm><fnm>RA</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><issue>3</issue><fpage>307</fpage><lpage>315</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btg405</pubid><pubid idtype="pmpid" link="fulltext">14960456</pubid></pubidlist></xrefbib></bibl><bibl id="B45"><title><p>GEOquery: a bridge between the Gene Expression Omnibus (GEO) and 
BioConductor</p></title><aug><au><snm>Davis</snm><fnm>S</fnm></au><au><snm>Meltzer</snm><fnm>PS</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><issue>14</issue><fpage>1846</fpage><lpage>1847</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btm254</pubid><pubid idtype="pmpid" link="fulltext">17496320</pubid></pubidlist></xrefbib></bibl><bibl id="B46"><title><p>Linear models and empirical bayes methods for assessing differential expression in microarray experiments</p></title><aug><au><snm>Smyth</snm><fnm>GK</fnm></au></aug><source>Stat Appl Genet Mol Biol</source><pubdate>2004</pubdate><volume>3</volume><fpage>Article3</fpage><xrefbib><pubid idtype="pmpid" link="fulltext">16646809</pubid></xrefbib></bibl><bibl id="B47"><title><p>UniProt Knowledgebase: a hub of integrated protein data</p></title><aug><au><snm>Magrane</snm><fnm>M</fnm></au><au><snm>Consortium</snm><fnm>U</fnm></au></aug><source>Database (Oxford)</source><pubdate>2011</pubdate><note>bar009</note><xrefbib><pubidlist><pubid idtype="pmcid">3070428</pubid><pubid idtype="pmpid" link="fulltext">21447597</pubid></pubidlist></xrefbib></bibl><bibl id="B48"><title><p>SPD--a web-based secreted protein database</p></title><aug><au><snm>Chen</snm><fnm>Y</fnm></au><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Yin</snm><fnm>Y</fnm></au><au><snm>Gao</snm><fnm>G</fnm></au><au><snm>Li</snm><fnm>S</fnm></au><au><snm>Jiang</snm><fnm>Y</fnm></au><au><snm>Gu</snm><fnm>X</fnm></au><au><snm>Luo</snm><fnm>J</fnm></au></aug><source>Nucleic acids research</source><pubdate>2005</pubdate><issue>33 Database</issue><fpage>D169</fpage><lpage>173</lpage><xrefbib><pubidlist><pubid idtype="pmcid">540047</pubid><pubid idtype="pmpid" link="fulltext">15608170</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>TOPDB: topology data bank of transmembrane 
proteins</p></title><aug><au><snm>Tusnady</snm><fnm>GE</fnm></au><au><snm>Kalmar</snm><fnm>L</fnm></au><au><snm>Simon</snm><fnm>I</fnm></au></aug><source>Nucleic acids research</source><pubdate>2008</pubdate><issue>36 Database</issue><fpage>D234</fpage><lpage>239</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2238857</pubid><pubid idtype="pmpid" link="fulltext">17921502</pubid></pubidlist></xrefbib></bibl><bibl id="B50"><title><p>LOCATE: a mammalian protein subcellular localization database</p></title><aug><au><snm>Sprenger</snm><fnm>J</fnm></au><au><snm>Lynn Fink</snm><fnm>J</fnm></au><au><snm>Karunaratne</snm><fnm>S</fnm></au><au><snm>Hanson</snm><fnm>K</fnm></au><au><snm>Hamilton</snm><fnm>NA</fnm></au><au><snm>Teasdale</snm><fnm>RD</fnm></au></aug><source>Nucleic acids research</source><pubdate>2008</pubdate><issue>36 Database</issue><fpage>D230</fpage><lpage>233</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2238969</pubid><pubid idtype="pmpid" link="fulltext">17986452</pubid></pubidlist></xrefbib></bibl><bibl id="B51"><title><p>Transmembrane proteins in the Protein Data Bank: identification and classification</p></title><aug><au><snm>Tusnady</snm><fnm>GE</fnm></au><au><snm>Dosztanyi</snm><fnm>Z</fnm></au><au><snm>Simon</snm><fnm>I</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><issue>17</issue><fpage>2964</fpage><lpage>2972</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth340</pubid><pubid idtype="pmpid" link="fulltext">15180935</pubid></pubidlist></xrefbib></bibl><bibl id="B52"><title><p>PDB_TM: selection and membrane localization of transmembrane proteins in the protein data bank</p></title><aug><au><snm>Tusnady</snm><fnm>GE</fnm></au><au><snm>Dosztanyi</snm><fnm>Z</fnm></au><au><snm>Simon</snm><fnm>I</fnm></au></aug><source>Nucleic acids research</source><pubdate>2005</pubdate><issue>33 Database</issue><fpage>D275</fpage><lpage>278</lpage><xrefbib><pubidlist><pubid 
idtype="pmcid">539956</pubid><pubid idtype="pmpid" link="fulltext">15608195</pubid></pubidlist></xrefbib></bibl><bibl id="B53"><title><p>OPM: orientations of proteins in membranes database</p></title><aug><au><snm>Lomize</snm><fnm>MA</fnm></au><au><snm>Lomize</snm><fnm>AL</fnm></au><au><snm>Pogozheva</snm><fnm>ID</fnm></au><au><snm>Mosberg</snm><fnm>HI</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><issue>5</issue><fpage>623</fpage><lpage>625</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btk023</pubid><pubid idtype="pmpid" link="fulltext">16397007</pubid></pubidlist></xrefbib></bibl><bibl id="B54"><title><p>The Membrane Protein Data Bank</p></title><aug><au><snm>Raman</snm><fnm>P</fnm></au><au><snm>Cherezov</snm><fnm>V</fnm></au><au><snm>Caffrey</snm><fnm>M</fnm></au></aug><source>Cell Mol Life Sci</source><pubdate>2006</pubdate><volume>63</volume><issue>1</issue><fpage>36</fpage><lpage>51</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s00018-005-5350-6</pubid><pubid idtype="pmcid">2792347</pubid><pubid idtype="pmpid" link="fulltext">16314922</pubid></pubidlist></xrefbib></bibl><bibl id="B55"><title><p>GeneGo</p></title><url>http://www.genego.com/</url></bibl><bibl id="B56"><title><p>Cytoscape 2.8: new features for data integration and network visualization</p></title><aug><au><snm>Smoot</snm><fnm>ME</fnm></au><au><snm>Ono</snm><fnm>K</fnm></au><au><snm>Ruscheinski</snm><fnm>J</fnm></au><au><snm>Wang</snm><fnm>PL</fnm></au><au><snm>Ideker</snm><fnm>T</fnm></au></aug><source>Bioinformatics</source><pubdate>2011</pubdate><volume>27</volume><issue>3</issue><fpage>431</fpage><lpage>432</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btq675</pubid><pubid idtype="pmcid">3031041</pubid><pubid idtype="pmpid" link="fulltext">21149340</pubid></pubidlist></xrefbib></bibl><bibl id="B57"><title><p>On the criterion that a given system of deviations from the probable in the case of a 
correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling</p></title><aug><au><snm>Pearson</snm><fnm>K</fnm></au></aug><source>Philosophical Magazine, Series 5</source><pubdate>1900</pubdate><volume>50</volume><issue>302</issue><fpage>157</fpage><lpage>175</lpage><xrefbib><pubid idtype="doi">10.1080/14786440009463897</pubid></xrefbib></bibl><bibl id="B58"><title><p>Interpretation and estimation of summary ratios under heterogeneity</p></title><aug><au><snm>Greenland</snm><fnm>S</fnm></au></aug><source>Statistics in medicine</source><pubdate>1982</pubdate><volume>1</volume><issue>3</issue><fpage>217</fpage><lpage>227</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/sim.4780010304</pubid><pubid idtype="pmpid">7187095</pubid></pubidlist></xrefbib></bibl><bibl id="B59"><title><p>Handbook of Biological Statistics</p></title><aug><au><snm>McDonald</snm><fnm>JH</fnm></au></aug><publisher>Sparky House Publishing, Baltimore, Maryland</publisher><edition>2nd</edition><pubdate>2009</pubdate></bibl><bibl id="B60"><title><p>A general overview of Mantel-Haenszel methods: applications and recent developments</p></title><aug><au><snm>Kuritz</snm><fnm>SJ</fnm></au><au><snm>Landis</snm><fnm>JR</fnm></au><au><snm>Koch</snm><fnm>GG</fnm></au></aug><source>Annual review of public health</source><pubdate>1988</pubdate><volume>9</volume><fpage>123</fpage><lpage>160</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1146/annurev.pu.09.050188.001011</pubid><pubid idtype="pmpid" link="fulltext">3288229</pubid></pubidlist></xrefbib></bibl><bibl id="B61"><title><p>Candidate serological biomarkers for cancer identified from the secretomes of 23 cancer cell lines and the human protein 
atlas</p></title><aug><au><snm>Wu</snm><fnm>CC</fnm></au><au><snm>Hsu</snm><fnm>CW</fnm></au><au><snm>Chen</snm><fnm>CD</fnm></au><au><snm>Yu</snm><fnm>CJ</fnm></au><au><snm>Chang</snm><fnm>KP</fnm></au><au><snm>Tai</snm><fnm>DI</fnm></au><au><snm>Liu</snm><fnm>HP</fnm></au><au><snm>Su</snm><fnm>WH</fnm></au><au><snm>Chang</snm><fnm>YS</fnm></au><au><snm>Yu</snm><fnm>JS</fnm></au></aug><source>Molecular &amp; cellular proteomics : MCP</source><pubdate>2010</pubdate><volume>9</volume><issue>6</issue><fpage>1100</fpage><lpage>1117</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1074/mcp.M900398-MCP200</pubid><pubid idtype="pmcid">2877973</pubid><pubid idtype="pmpid" link="fulltext">20124221</pubid></pubidlist></xrefbib></bibl><bibl id="B62"><title><p>Claudin-18 splice variant 2 is a pan-cancer target suitable for therapeutic antibody development</p></title><aug><au><snm>Sahin</snm><fnm>U</fnm></au><au><snm>Koslowski</snm><fnm>M</fnm></au><au><snm>Dhaene</snm><fnm>K</fnm></au><au><snm>Usener</snm><fnm>D</fnm></au><au><snm>Brandenburg</snm><fnm>G</fnm></au><au><snm>Seitz</snm><fnm>G</fnm></au><au><snm>Huber</snm><fnm>C</fnm></au><au><snm>Tureci</snm><fnm>O</fnm></au></aug><source>Clinical cancer research : an official journal of the American Association for Cancer Research</source><pubdate>2008</pubdate><volume>14</volume><issue>23</issue><fpage>7624</fpage><lpage>7634</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1158/1078-0432.CCR-08-1547</pubid><pubid idtype="pmpid" link="fulltext">19047087</pubid></pubidlist></xrefbib></bibl><bibl id="B63"><title><p>Vascular gene expression patterns are conserved in primary and metastatic brain tumors</p></title><aug><au><snm>Liu</snm><fnm>Y</fnm></au><au><snm>Carson-Walter</snm><fnm>EB</fnm></au><au><snm>Cooper</snm><fnm>A</fnm></au><au><snm>Winans</snm><fnm>BN</fnm></au><au><snm>Johnson</snm><fnm>MD</fnm></au><au><snm>Walter</snm><fnm>KA</fnm></au></aug><source>Journal of 
neuro-oncology</source><pubdate>2010</pubdate><volume>99</volume><issue>1</issue><fpage>13</fpage><lpage>24</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s11060-009-0105-0</pubid><pubid idtype="pmcid">2904485</pubid><pubid idtype="pmpid" link="fulltext">20063114</pubid></pubidlist></xrefbib></bibl><bibl id="B64"><title><p>Expression of collagen types I, II and III in juvenile angiofibromas</p></title><aug><au><snm>Gramann</snm><fnm>M</fnm></au><au><snm>Wendler</snm><fnm>O</fnm></au><au><snm>Haeberle</snm><fnm>L</fnm></au><au><snm>Schick</snm><fnm>B</fnm></au></aug><source>Cells, tissues, organs</source><pubdate>2009</pubdate><volume>189</volume><issue>6</issue><fpage>403</fpage><lpage>409</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1159/000158754</pubid><pubid idtype="pmpid" link="fulltext">18815441</pubid></pubidlist></xrefbib></bibl><bibl id="B65"><title><p>A novel oncoprotein RNF43 functions in an autocrine manner in colorectal cancer</p></title><aug><au><snm>Yagyu</snm><fnm>R</fnm></au><au><snm>Furukawa</snm><fnm>Y</fnm></au><au><snm>Lin</snm><fnm>YM</fnm></au><au><snm>Shimokawa</snm><fnm>T</fnm></au><au><snm>Yamamura</snm><fnm>T</fnm></au><au><snm>Nakamura</snm><fnm>Y</fnm></au></aug><source>International journal of oncology</source><pubdate>2004</pubdate><volume>25</volume><issue>5</issue><fpage>1343</fpage><lpage>1348</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">15492824</pubid></xrefbib></bibl><bibl id="B66"><title><p>A gene signature predictive for outcome in advanced ovarian cancer identifies a survival factor: microfibril-associated glycoprotein 
2</p></title><aug><au><snm>Mok</snm><fnm>SC</fnm></au><au><snm>Bonome</snm><fnm>T</fnm></au><au><snm>Vathipadiekal</snm><fnm>V</fnm></au><au><snm>Bell</snm><fnm>A</fnm></au><au><snm>Johnson</snm><fnm>ME</fnm></au><au><snm>Wong</snm><fnm>KK</fnm></au><au><snm>Park</snm><fnm>DC</fnm></au><au><snm>Hao</snm><fnm>K</fnm></au><au><snm>Yip</snm><fnm>DK</fnm></au><au><snm>Donninger</snm><fnm>H</fnm></au><etal/></aug><source>Cancer cell</source><pubdate>2009</pubdate><volume>16</volume><issue>6</issue><fpage>521</fpage><lpage>532</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ccr.2009.10.018</pubid><pubid idtype="pmcid">3008560</pubid><pubid idtype="pmpid" link="fulltext">19962670</pubid></pubidlist></xrefbib></bibl><bibl id="B67"><title><p>Gene expression profiling supports the hypothesis that human ovarian surface epithelia are multipotent and capable of serving as ovarian cancer initiating cells</p></title><aug><au><snm>Bowen</snm><fnm>NJ</fnm></au><au><snm>Walker</snm><fnm>LD</fnm></au><au><snm>Matyunina</snm><fnm>LV</fnm></au><au><snm>Logani</snm><fnm>S</fnm></au><au><snm>Totten</snm><fnm>KA</fnm></au><au><snm>Benigno</snm><fnm>BB</fnm></au><au><snm>McDonald</snm><fnm>JF</fnm></au></aug><source>BMC medical genomics</source><pubdate>2009</pubdate><volume>2</volume><fpage>71</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1755-8794-2-71</pubid><pubid idtype="pmcid">2806370</pubid><pubid idtype="pmpid" link="fulltext">20040092</pubid></pubidlist></xrefbib></bibl><bibl id="B68"><title><p>Combined gene expression analysis of whole-tissue and microdissected pancreatic ductal adenocarcinoma identifies genes specifically overexpressed in tumor 
epithelia</p></title><aug><au><snm>Badea</snm><fnm>L</fnm></au><au><snm>Herlea</snm><fnm>V</fnm></au><au><snm>Dima</snm><fnm>SO</fnm></au><au><snm>Dumitrascu</snm><fnm>T</fnm></au><au><snm>Popescu</snm><fnm>I</fnm></au></aug><source>Hepato-gastroenterology</source><pubdate>2008</pubdate><volume>55</volume><issue>88</issue><fpage>2016</fpage><lpage>2027</lpage><xrefbib><pubid idtype="pmpid">19260470</pubid></xrefbib></bibl></refgrp>
</bm></art>