<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-7-84</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>The <it>PowerAtlas</it>: a power and sample size atlas for microarray experimental design and research</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Page</snm>
               <mi>P</mi>
               <fnm>Grier</fnm>
               <insr iid="I1"/>
               <email>gpage@uab.edu</email>
            </au>
            <au id="A2">
               <snm>Edwards</snm>
               <mi>W</mi>
               <fnm>Jode</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>jode@iatsate.edu</email>
            </au>
            <au id="A3">
               <snm>Gadbury</snm>
               <mi>L</mi>
               <fnm>Gary</fnm>
               <insr iid="I3"/>
               <email>gadburyg@umr.edu</email>
            </au>
            <au id="A4">
               <snm>Yelisetti</snm>
               <fnm>Prashanth</fnm>
               <insr iid="I1"/>
               <email>shanth1@uab.edu</email>
            </au>
            <au id="A5">
               <snm>Wang</snm>
               <fnm>Jelai</fnm>
               <insr iid="I1"/>
               <email>jwang@ms.soph.uab.edu</email>
            </au>
            <au id="A6">
               <snm>Trivedi</snm>
               <fnm>Prinal</fnm>
               <insr iid="I1"/>
               <email>ptrivedi@uab.edu</email>
            </au>
            <au id="A7">
               <snm>Allison</snm>
               <mi>B</mi>
               <fnm>David</fnm>
               <insr iid="I1"/>
               <email>dallison@uab.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, AL, USA</p>
            </ins>
            <ins id="I2">
               <p>USDA ARS, Department of Agronomy, Iowa State University, Ames, IA, USA</p>
            </ins>
            <ins id="I3">
               <p>Department of Mathematics and Statistics, University of Missouri-Rolla, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>84</fpage>
         <url>http://www.biomedcentral.com/1471-2105/7/84</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16504070</pubid>
               <pubid idtype="doi">10.1186/1471-2105-7-84</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>26</day>
               <month>7</month>
               <year>2005</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>22</day>
               <month>2</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>22</day>
               <month>2</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Page et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experiments is how to estimate the sample size required for good statistical power. What is the projected sample size or number of replicate chips needed to address the multiple hypotheses with acceptable accuracy? Statistical methods exist for calculating power based upon a single hypothesis, using estimates of the variability in data from pilot studies. There is, however, a need for methods to estimate power and/or required sample sizes in situations where multiple hypotheses are being tested, such as in microarray experiments. In addition, investigators frequently do not have pilot data to estimate the sample sizes required for microarray studies.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>To address this challenge, we have developed a Microarray <it>PowerAtlas </it><abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The atlas enables estimation of statistical power by allowing investigators to appropriately plan studies by building upon previous studies that have similar experimental characteristics. Currently, there are sample sizes and power estimates based on 632 experiments from Gene Expression Omnibus (GEO). The <it>PowerAtlas </it>also permits investigators to upload their own pilot data and derive power and sample size estimates from these data. This resource will be updated regularly with new datasets from GEO and other databases such as The Nottingham Arabidopsis Stock Center (NASC).</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>This resource provides a valuable tool for investigators who are planning efficient microarray studies and estimating required sample sizes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="refman"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Planning microarray studies presents unique challenges to investigators with respect to estimating power and the sample size required for a study. The questions posed may be quite general and exploratory, such as "which genes are differentially expressed in response to a given treatment?" A microarray study should have a high probability of answering, at least in part, the questions and hypotheses being posed [loosely speaking, power or, in our case, the Expected Discovery Rate (EDR)]. It should also have a high probability that those genes declared significant are truly differentially expressed (i.e. the 'True Positive' probability should be high). Sample size is a critical determinant of statistical power and expected error rates.</p>
         <p>In traditional biomedical studies, investigators test one or at most a few hypotheses. This is not the case in microarray studies. Each treatment or group comparison involves testing every gene on the chip, which may number in the tens of thousands. Some microarray experiments involve multiple groups; thus the total number of hypotheses tested in a microarray experiment can run to the hundreds of thousands or more. In addition, the effect size and variance for each hypothesis may be different, resulting in a different power estimate for each gene-by-treatment comparison.</p>
         <p>Some investigators have proposed approaches to estimating required sample size for microarray research <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>, but most of these methods calculate power based upon an arbitrary level of change being biologically relevant and constant across all genes. These methods neither take into account the amount of variability in each gene nor specify a hypothesized distribution of effect sizes, and they do not incorporate some of the recently developed approaches to account for multiple testing in high-dimensional biology (HDB) <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
         <p>Calculating required sample sizes for a study requires an estimate of the variability in the dependent variable, which in microarray studies is the genes' expression levels. This information is frequently derived from previous pilot studies performed by the research team or from similar data in the literature. Integrating information from pilot studies that illustrate the variability of all the genes in a study provides empirically driven and theoretically defensible sample size estimates. We have formalized such an approach <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Rather than using traditional power (1-&#946;), we introduced the concept of <it>Expected Discovery Rate </it>(EDR). EDR is the average power across all genes for which the null hypothesis is false in an experiment (see table <tblr tid="T1">1</tblr> for definitions of A, B, C, and D). Formally, EDR is E[Q], where Q = D/(D+B) if D+B>0 and Q = 0 otherwise; it can be interpreted as the expected proportion of genes that are truly differentially expressed that will be declared to be differentially expressed. Microarray studies are affected by multiple testing issues; thus, when considering power, one must consider not only the alpha (significance) level cut-off used, but also the expected proportion of genes declared significant that are True Positives (PTP), which is similar in concept to the False Discovery Rate. The PTP may be defined as E[R], where R = D/(C+D) if C+D>0 and R = 0 otherwise (again, A, B, C, and D are defined in table <tblr tid="T1">1</tblr>). The PTP is the expected proportion of genes that are declared significantly differentially expressed between the two samples that are actually differentially expressed between the two populations. A higher value for PTP is more desirable.
The software also provides the <it>Probability of a True Negative </it>(PTN), which is the expected proportion of genes that are <it>not </it>declared significantly differentially expressed between the two samples that are actually not differentially expressed between the two populations. The use of EDR, PTP, and &#945; provides a coherent way to estimate the sample size required for microarray research that is consistent with current approaches to analyzing microarray data and conceptualizing the process of massive multiple hypothesis testing in HDB research <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. The <it>PowerAtlas </it>implements the methods developed by Gadbury et al <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and adds further functionality.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Quantities of interest in microarray experiments</p>
            </caption>
            <tblbdy cols="3">
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Genes for which there is not a real effect</p>
                  </c>
                  <c ca="left">
                     <p>Genes for which there is a real effect</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Genes not declared significant at designated threshold</p>
                  </c>
                  <c ca="left">
                     <p>A</p>
                  </c>
                  <c ca="left">
                     <p>B</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Genes declared significant at designated threshold</p>
                  </c>
                  <c ca="left">
                     <p>C</p>
                  </c>
                  <c ca="left">
                     <p>D</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
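         <p>As a hedged illustration of these definitions, the following Python sketch computes the realized proportions Q and R from the four counts of table <tblr tid="T1">1</tblr> for a single hypothetical experiment. EDR and PTP are the expectations of Q and R, so the values below are one-experiment analogues and not the estimates produced by the PowerAtlas. The function name, the example counts, the formula A/(A+B) for the PTN analogue (read off its verbal definition), and the zero-denominator convention for that third ratio are assumptions made for illustration only.</p>
         <p><![CDATA[
```python
def realized_proportions(a, b, c, d):
    """Return (Q, R, S) for one experiment from the Table 1 counts.

    Q = D/(D+B): share of truly affected genes that were declared significant.
    R = D/(C+D): share of genes declared significant that are truly affected.
    S = A/(A+B): share of genes not declared significant that are truly null
        (assumed formula; the text defines PTN only in words).
    Each ratio is set to 0 when its denominator is 0. The text states this
    convention for Q and R; it is an assumption here for S.
    """
    q = d / (d + b) if (d + b) > 0 else 0.0
    r = d / (c + d) if (c + d) > 0 else 0.0
    s = a / (a + b) if (a + b) > 0 else 0.0
    return q, r, s


# Hypothetical counts for a 10,000-gene chip (not taken from any dataset).
q, r, s = realized_proportions(a=9000, b=400, c=100, d=500)
print(f"Q = {q:.3f}, R = {r:.3f}, S = {s:.3f}")
```
]]></p>
         <p>In practice the true status of each gene is unknown, so A, B, C, and D cannot be counted directly; the sketch only makes the arithmetic behind the three quantities concrete.</p>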
         <p>The <it>PowerAtlas </it>works in two ways. Firstly, investigators may upload their own pilot data and extrapolate out the EDR, PTN, and PTP for a variety of sample sizes and &#945; (type 1 error rate) level combinations. Secondly, many investigators do not have the opportunity to conduct their own pilot microarray study, but this need not stop an investigator. Given that many journals now require authors to place microarray data in public databases <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> before publication, investigators may draw upon these public data as pilot information. We have developed the <it>PowerAtlas </it>to assist investigators in the use of these public data to estimate the sample sizes required for well-powered studies. We have downloaded all data from Gene Expression Omnibus <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, reanalyzed it, and processed it with the methods developed in Gadbury et al <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Thereafter, we have put the power and sample size calculations for many of the datasets into a readily accessible and searchable database <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. It should be stressed that no one study is a perfect replicate of the study an investigator wishes to conduct, but similar studies can give a sense of the plausible ranges of sample sizes. We recommend that investigators examine several related experiments to get a sense of the sample size required for robust EDR and high PTP.</p>
         <p>Designing a microarray study with the appropriate number of replicates is cost efficient. Use of the <it>PowerAtlas </it>will not only keep investigators from using more samples per group than necessary, which wastes money, but will also limit money wasted on experiments that have too few replicates to have sufficient power to yield good results.</p>
         <sec>
            <st>
               <p>Usage of the PowerAtlas</p>
            </st>
            <p>No registration is required to use the <it>PowerAtlas</it>, nor are any programs or applets pushed to an investigator's computer. An investigator simply accesses the <it>PowerAtlas </it><abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and selects the appropriate link to use public data or the investigator's study-specific data.</p>
            <p>The data in the <it>PowerAtlas </it>are taken directly from GEO. A dataset is included in the <it>PowerAtlas </it>as long as it meets the requirements outlined in 'Using existing public data'. However, the data in GEO can be quite variable for any number of reasons, including, but not limited to, the image processing algorithm, normalization, and inferential statistical procedure used in the analysis. Thus, when using public data as a basis for planning future studies, an investigator should consider the results from several datasets, consult the primary sources (GEO GDS files and journal publications of the data), and have a reasonable understanding of the idiosyncrasies and applicability of each dataset to the proposed experiment before using the data. In addition, since each lab processes and handles samples and runs microarrays slightly differently, estimates of power should, when possible, be based upon an investigator's own pilot data, which will predict the power of that investigator's future experiments more accurately than will extrapolations from other investigators' data.</p>
         </sec>
         <sec>
            <st>
               <p>Using the investigators' own data</p>
            </st>
            <p>To use the <it>PowerAtlas </it>with an investigator's own data, a list of p-values generated using a valid statistical method must be available. Currently the <it>PowerAtlas </it>generates sample sizes only for two-group comparisons, using p-values from any valid statistical test <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. The procedure is as follows:</p>
            <p>&#8226; The investigator must possess/generate a tab delimited file with one p-value per gene/feature for the main hypothesis of interest with each p-value located on its own line. There should be no identifiers for genes. All p-values from all genes on a chip/array should be included.</p>
            <p>&#8226; The file with p-values is uploaded to the web site.</p>
            <p>&#8226; The investigator then enters the sample sizes (N1 and N2) for each of the groups used to calculate the p-values.</p>
            <p>&#8226; The investigator then may either use default or custom settings for the sample sizes, significance (&#945;) thresholds, and number of iterations for the bootstrap to be used for estimating power.</p>
            <p>&#8226; The investigator selects submit. For a sense of runtime, from an initial set of 12,500 p-values with EDR, PTP, and PTN being calculated for 14 sample sizes and six thresholds, the analysis takes 3&#8211;10 minutes.</p>
            <p>&#8226; The investigator then will obtain a series of figures that illustrate the EDR, PTP, and PTN for a variety of sample sizes and significance (&#945;) thresholds (examples are shown in figures <figr fid="F1">1</figr>, <figr fid="F2">2</figr>, <figr fid="F3">3</figr> for an Affymetrix dataset <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> and figures <figr fid="F4">4</figr>, <figr fid="F5">5</figr>, <figr fid="F6">6</figr> for a cDNA experiment <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>). The investigator may then choose the sample size and &#945; level combination that achieves the desired levels for EDR, PTP, and PTN.</p>
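            <p>The file format described in the first step above can be sketched in Python as follows. This is a minimal illustration, not code from the PowerAtlas: the function names and the example values are hypothetical, and the choice to reject values outside the interval from 0 to 1 is an assumption about what a sensible pre-upload check would do. Very small p-values are written in scientific notation by this sketch; whether the web site accepts that notation is not stated in the text.</p>
            <p><![CDATA[
```python
import math


def pvalues_to_upload_text(pvalues):
    """Format p-values as text with one value per line and no gene identifiers.

    Raises ValueError if the list is empty or if any entry is non-numeric,
    NaN, or outside the closed interval [0, 1].
    """
    lines = []
    for index, p in enumerate(pvalues):
        try:
            value = float(p)
        except (TypeError, ValueError):
            raise ValueError(f"entry {index} is not a number: {p!r}")
        if math.isnan(value) or not (0.0 <= value <= 1.0):
            raise ValueError(f"entry {index} is not a valid p-value: {p!r}")
        lines.append(repr(value))
    if not lines:
        raise ValueError("no p-values supplied")
    return "\n".join(lines) + "\n"


def write_pvalue_file(path, pvalues):
    """Write the formatted p-values to `path` (hypothetical helper)."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        handle.write(pvalues_to_upload_text(pvalues))


# Example with made-up p-values; a real file would hold every gene on the array.
print(pvalues_to_upload_text([0.0004, 0.031, 0.27, 0.88]), end="")
```
]]></p>
            <p>Because each line holds a single p-value, the file contains only one column and no tab characters are actually needed. The group sizes N1 and N2 are entered separately on the web site and are not part of the file.</p>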
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Estimated PTP, PTN, and EDR for the GDS486 [17] dataset for a variety of sample sizes at an alpha level of 0.05</p>
               </caption>
               <text>
                  <p>Estimated PTP, PTN, and EDR for the GDS486 [17] dataset for a variety of sample sizes at an alpha level of 0.05.</p>
               </text>
               <graphic file="1471-2105-7-84-1"/>
            </fig>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>For the GDS486 dataset [18] the EDR is presented across a variety of sample sizes and alpha levels</p>
               </caption>
               <text>
                  <p>For the GDS486 dataset [18] the EDR is presented across a variety of sample sizes and alpha levels.</p>
               </text>
               <graphic file="1471-2105-7-84-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>For the GDS486 dataset [19] the PTP is presented across a variety of sample sizes and alpha levels</p>
               </caption>
               <text>
                  <p>For the GDS486 dataset [19] the PTP is presented across a variety of sample sizes and alpha levels.</p>
               </text>
               <graphic file="1471-2105-7-84-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Estimated PTP, PTN, and EDR for the GDS75 [20] dataset for a variety of sample sizes at an alpha level of 0.05</p>
               </caption>
               <text>
                  <p>Estimated PTP, PTN, and EDR for the GDS75 [20] dataset for a variety of sample sizes at an alpha level of 0.05.</p>
               </text>
               <graphic file="1471-2105-7-84-4"/>
            </fig>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>For the GDS75 dataset [21] the EDR is presented across a variety of sample sizes and alpha levels</p>
               </caption>
               <text>
                  <p>For the GDS75 dataset [21] the EDR is presented across a variety of sample sizes and alpha levels.</p>
               </text>
               <graphic file="1471-2105-7-84-5"/>
            </fig>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>For the GDS75 dataset [22] the PTP is presented across a variety of sample sizes and alpha levels</p>
               </caption>
               <text>
                  <p>For the GDS75 dataset [22] the PTP is presented across a variety of sample sizes and alpha levels.</p>
               </text>
               <graphic file="1471-2105-7-84-6"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Using existing public data</p>
            </st>
            <p>We have downloaded and processed into the <it>PowerAtlas </it>all data that were on GEO's ftp site as of Oct 1, 2004. One type of information GEO allows to be entered is the group to which a chip belongs. We conducted a pooled variance t-test on all possible two-group intra-GDS comparisons (GDS is the GEO definition of an experiment) from all datasets within GEO. During analysis some datasets were removed. Reasons for removal include: A) The data per chip were incomplete, for example, due to masking or selecting genes based on present or absent calls. B) The p-values did not follow one of the expected possible distributions (monotonically non-increasing from 0 to 1). Figures <figr fid="F7">7</figr> and <figr fid="F8">8</figr> illustrate a null data set (no more genes are significant than are expected at random). Figure <figr fid="F9">9</figr> illustrates a typical p-value distribution for a good dataset for which the null is false for some, but not all, genes. Figures <figr fid="F10">10</figr>, <figr fid="F11">11</figr>, <figr fid="F12">12</figr> represent distributions of p-values seen while processing GEO that do not fit the expected distributions and thus would be listed as NA. C) Only two-group fully randomized cDNA experiments can be analyzed, which prevents some cDNA experiments, including all loop designs and those that involve dye swaps unless in a balanced block design, from being analyzed <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. D) Datasets having fewer than two experimental groups or fewer than three chips per group are also excluded.</p>
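            <p>To make the comparison step concrete, the following Python sketch enumerates the two-group comparisons within one dataset and computes a pooled variance t statistic for a single gene. It is an illustration under stated assumptions and not the pipeline used to build the PowerAtlas: the function names, the group labels, and the expression values are hypothetical, and applying the three-chip minimum to each group individually is only one reading of exclusion rule D. Converting the statistic to a p-value requires the t distribution, which is outside the Python standard library; one option is <it>scipy.stats.ttest_ind</it> with <it>equal_var=True</it>.</p>
            <p><![CDATA[
```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Return (t, degrees of freedom) for a pooled variance two-sample t-test."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    if sp2 == 0:
        raise ValueError("both groups are constant; t is undefined")
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return t, df


def eligible_comparisons(groups, min_chips=3):
    """List all two-group comparisons among groups with enough chips.

    `groups` maps a group label to its list of chips. Filtering group by
    group is an assumption; the text states the rule at the dataset level.
    """
    keep = sorted(name for name, chips in groups.items() if len(chips) >= min_chips)
    return list(combinations(keep, 2))


# Hypothetical expression values for one gene in three groups of one dataset.
gene = {
    "control": [1.0, 2.0, 3.0],
    "treated": [2.0, 4.0, 6.0],
    "other": [5.0, 5.5],  # only two chips, so it is skipped
}
for first, second in eligible_comparisons(gene):
    t, df = pooled_t(gene[first], gene[second])
    print(first, "vs", second, "t =", round(t, 3), "df =", df)
```
]]></p>
            <p>Repeating the test for every gene on the chip yields the list of p-values whose distribution is then examined as described under reason B.</p>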
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Idealized representation of the distribution of p-values under the null hypothesis (no difference in gene expression between the two groups) for a valid test</p>
               </caption>
               <text>
                  <p>Idealized representation of the distribution of p-values under the null hypothesis (no difference in gene expression between the two groups) for a valid test. The dotted Blue line is the expected distribution of p-values if the treatment has no effect and the solid Green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.</p>
               </text>
               <graphic file="1471-2105-7-84-7"/>
            </fig>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>More realistic representation of the distribution of p-values under the null hypothesis (no difference in gene expression between the two groups) for a valid test</p>
               </caption>
               <text>
                  <p>More realistic representation of the distribution of p-values under the null hypothesis (no difference in gene expression between the two groups) for a valid test. The dotted Blue line is the expected distribution of p-values if the treatment has no effect and the solid Green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.</p>
               </text>
               <graphic file="1471-2105-7-84-8"/>
            </fig>
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>Distribution of p-values when there is a difference in the gene expression between the two groups for some of the genes, but not all of the genes</p>
               </caption>
               <text>
                  <p>Distribution of p-values when there is a difference in the gene expression between the two groups for some of the genes, but not all of the genes. This distribution is monotonically non-increasing from 0 to 1. The dotted Blue line is the expected distribution of p-values if the treatment has no effect and the solid Green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.</p>
               </text>
               <graphic file="1471-2105-7-84-9"/>
            </fig>
            <fig id="F10">
               <title>
                  <p>Figure 10</p>
               </title>
               <caption>
                  <p>Distribution of p-values for a dataset in GEO that does not follow one of the possible distributions for p-values</p>
               </caption>
               <text>
                  <p>Distribution of p-values for a dataset in GEO that does not follow one of the possible distributions for p-values. The dotted Blue line is the expected distribution of p-values if the treatment has no effect and the solid Green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.</p>
               </text>
               <graphic file="1471-2105-7-84-10"/>
            </fig>
            <fig id="F11">
               <title>
                  <p>Figure 11</p>
               </title>
               <caption>
                  <p>Distribution of p-values for a dataset in GEO that does not follow one of the possible distributions for p-values</p>
               </caption>
               <text>
                  <p>Distribution of p-values for a dataset in GEO that does not follow one of the possible distributions for p-values. The dotted Blue line is the expected distribution of p-values if the treatment has no effect and the solid Green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.</p>
               </text>
               <graphic file="1471-2105-7-84-11"/>
            </fig>
            <fig id="F12">
               <title>
                  <p>Figure 12</p>
               </title>
               <caption>
                  <p>Distribution of p-values for a dataset in GEO that does not follow one of the possible distributions for p-values</p>
               </caption>
               <text>
                  <p>Distribution of p-values for a dataset in GEO that does not follow one of the possible distributions for p-values. The dotted Blue line is the expected distribution of p-values if the treatment has no effect and the solid Green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.</p>
               </text>
               <graphic file="1471-2105-7-84-12"/>
            </fig>
            <p>To use the public data for estimating sample size:</p>
            <p>&#8226; From the <it>PowerAtlas </it>web page an investigator selects the option of using existing public data.</p>
            <p>&#8226; The investigator makes a selection of desired chip type (one or two channel) and the species of interest. At most, one of each chip type or species of interest may be selected. Alternatively, the investigator may also select only a chip type or a species.</p>
            <p>&#8226; A list of all experiments that meet the selection criteria will appear. The number of datasets can range from 0 (most bacteria on single channel chips) to more than 200 for Human and Mouse on single channel chips (see table <tblr tid="T2">2</tblr>).</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Itemization of the number, type, and species of chips available. NA means power estimation is not available. See section "Using existing public data" for possible explanations of why datasets may have been listed as NA.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Dual Channel</p>
                     </c>
                     <c ca="left">
                        <p>Dual Channel</p>
                     </c>
                     <c ca="left">
                        <p>Single Channel</p>
                     </c>
                     <c ca="left">
                        <p>Single Channel</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Available</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c ca="left">
                        <p>Available</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Arabidopsis thaliana</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>22</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Aspergillus parasiticus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Bacillus anthracis</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Bos taurus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Caenorhabditis elegans</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Campylobacter jejuni</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Canis familiaris</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Capra hircus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Chlamydomonas reinhardtii</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Cricetulus griseus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila melanogaster</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>15</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila simulans</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila yakuba</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Escherichia coli</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Escherichia coli K12</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Fundulus heteroclitus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Homo sapiens</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>44</p>
                     </c>
                     <c ca="left">
                        <p>34</p>
                     </c>
                     <c ca="left">
                        <p>178</p>
                     </c>
                     <c ca="left">
                        <p>35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Marmota monax</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Mastomys natalensis</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Mus musculus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>63</p>
                     </c>
                     <c ca="left">
                        <p>14</p>
                     </c>
                     <c ca="left">
                        <p>175</p>
                     </c>
                     <c ca="left">
                        <p>58</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Mycobacterium tuberculosis</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Oncorhynchus mykiss</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Oryza sativa</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Pinus contorta</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Rattus norvegicus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>8</p>
                     </c>
                     <c ca="left">
                        <p>68</p>
                     </c>
                     <c ca="left">
                        <p>13</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Rhodobacter sphaeroides</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Saccharomyces cerevisiae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                     <c ca="left">
                        <p>41</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Saccharomyces pastorianus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Saccharum sp</it>.</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Salmo salar</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Salmonella enterica</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Sus scrofa</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Viruses</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Zea mays</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TOTAL</p>
                     </c>
                     <c ca="left">
                        <p>142</p>
                     </c>
                     <c ca="left">
                        <p>138</p>
                     </c>
                     <c ca="left">
                        <p>490</p>
                     </c>
                     <c ca="left">
                        <p>115</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>&#8226; The investigator can read a brief description, taken directly from GEO, of all the experiments and find those that are most similar to their proposed experiment(s).</p>
            <p>&#8226; The investigator then selects the checkbox to the left of each desired dataset and presses Submit to get additional information.</p>
            <p>&#8226; The investigator receives a report with a link to the GEO description should additional information be needed.</p>
            <p>&#8226; There are also links to a printable HTML report that includes a description of the dataset, the EDR, PTP, and PTN (figures <figr fid="F1">1</figr> and <figr fid="F4">4</figr>) at an &#945; level of 0.05 as well as a description of how to interpret the results.</p>
            <p>&#8226; For datasets with more than two groups, the HTML report covers the two groups with the largest sample sizes. There is a link to JPEG images for the other IntraGDS comparisons in the dataset. In addition, there is a link to a downloadable zip file that contains graphs illustrating the EDR (figures <figr fid="F2">2</figr> and <figr fid="F5">5</figr>), PTP (figures <figr fid="F3">3</figr> and <figr fid="F6">6</figr>), and PTN for a variety of &#945; levels and sample sizes, organized in a directory structure with one directory per two-group comparison. An Excel file containing the numbers underlying the figures is also provided.</p>
         </sec>
         <sec>
            <st>
               <p>Illustrative example of the accuracy and utility of the PowerAtlas</p>
            </st>
            <p>We provide a concrete example for illustrative purposes. In one (unpublished) study, RNA was collected from the kidneys of individual mice that were homozygous for a PKDPH mutation. Mice were selected from an F2 cross for having either a very high kidney length-to-width ratio (three mice) or a very low length-to-width ratio (three mice). This measure is one determinant of polycystic kidney disease severity. The RNA was run on the Affymetrix Mu74Av2 array and processed with MAS 5.0 (Affymetrix, Inc, Emeryville, CA). Figure <figr fid="F13">13</figr> illustrates the distribution of p-values for the two-group comparison of gene expression levels between the high and low kidney length-to-width ratio mice. These p-values were run through the <it>PowerAtlas</it>. We selected a target sample size of seven per group to obtain an EDR &gt; 40% and a PTP &gt; 80% at &#945;&lt;0.05. An additional seven mice were run from each extreme of the kidney length-to-width ratio distribution. The distribution of the p-values for the comparison of seven mice per group is given in figure <figr fid="F14">14</figr>. Table <tblr tid="T3">3</tblr> shows the EDR and PTP at &#945; = 0.05 and 0.0001 for a sample size of seven per group, as estimated from the initial sample of three per group (row 2). These numbers are compared with the actual EDR and PTP calculated at &#945; = 0.05 and 0.0001 in the follow-up study with a sample size of seven per group (row 3). The predicted and observed values are similar: the largest difference is in the EDR at &#945; = 0.05 (0.42 predicted versus 0.54 observed), and the other three values differ by less than 0.04.</p>
            <fig id="F13">
               <title>
                  <p>Figure 13</p>
               </title>
               <caption>
                  <p>Distribution of p-values for the comparison of RNA from 3 mice with homozygous PKDH mutations and a high kidney length-to-width ratio and RNA from 3 mice with homozygous PKDH mutations and a low kidney length-to-width ratio</p>
               </caption>
               <text>
                  <p>Distribution of p-values for the comparison of RNA from 3 mice with homozygous PKDH mutations and a high kidney length-to-width ratio and RNA from 3 mice with homozygous PKDH mutations and a low kidney length-to-width ratio. The dotted blue line is the expected distribution of p-values if the treatment has no effect and the solid green line is the mixed-model [23] fit of the p-values constrained to be monotonically non-decreasing.</p>
               </text>
               <graphic file="1471-2105-7-84-13"/>
            </fig>
            <fig id="F14">
               <title>
                  <p>Figure 14</p>
               </title>
               <caption>
                  <p>Distribution of p-values for the comparison of RNA from 7 mice with PKDH mutations and a high kidney length-to-width ratio and RNA from 7 mice with PKDH mutations and a low kidney length-to-width ratio</p>
               </caption>
               <text>
                  <p>Distribution of p-values for the comparison of RNA from 7 mice with PKDH mutations and a high kidney length-to-width ratio and RNA from 7 mice with PKDH mutations and a low kidney length-to-width ratio. The dotted blue line is the expected distribution of p-values if the treatment has no effect and the solid green line is the mixed-model [23] fit of the p-values constrained to be monotonically non-decreasing.</p>
               </text>
               <graphic file="1471-2105-7-84-14"/>
            </fig>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>EDR and PTP for a sample size of 7 per group at alpha levels of 0.05 and 0.001, as extrapolated from the PKD pilot data with a sample size of 3 per group (row 2) and as calculated in the follow-up study of 7 mice per group (row 3).</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Estimated EDR for sample size 7</b>
                        </p>
                        <p>
                           <b> at &#945; = 0.05</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Estimated PTP for sample size 7</b>
                        </p>
                        <p>
                           <b> at &#945; = 0.05</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Estimated EDR for sample size 7</b>
                        </p>
                        <p>
                           <b> at &#945; = 0.001</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Estimated PTP for sample size 7</b>
                        </p>
                        <p>
                           <b> at &#945; = 0.001</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Pilot of 3 per group</p>
                     </c>
                     <c ca="left">
                        <p>0.415009</p>
                     </c>
                     <c ca="left">
                        <p>0.809616</p>
                     </c>
                     <c ca="left">
                        <p>0.119791</p>
                     </c>
                     <c ca="left">
                        <p>0.985287</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Experiment of 7 per group</p>
                     </c>
                     <c ca="left">
                        <p>0.538771</p>
                     </c>
                     <c ca="left">
                        <p>0.772419</p>
                     </c>
                     <c ca="left">
                        <p>0.133711</p>
                     </c>
                     <c ca="left">
                        <p>0.976347</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
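            <p>The comparison in Table <tblr tid="T3">3</tblr> can be made explicit with a few lines of arithmetic. The Python sketch below is illustrative only and is not part of the <it>PowerAtlas</it>: the values are copied from the table, while the dictionary layout and the function name are our own assumptions.</p>

```python
# Values copied from Table 3; keys and structure are illustrative only.
predicted = {  # extrapolated from the pilot of 3 mice per group
    "EDR at 0.05": 0.415009,
    "PTP at 0.05": 0.809616,
    "EDR at 0.001": 0.119791,
    "PTP at 0.001": 0.985287,
}
observed = {  # calculated in the follow-up study of 7 mice per group
    "EDR at 0.05": 0.538771,
    "PTP at 0.05": 0.772419,
    "EDR at 0.001": 0.133711,
    "PTP at 0.001": 0.976347,
}


def signed_differences(pred, obs):
    """Return observed minus predicted for each measure, rounded to 6 decimals."""
    return {key: round(obs[key] - pred[key], 6) for key in pred}


differences = signed_differences(predicted, observed)
for measure, diff in differences.items():
    print(f"{measure}: predicted={predicted[measure]:.3f} "
          f"observed={observed[measure]:.3f} difference={diff:+.3f}")
```

            <p>The largest discrepancy is in the EDR at an alpha level of 0.05, where the follow-up study detected a somewhat larger share of the differentially expressed genes than the pilot data predicted.</p>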
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The <it>PowerAtlas </it>provides investigators the option of using their own pilot data or drawing from public domain microarray data sets to calculate sample sizes and statistical power for a proposed study. The overall goal is to estimate the sample size required to test the hypothesis of interest with a high EDR and a high PTP without using too many chips. Once the most appropriate graphs and tables are identified (which may involve examining several datasets), the investigator must decide upon the sample size to pursue. Unlike in single hypothesis-driven research, a large number of genes are often differentially expressed in a single microarray experiment, and a study may yield many (often thousands of) significant genes. It is generally difficult for a single laboratory to follow up on or investigate more than a few genes. Thus, while an EDR of 80% or more may be in line with traditional power studies, investigators may not want, or have the laboratory resources, to deal with large-scale, high-powered gene expression experiments in which thousands of genes are identified as differentially expressed. It may instead be more appropriate to have a small list of genes in which the investigator has high confidence that the genes identified as differentially expressed are truly differentially expressed. Modest EDRs (10&#8211;40%) may therefore be appropriate when conservative alphas are chosen to generate high PTPs (80%+). On the other hand, when the investigator wants a complete picture of the effects of the experimental manipulations, it may be more appropriate to use a liberal alpha level (0.1) to obtain a high EDR, but this will yield a lower PTP. Investigators should carefully consider what error rate (the proportion of the genes studied further that are false positives) is acceptable, how many genes they can truly invest in studying, and how important it is to have a complete list of differentially expressed genes.</p>
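         <p>The trade-off between EDR, PTP, and alpha level can be illustrated numerically. The following Python sketch is not part of the <it>PowerAtlas</it>; it assumes that PTP is the proportion of genes declared significant that are truly differentially expressed, that EDR is the proportion of truly differentially expressed genes that are declared significant, and that p-values of unaffected genes are uniformly distributed, so that a fraction alpha of them is declared significant by chance. The number of genes (10,000), the fraction that is truly differentially expressed (10%), and the EDR values are hypothetical.</p>

```python
def expected_outcome(n_genes, frac_true, alpha, edr):
    """Return (expected number of genes declared significant, expected PTP).

    Assumes uniform null p-values: a fraction `alpha` of the unaffected
    genes is declared significant by chance. All inputs are hypothetical.
    """
    n_true = n_genes * frac_true      # truly differentially expressed genes
    n_null = n_genes - n_true         # unaffected genes
    true_pos = edr * n_true           # expected true positives
    false_pos = alpha * n_null        # expected false positives
    declared = true_pos + false_pos
    return declared, true_pos / declared


# Conservative alpha with a modest EDR versus liberal alpha with a high EDR.
conservative = expected_outcome(10_000, 0.10, alpha=0.001, edr=0.20)
liberal = expected_outcome(10_000, 0.10, alpha=0.1, edr=0.80)

for label, (declared, ptp) in [("conservative", conservative),
                               ("liberal", liberal)]:
    print(f"{label}: about {declared:.0f} genes declared, PTP = {ptp:.2f}")
```

         <p>With these hypothetical inputs, the conservative setting yields about 209 declared genes with an expected PTP of about 0.96, whereas the liberal setting yields about 1,700 declared genes with an expected PTP of about 0.47. The short list is mostly true positives; the long list is more complete but contains many false positives.</p>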
         <p>A few other issues should be considered when choosing the sample size for an experiment. First, one should not rely upon any one study to drive the sample size; an investigator should view several datasets to get an idea of the range of possible sample sizes. Second, we have analyzed all the data in the <it>PowerAtlas </it>as if they came from two-group, fully randomized designs. This may not be the case; experiments may be two- or three-way designs with multiple levels. If the actual experiment had such a design, the calculated sample size may be an overestimate, because the methods in the <it>PowerAtlas </it>do not yet allow information from other groups to be used to estimate the variances, as methods such as ANOVA and linear models do. In addition, the hypothesis shown in the main graph may not be the primary hypothesis of interest in a study; it is simply the comparison between the groups with the largest sample sizes, and the other hypotheses should be reviewed as well. Investigators should review the primary literature to verify what the true experimental design was. We also assume that the experiments were conducted in a rigorous fashion and have not been confounded by non-biological sources of error, which may adversely affect power. An adequate sample size does not obviate the need for good experimental design and conduct of the experiment <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>.</p>
         <sec>
            <st>
               <p>Future directions</p>
            </st>
            <p>There are several areas where the functions of the <it>PowerAtlas </it>will be expanded. First, we will augment the database by updating the data from GEO every six months, and we will add data from additional sources such as the Nottingham Arabidopsis Stock Center (NASC). In the PTP graph at low n and small &#945;, the PTP lines sometimes cross, because under these conditions very few or even zero genes are sometimes declared significant. Because the number of genes declared significant is the denominator of the PTP, the PTP is then 0. Until the sample size is large enough for enough genes to be declared differentially expressed at a chosen (small) threshold, PTP lines may cross over each other. We are working to eliminate this issue from our method. Currently, only point estimates of the EDR, PTP and PTN are generated. Future work will generate confidence intervals for these estimates. We are also extending the power estimation procedures to handle ANOVA and linear models, which will allow for power estimation for loop designs, the correct analysis of datasets with multiple groups, and time series data. When these methods have been developed they will be incorporated into the <it>PowerAtlas</it>.</p>
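            <p>The zero-denominator behaviour can be reproduced in a few lines. The Python sketch below is one possible way of flagging such cases and is not the method used or planned for the <it>PowerAtlas</it>; the function name and the gene counts are hypothetical. It returns no value at all when no genes are declared significant, so that a plotting routine could leave a gap rather than draw a point at 0.</p>

```python
def ptp_or_none(true_pos, false_pos):
    """PTP = true_pos / (true_pos + false_pos).

    Returns None when no genes are declared significant, because the
    ratio is then undefined (hypothetical handling, for illustration).
    """
    declared = true_pos + false_pos
    if declared == 0:
        return None
    return true_pos / declared


# Hypothetical counts of (true positives, false positives) at a small alpha
# as the per-group sample size n grows.
counts_by_n = {3: (0, 0), 5: (1, 1), 7: (6, 1), 10: (19, 1)}

for n, (tp, fp) in counts_by_n.items():
    value = ptp_or_none(tp, fp)
    shown = "undefined" if value is None else f"{value:.2f}"
    print(f"n = {n}: PTP = {shown}")
```

            <p>With so few genes declared at small n, a single additional false positive changes the ratio substantially, which is consistent with the unstable, crossing PTP lines described above.</p>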
         </sec>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>&#8226; <b>Project name: </b>The PowerAtlas</p>
         <p>&#8226; <b>Project home page: </b>http://www.powerAtlas.org, also http://www.poweratlas.net</p>
         <p>&#8226; <b>Operating system(s): </b>Web-based application</p>
         <p>&#8226; <b>Programming language: </b>Java</p>
         <p>&#8226; <b>Other requirements: </b>Web browser; unzip utility</p>
         <p>&#8226; <b>License: </b>None</p>
         <p>&#8226; <b>Any restrictions to use by non-academics: </b>None</p>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>ANOVA: Analysis of Variance</p>
         <p>EDR: Expected discovery rate</p>
         <p>GDS: Gene Expression Omnibus Dataset</p>
         <p>GEO: Gene Expression Omnibus</p>
         <p>HDB: High Dimensional Biology</p>
         <p>PTP: Probability of a True Positive</p>
         <p>PTN: Probability of a True Negative</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>GPP co-developed the concept of the <it>PowerAtlas</it>, co-developed the underlying statistics for the <it>PowerAtlas</it>, led the development of the <it>PowerAtlas</it> tool, and drafted the manuscript. JWE co-developed the concept of the <it>PowerAtlas</it>. DBA co-developed the concept of the <it>PowerAtlas</it> and co-developed the underlying statistics for the <it>PowerAtlas</it>. GLG co-developed the underlying statistics for the <it>PowerAtlas</it>. PY, JW, and PT are the software and database developers of the <it>PowerAtlas</it> tool and web site.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We acknowledge the contributions of the many investigators who have selflessly deposited their microarray data into public databases such as GEO, without which the <it>PowerAtlas</it> would not be possible. This work was supported by NSF grants 0217651 and 0306596 and NIH grant P50AT00477.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>http://www.poweratlas.org</p>
            </title>
            <aug>
               <au>
                  <snm>PowerAtlas</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <pubdate>2006</pubdate>
            <url>http://www.poweratlas.org</url>
         </bibl>
         <bibl id="B2">
            <title>
               <p>How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach</p>
            </title>
            <aug>
               <au>
                  <snm>Pan</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Le</snm>
                  <fnm>CT</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>RESEARCH0022</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">115224</pubid>
                  <pubid idtype="pmpid" link="fulltext">12049663</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Power and sample size for DNA microarray studies</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Whitmore</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>Stat Med</source>
            <pubdate>2002</pubdate>
            <volume>21</volume>
            <fpage>3543</fpage>
            <lpage>3570</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/sim.1335</pubid>
                  <pubid idtype="pmpid" link="fulltext">12436455</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Sample size for identifying differentially expressed genes in microarray experiments</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>JJ</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2004</pubdate>
            <volume>11</volume>
            <fpage>714</fpage>
            <lpage>726</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/cmb.2004.11.714</pubid>
                  <pubid idtype="pmpid" link="fulltext">15579240</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Power Analysis and Sample Size Estimation in the Age of High Dimensional Biology</p>
            </title>
            <aug>
               <au>
                  <snm>Gadbury</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Page</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kayo</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Weindruch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Permana</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Mountz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Allison</snm>
                  <fnm>DB</fnm>
               </au>
            </aug>
            <source>Stat Meth Med Res</source>
            <pubdate>2004</pubdate>
            <volume>13</volume>
            <fpage>325</fpage>
            <lpage>338</lpage>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Towards sound epistemological foundations of statistical methods for high-dimensional biology</p>
            </title>
            <aug>
               <au>
                  <snm>Mehta</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tanik</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Allison</snm>
                  <fnm>DB</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2004</pubdate>
            <volume>36</volume>
            <fpage>943</fpage>
            <lpage>947</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1422</pubid>
                  <pubid idtype="pmpid" link="fulltext">15340433</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Mathematical Challenges of the 21st Century - High-Dimensional Data Analysis: The Blessings and Curses of Dimensionality</p>
            </title>
            <aug>
               <au>
                  <snm>Donoho</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <pubdate>2000</pubdate>
            <url>http://www-stat.stanford.edu/~donoho/Lectures/AMS2000/MathChallengeSlides2*2.pdf</url>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Standards for microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Ball</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Sherlock</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Parkinson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Rocca-Serra</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Brooksbank</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Causton</snm>
                  <fnm>HC</fnm>
               </au>
               <au>
                  <snm>Cavalieri</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Gaasterland</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hingamp</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Holstege</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ringwald</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Spellman</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Stoeckert</snm>
                  <fnm>CJJ</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Brazma</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>298</volume>
            <fpage>539</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.298.5593.539b</pubid>
                  <pubid idtype="pmpid" link="fulltext">12387284</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Gene Expression Omnibus: NCBI gene expression and hybridization array data repository</p>
            </title>
            <aug>
               <au>
                  <snm>Edgar</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Domrachev</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lash</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>207</fpage>
            <lpage>210</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99122</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752295</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.207</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework. Analysis of global gene expression in Escherichia coli K12</p>
            </title>
            <aug>
               <au>
                  <snm>Long</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Mangalam</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>BY</fnm>
               </au>
               <au>
                  <snm>Tolleri</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hatfield</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Baldi</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2001</pubdate>
            <volume>276</volume>
            <fpage>19937</fpage>
            <lpage>19944</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M010192200</pubid>
                  <pubid idtype="pmpid" link="fulltext">11259426</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Novel tumor necrosis factor alpha-regulated genes in rheumatoid arthritis</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>HG</fnm>
               </au>
               <au>
                  <snm>Hyde</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Page</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Brand</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Allison</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Hsu</snm>
                  <fnm>HC</fnm>
               </au>
               <au>
                  <snm>Mountz</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Arthritis Rheum</source>
            <pubdate>2004</pubdate>
            <volume>50</volume>
            <fpage>420</fpage>
            <lpage>431</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/art.20037</pubid>
                  <pubid idtype="pmpid" link="fulltext">14872484</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling [see comments]</p>
            </title>
            <aug>
               <au>
                  <snm>Alizadeh</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lossos</snm>
                  <fnm>IS</fnm>
               </au>
               <au>
                  <snm>Rosenwald</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Boldrick</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Sabet</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Tran</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Powell</snm>
                  <fnm>JI</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Marti</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hudson</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sherlock</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Greiner</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>Weisenburger</snm>
                  <fnm>DD</fnm>
               </au>
               <au>
                  <snm>Armitage</snm>
                  <fnm>JO</fnm>
               </au>
               <au>
                  <snm>Warnke</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Staudt</snm>
                  <fnm>LM</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>403</volume>
            <fpage>503</fpage>
            <lpage>511</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35000501</pubid>
                  <pubid idtype="pmpid" link="fulltext">10676951</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Analysis of variance for gene expression microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Kerr</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Churchill</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <fpage>819</fpage>
            <lpage>837</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/10665270050514954</pubid>
                  <pubid idtype="pmpid" link="fulltext">11382364</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Statistical design and the analysis of gene expression microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Kerr</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Churchill</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>Genetical Research</source>
            <pubdate>2001</pubdate>
            <volume>77</volume>
            <fpage>123</fpage>
            <lpage>128</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1017/S0016672301005055</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Statistical issues in cDNA microarray data analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Smyth</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>YH</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
            </aug>
            <source>Functional Genomics: Methods and Protocols</source>
            <publisher>Totowa, NJ, Humana Press</publisher>
            <editor>Brownstein MJ and Khodursky A</editor>
            <edition>1</edition>
            <pubdate>2002</pubdate>
            <fpage>100</fpage>
            <lpage>106</lpage>
         </bibl>
         <bibl id="B16">
            <title>
               <p>A design and statistical perspective on microarray gene expression studies in nutrition: the need for playful creativity and scientific hard-mindedness</p>
            </title>
            <aug>
               <au>
                  <snm>Page</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Barnes</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Weindruch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Allison</snm>
                  <fnm>DB</fnm>
               </au>
            </aug>
            <source>Nutrition</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>997</fpage>
            <lpage>1000</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.nut.2003.08.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">14624952</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>A mixture model approach for the analysis of microarray gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Allison</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Gadbury</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Heo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fernandez</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>CK</fnm>
               </au>
               <au>
                  <snm>Prolla</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Weindruch</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Computational Statistics and Data Analysis</source>
            <pubdate>2002</pubdate>
            <volume>39</volume>
            <fpage>1</fpage>
            <lpage>20</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0167-9473(01)00046-9</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
