<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1742-5573-5-4</ui>
   <ji>1742-5573</ji>
   <fm>
      <dochead>Analytic Perspective</dochead>
      <bibl>
         <title>
            <p>Flexible Two-Phase studies for rare exposures: Feasibility, planning and efficiency issues of a new variant</p>
         </title>
         <aug>
            <au ca="yes" id="A1">
               <snm>Wild</snm>
               <fnm>Pascal</fnm>
               <insr iid="I1"/>
               <email>pascal.wild@inrs.fr</email>
            </au>
            <au id="A2">
               <snm>Andrieu</snm>
               <fnm>Nadine</fnm>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <insr iid="I4"/>
               <email>nadine.andrieu@curie.net</email>
            </au>
            <au id="A3">
               <snm>Goldstein</snm>
               <mi>M</mi>
               <fnm>Alisa</fnm>
               <insr iid="I5"/>
               <email>goldstea@mail.nih.gov</email>
            </au>
            <au id="A4">
               <snm>Schill</snm>
               <fnm>Walter</fnm>
               <insr iid="I6"/>
               <email>schill@bips.uni-bremen.de</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>INRS, French National Institute for Research and Safety, Department of Epidemiology, France</p>
            </ins>
            <ins id="I2">
               <p>INSERM, U900, Paris, F-75248, France</p>
            </ins>
            <ins id="I3">
               <p>Institut Curie, Paris, F-75248, France</p>
            </ins>
            <ins id="I4">
               <p>Ecole des Mines de Paris, ParisTech, Fontainebleau, F-77300, France</p>
            </ins>
            <ins id="I5">
               <p>Genetic Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, USA</p>
            </ins>
            <ins id="I6">
               <p>BIPS, Bremen Institute for Prevention Research and Social Medicine, University of Bremen, Germany</p>
            </ins>
         </insg>
         <source>Epidemiologic Perspectives &amp; Innovations</source>
         <issn>1742-5573</issn>
         <pubdate>2008</pubdate>
         <volume>5</volume>
         <issue>1</issue>
         <fpage>4</fpage>
         <url>http://www.epi-perspectives.com/content/5/1/4</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18828892</pubid>
               <pubid idtype="doi">10.1186/1742-5573-5-4</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>28</day>
               <month>5</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>01</day>
               <month>10</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>01</day>
               <month>10</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Wild et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>The two-phase design consists of an initial (Phase One) study with known disease status and inexpensive covariate information. Within this initial study one selects a subsample on which to collect detailed covariate data. Two-phase studies have been shown to be efficient compared to standard case-control designs. However, potential problems arise if one cannot assure minimum sample sizes in the rarest categories or if recontact of subjects is difficult.</p>
            <p>In the case of a rare exposure with an inexpensive proxy, the authors propose the flexible two-phase design for which there is a single time of contact, at which a decision about full covariate ascertainment is made based on the proxy. Subjects are screened until the desired numbers of cases and controls have been selected for full data collection. Strategies for optimizing the cost/efficiency of this design and corresponding software are presented. The design is applied to two examples from occupational and genetic epidemiology. By ensuring minimum numbers for the rarest disease-covariate combination(s), we obtain considerable efficiency gains over standard two-phase studies with an improved practical feasibility.</p>
            <p>The flexible two-phase design may be the design of choice in the case of well targeted studies of the effect of rare exposures with an inexpensive proxy.</p>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Introduction</p>
         </st>
         <p>For rare exposures, the power of epidemiological studies depends mainly on the rarest disease-exposure combinations. For example, in population-based case-control studies the limiting factor is frequently the number of exposed cases and/or controls. One approach that may substantially increase power for these types of studies is the two-phase study design.</p>
         <p>The two-phase design <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp> consists of an initial (Phase One) large study with known disease status and easily collectible or inexpensive covariate information. Within this initial study one selects a subsample on which to collect detailed covariate data (Phase Two). In Phase Two, one may deliberately oversample the subjects with the rarest exposure-disease combinations based on the available Phase One information, consequently increasing power. Appropriate statistical methods <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> correct for the biased sampling by incorporating the statistical distribution of the available information among cases and controls from Phase One. The data collection of Phase Two usually proceeds in one of two ways. The first approach includes recontacting selected study subjects from Phase One to obtain detailed covariate information. However, with secondary data collection, potential problems may arise if recontacting subjects is difficult, if cases have died, or if response rates are low. Alternatively, one may collect full raw data at first contact for all participants and process only selected subjects. An example would be a molecular or genetic epidemiologic study in which biological specimens were obtained for all cases and controls but only a subsample were genotyped (see <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> for another example). This may, however, be considered wasteful since only a fraction of the collected data is used.</p>
         <p>As an alternative, we propose a new variant of the two-phase design called the flexible two-phase design, for which there is a single time of contact. Phase One data are collected for all subjects and Phase Two subjects are selected for immediate complete data collection based on their basic Phase One information. The key principle of this new variant is to fix a priori stratum-wise numbers of cases and controls for full data collection and recruit Phase One subjects until the required numbers of subjects in each stratum are reached.</p>
         <p>We describe the proposed study design and its implementation in terms of power, cost/efficiency considerations and statistical analysis. We illustrate its applicability using two examples from occupational and molecular/genetic epidemiology.</p>
      </sec>
      <sec>
         <st>
            <p>Steps in the planning and the analysis of flexible two-phase studies</p>
         </st>
         <p>We start by defining several key variables and then describe the proposed set-up for the study design. First, define Z, a discrete proxy variable for the exposure(s) of interest (X). Z needs to be collected and available at Phase One. Then, compute the power for several design options within the flexible two-phase design (see below). Based on these computations, select the design option which produces the best compromise between power and feasibility in terms of subject availability, cost and other study-specific criteria that will permit achievement of the study aims.</p>
         <p>The four major steps for the set-up of a study with the proposed design are as follows:</p>
         <sec>
            <st>
               <p>Design set-up</p>
            </st>
            <p>1. Identify a stratification variable Z which is an easily available proxy of the exposure(s) of interest X. The number of strata (J) will equal the number of response choices for Z.</p>
            <p>2. For each stratum, fix the number of cases and controls (n<sub>ij</sub>), based on study power and cost considerations, for whom the exposure of interest X and covariates will be assessed. From n<sub>ij</sub>, compute their expected distributions according to X and the numbers of cases and controls who will need to be screened at Phase One.</p>
         </sec>
         <sec>
            <st>
               <p>Data collection</p>
            </st>
            <p>3. Screen subjects for Z and keep cases and controls for full data collection (i.e. the variable(s) of interest X and potential confounders) until the numbers of cases and controls fixed in step 2 are reached.</p>
            <p>4. Within each stratum j, count the number of cases and controls that were screened in Phase One at Step 3.</p>
         </sec>
         <sec>
            <st>
               <p>Computation of expected numbers and power</p>
            </st>
            <p>As mentioned above, the expected Phase One numbers depend on the fixed stratum-specific Phase Two numbers. They also depend on the study hypotheses, including exposure prevalences and odds ratios. Other assumptions, common to all types of two-phase studies, quantify how well the Phase One strata predict the exposure of interest (sensitivity and specificity of proxy Z). The formulas for expected Phase One and Phase Two numbers are given in Appendix 1. From these numbers, the expected asymptotic variance and the statistical power can be computed using the variance formulas given in Schill and Drescher <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. A corresponding STATA (StataCorp, College Station, Texas) program for data analysis and power computations is included as an online add-on to this paper.</p>
         </sec>
         <sec>
            <st>
               <p>Planning options</p>
            </st>
            <p>A critical issue is how to optimize, in terms of cost and power, the fixed stratum-wise numbers of cases and controls with full data collection. This complex problem has been addressed in different contexts <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. However, one can formulate a general heuristic rule, which has worked well in our applications using Maximum Likelihood as the analysis method. Specifically, choose the numbers of cases and controls for full data collection so that, within both controls and cases, the expected Phase Two subjects are spread as evenly as possible across the exposure categories. For rare exposures, this means oversampling the strata in which the rarest exposure categories are most frequent, among both cases and controls.</p>
         </sec>
         <sec>
            <st>
               <p>Statistical analysis</p>
            </st>
            <p>The collected data can be analyzed using any two-phase analysis software. As the second phase sample is a biased sample of the original population, a combined analysis of the Phase One and the Phase Two data relies on weighting of the Phase Two data by the inverse sampling fractions. The two main methods for analysis are maximum likelihood (ML) and weighted likelihood (WL) which differ in the weights used; the more efficient ML estimate iteratively adjusts these weights using the estimated disease model. As such software is not readily available, we included our STATA-based two-phase analysis program "blogit_2P.ado" [see additional file <supplr sid="S1">1</supplr>]. The software takes as input the disease indicator, the stratum indicator, the Phase One frequencies, the Phase Two frequencies and the independent variables. A help file accessible from within STATA "blogit_2P.hlp" [see additional file <supplr sid="S2">2</supplr>] is also included as well as an illustrative example [see additional files <supplr sid="S3">3</supplr> and <supplr sid="S4">4</supplr>]. In this paper, we use the ML approach.</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>This is a text file containing the code of the Stata statistical software (StataCorp. 2007; Stata Statistical Software: Release 9 and onwards. College Station, TX: StataCorp LP.) for fitting two-phase data. It can be accessed using any text processor but can only be executed within Stata. It should be saved under the name blogit_2P.ado.</p>
               </text>
               <file name="1742-5573-5-4-S1.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>This is a help file describing the preceding program and its options. It can only be displayed as a help file from within Stata. It should be saved under the name blogit_2P.hlp.</p>
               </text>
               <file name="1742-5573-5-4-S2.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p>This is a text file containing the code of the Stata statistical software for performing the power computation for Example 1 using the above program and performing the computations shown in figure <figr fid="F1">1</figr>. It reads the data from the data file MWF.raw included as Additional file 4. It can be accessed using any text processor but can only be executed within Stata. It should be saved under the name figure1.do.</p>
               </text>
               <file name="1742-5573-5-4-S3.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p>This is a text file containing the data obtained by performing the computations shown in Appendix 2. It is used by the Stata program figure1.do included as Additional file <supplr sid="S3">3</supplr>. It should be saved under the name mwf.raw.</p>
               </text>
               <file name="1742-5573-5-4-S4.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
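            <p>As a minimal illustration of the weighting principle described above (and not a reproduction of blogit_2P.ado), the following Python sketch computes the inverse sampling fractions N<sub>ij</sub>/n<sub>ij</sub> that serve as weights in the WL approach. The function name and all counts are hypothetical and chosen only for illustration.</p>
            <p><![CDATA[
```python
def inverse_sampling_weights(phase_one, phase_two):
    """Return the weight N_ij / n_ij for each (disease status, stratum) cell."""
    weights = {}
    for cell, screened in phase_one.items():
        sampled = phase_two[cell]
        if sampled == 0:
            raise ValueError(f"no Phase Two subjects in cell {cell}")
        weights[cell] = screened / sampled
    return weights


# Hypothetical counts keyed by (disease status, stratum); not taken from the paper.
phase_one = {(0, 1): 600, (0, 2): 150, (1, 1): 300, (1, 2): 100}
phase_two = {(0, 1): 50, (0, 2): 150, (1, 1): 30, (1, 2): 100}

weights = inverse_sampling_weights(phase_one, phase_two)
for cell, w in sorted(weights.items()):
    print(cell, w)
```
]]></p>
            <p>Cells in which every screened subject is retained receive weight 1; sparsely sampled cells receive larger weights. The ML estimate would further adjust such weights using the fitted disease model, which this sketch does not attempt.</p>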
         </sec>
      </sec>
      <sec>
         <st>
            <p>Examples</p>
         </st>
         <p>To demonstrate the potential efficiency of the flexible two-phase approach, we present two examples from occupational and molecular/genetic epidemiology. In the first example, we detail the computations for a given design; in the second, we perform a full search for optimal designs for given scenarios.</p>
         <sec>
            <st>
               <p>Example 1: Metalworking fluids and bladder cancer</p>
            </st>
            <p>A number of population-based case-control studies have found an association between bladder cancer and metalworking fluids (MWF) exposure (see Calvert <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> for a review). However, because of the low prevalence of the exposure, the numbers of exposed cases and controls in each study were too small to produce a stable estimate of the association. We use a flexible two-phase study to illustrate the efficiency gain over a standard case-control study, considering as a proxy of MWF exposure "having worked in the metal industry". In practice, when contacting cases and controls, for instance in a telephone interview, one of the first questions to the volunteers would be: "Have you ever worked in the metal industry?" Based on the answer to this question, the subject would then be included (or not) in Phase Two; that is, the interview would be continued to assess a detailed work history and confounder information.</p>
            <p>Table <tblr tid="T1">1</tblr> details the assumptions. The study proceeds along the four steps as follows:</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Scenario for Example 1</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Variables and parameters characterizing the set-up</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Values of parameters and variables</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Stratification/Proxy Z (with J strata)</p>
                     </c>
                     <c ca="center">
                        <p>Past work in metal industry</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>No: Z = 1</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Yes: Z = 2</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Phase One prevalence among controls (&#964;<sup>0</sup><sub>j</sub>)</p>
                     </c>
                     <c ca="center">
                        <p>Z = 1: &#964;<sup>0</sup><sub>1 </sub>= 80%</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Z = 2: &#964;<sup>0</sup><sub>2 </sub>= 20%*</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Risk factor X (with K outcomes)</p>
                     </c>
                     <c ca="center">
                        <p>Exposure to MWF</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>No: X = 1</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Yes: X = 2</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Disease Model (Odds Ratios) (&#968;<sub>k</sub>)</p>
                     </c>
                     <c ca="center">
                        <p>&#968;<sub>1 </sub>= 1: baseline risk</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>&#968;<sub>2 </sub>= 2<sup>#</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Phase Two prevalence of X among controls by stratum (&#960;<sup>0</sup><sub>jk</sub>)</p>
                     </c>
                     <c ca="center">
                        <p>Z = 1: &#960;<sup>0</sup><sub>11 </sub>= 97.5%, &#960;<sup>0</sup><sub>12 </sub>= 2.5%<sup>&amp;</sup></p>
                        <p>Z = 2: &#960;<sup>0</sup><sub>21 </sub>= 75%, &#960;<sup>0</sup><sub>22 </sub>= 25%<sup>@</sup></p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*20% prevalence of having worked in the metal industry</p>
                  <p>#Exposure to MWF doubles the risk of bladder cancer</p>
                  <p>&amp;Among non-metal-industry workers, 2.5% exposed to MWF</p>
                  <p>@Among metal-industry workers, 25% exposed to MWF</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Study design</p>
            </st>
            <p>1. Stratify subjects by Z (Table <tblr tid="T1">1</tblr>, Line 1).</p>
            <p>2. Per stratum, fix the numbers of cases and controls (160 metal-working and 40 non-metal-working controls, 85 metal-working and 20 non-metal-working cases &#8211; Table <tblr tid="T2">2</tblr> Column 3) to be included and for whom MWF exposure will be assessed at Phase Two. These numbers were chosen using our heuristic rule to reach 80% power to detect the effect of MWF.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Design of the flexible two-phase study for Example 1</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Disease Status</b>
                        </p>
                        <p>(D)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Metal-workers</b>
                        </p>
                        <p>Z (<it>&#964;</it><sup><it>i</it></sup><sub><it>j</it></sub>)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Fixed number of subjects to be included in Phase Two</b>
                        </p>
                        <p>(<it>n</it><sub><it>ij</it></sub>)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Expected Phase One numbers of subjects to be screened</b>
                        </p>
                        <p>(<it>N</it><sub><it>ij</it></sub>)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Expected Proportion of MWF exposure within strata &#167;</b>
                        </p>
                        <p>X(<it>&#960;</it><sup><it>i</it></sup><sub><it>jk</it></sub>)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Expected distribution of subjects by MWF in Phase Two</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>N<sub>0 </sub>= Max(160/20%, 40/80%) = 800</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Control</p>
                     </c>
                     <c ca="center">
                        <p>No (80%*)</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>800*80% = 640</p>
                     </c>
                     <c ca="center">
                        <p>No (97.5%*)</p>
                     </c>
                     <c ca="center">
                        <p>40*97.5% = 39</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Yes (2.5%*)</p>
                     </c>
                     <c ca="center">
                        <p>40*2.5% = 1</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Yes (20%*)</p>
                     </c>
                     <c ca="center">
                        <p>160</p>
                     </c>
                     <c ca="center">
                        <p>800*20% = 160</p>
                     </c>
                     <c ca="center">
                        <p>No (75%*)</p>
                     </c>
                     <c ca="center">
                        <p>160*75% = 120</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Yes (25%*)</p>
                     </c>
                     <c ca="center">
                        <p>160*25% = 40</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>N<sub>1 </sub>= Max(85/23.4%, 20/76.6%) = 364</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Case</p>
                     </c>
                     <c ca="center">
                        <p>No (76.6%#)</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>364*76.6% = 278.8</p>
                     </c>
                     <c ca="center">
                        <p>No (95.1%#)</p>
                     </c>
                     <c ca="center">
                        <p>20*95.1% = 19.02</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Yes (4.9%#)</p>
                     </c>
                     <c ca="center">
                        <p>20*4.9% = 0.98</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Yes (23.4%#)</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>364*23.4% = 85</p>
                     </c>
                     <c ca="center">
                        <p>No (60%#)</p>
                     </c>
                     <c ca="center">
                        <p>85*60% = 51</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Yes (40%#)</p>
                     </c>
                     <c ca="center">
                        <p>85*40% = 34</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>* Values of parameters fixed in Table 1</p>
                  <p># Values of parameters computed from parameter values fixed in Table 1 (see Appendix 2)</p>
                  <p>&#167; In Phase One controls, the overall expected prevalence of MWF exposure is 7%: 2.5% of the 80% non-metal-workers (2%) plus 25% of the 20% metal-workers (5%). A similar computation gives 13% MWF exposure among cases.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Planned data collection</p>
            </st>
            <p>3. Screen cases and controls until the required numbers in each stratum are reached and assess the detailed exposure to MWF and potential confounders in this sample of 305 subjects.</p>
            <p>4. Record the number of subjects screened in order to reach the required sample size. At the planning stage, these numbers are not yet available, but their expected values can be computed. Assuming 20% metal-workers in the general population, we would expect to screen 800 controls (N<sub>0</sub>) to obtain 160 metal-workers (20% &#215; 800 = 160). The expected number of non-metal-worker controls screened (N<sub>00</sub>) is therefore 640 (800&#8211;160), of whom 40 are included in Phase Two for detailed exposure assessment. For the corresponding computations for cases, see Table <tblr tid="T2">2</tblr> and Appendix 2.</p>
            <p>We note that oversampling the metal-workers has achieved our aim of increasing the numbers of MWF-exposed cases and controls. Among the 200 controls, 41 are exposed (20.5% versus 7% in Phase One) and among the 105 cases, 35 are exposed (33.3% versus 13% in Phase One) (Table <tblr tid="T2">2</tblr>, Column 6 and Footnote &#167;).</p>
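            <p>The expected counts quoted above can be retraced with a few lines of arithmetic. The sketch below is only illustrative: the variable and function names are ours, and the stratum sizes and within-stratum exposure percentages are those quoted in the text and in Table 2 (40 and 160 controls, 20 and 85 cases, with 2.5%, 25%, 4.9% and 40% MWF exposure, respectively).</p>

```python
# Phase Two stratum sizes and within-stratum MWF exposure prevalences,
# as quoted in the text and in Table 2 (names are illustrative only).
controls = {"non_metal": (40, 0.025), "metal": (160, 0.25)}
cases = {"non_metal": (20, 0.049), "metal": (85, 0.40)}


def expected_exposed(strata):
    """Expected number of MWF-exposed subjects over all strata."""
    return sum(n * p for n, p in strata.values())


def phase_two_size(strata):
    """Total number of subjects included in Phase Two."""
    return sum(n for n, _ in strata.values())


# Controls to screen so that 20% metal-workers yields 160 of them.
n_screen_controls = 160 / 0.20

# Phase One exposure prevalence among controls (Footnote of Table 2).
phase_one_prev_controls = 0.80 * 0.025 + 0.20 * 0.25

exp_controls = expected_exposed(controls)
exp_cases = expected_exposed(cases)

print("controls to screen:", round(n_screen_controls))
print("Phase One exposure prevalence, controls:", round(phase_one_prev_controls, 3))
print("exposed controls:", round(exp_controls),
      "share:", round(exp_controls / phase_two_size(controls), 3))
print("exposed cases:", round(exp_cases),
      "share:", round(exp_cases / phase_two_size(cases), 3))
```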
            <p>Figure <figr fid="F1">1</figr> shows the STATA output of the analysis of the expected frequencies. The STATA program for this analysis is included as an additional file (figure 1.do [see Additional file <supplr sid="S3">3</supplr>] using the STATA data file MWF.dta [see Additional file <supplr sid="S4">4</supplr>] obtained by applying the computations shown in Appendix 2). In this example, d, z, X, N<sub>ij</sub> and n<sub>ijk</sub> denote, respectively, the case status (1 = case, 0 = control), the stratum indicator, the MWF exposure indicator (X = 1 exposed, X = 0 unexposed), the stratum-wise numbers in Phase One, and the Phase Two numbers by stratum and exposure to MWF. The power is computed for a two-sided Wald test at the 5% level using the formula Power = &#934;(&#946;<sub>x</sub>/se(&#946;<sub>x</sub>)-1.96) = 80.2%, where &#934; denotes the standard normal cumulative distribution function, &#946;<sub>x </sub>the log-odds ratio and se(&#946;<sub>x</sub>) its standard error. Here &#946;<sub>x </sub>= ln(2) = 0.693, as the assumed OR is equal to 2, and the asymptotic standard error se(&#946;<sub>x</sub>) is 0.247. In contrast, a standard case-control study in which 200 controls and 105 cases were randomly selected would yield se(&#946;<sub>x</sub>) = 0.400, corresponding to 40.9% power using the same formula.</p>
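            <p>For readers who wish to check the power figures outside STATA, the Wald power formula can be evaluated as in the following minimal sketch. The function names are ours, and the standard errors 0.247 and 0.400 are taken as given from the text rather than recomputed; small differences in the last digit can arise from rounding of these inputs.</p>

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def wald_power(beta, se, z_crit=1.96):
    """Approximate power of a two-sided 5% Wald test: Phi(beta/se - z_crit)."""
    return normal_cdf(beta / se - z_crit)


beta_x = math.log(2.0)  # log-odds ratio for an assumed OR of 2

power_two_phase = wald_power(beta_x, 0.247)  # flexible two-phase design
power_standard = wald_power(beta_x, 0.400)   # standard case-control design

print("two-phase power:", round(power_two_phase, 3))
print("standard case-control power:", round(power_standard, 3))
```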
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>STATA output for Example 1</p>
               </caption>
               <text>
                  <p>
                     <b>STATA output for Example 1.</b>
                  </p>
               </text>
               <graphic file="1742-5573-5-4-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Example 2: Detection of gene-environment interaction</p>
            </st>
            <p>Molecular/genetic epidemiology studies identify genes involved in disease risk, estimate the strength of the disease-gene association and investigate modifier factors that may interact with the susceptibility genes. The study of interactions between genes and "environmental" factors is often challenging because of the rarity of having both factors, i.e., being exposed to the environmental factor of interest and carrying a deleterious allele.</p>
            <p>We present a search for an optimized flexible Two-Phase design, in this setting, assuming that an inexpensive proxy of the deleterious allele (e.g., family history of disease) is available.</p>
            <sec>
               <st>
                  <p>The scenarios</p>
               </st>
               <p>We consider a rare deleterious allele G with 1% prevalence (P<sub>G</sub>), interacting with an environmental exposure E with 20% prevalence (P<sub>E</sub>). The odds ratios for E, G and their interaction (I) are 2, 3 and 5, respectively (Table <tblr tid="T3">3</tblr>).</p>
               <tbl id="T3">
                  <title>
                     <p>Table 3</p>
                  </title>
                  <caption>
                     <p>Scenarios for Example 2</p>
                  </caption>
                  <tblbdy cols="2">
                     <r>
                        <c ca="left">
                           <p>
                              <b>Variables and parameters required for set-up</b>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <b>Formulas and values of parameters</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Stratification/Proxy Z (with J strata)</p>
                        </c>
                        <c ca="left">
                           <p>Environmental exposure E and Gene proxy S<sub>G</sub></p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>J = 4</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>Z = 1: E<sup>- </sup>S<sub>G</sub><sup>-</sup>, Z = 2: E<sup>- </sup>S<sub>G</sub><sup>+</sup>, Z = 3: E<sup>+ </sup>S<sub>G</sub><sup>-</sup>, Z = 4: E<sup>+ </sup>S<sub>G</sub><sup>+</sup></p>
                        </c>
                     </r>
                     <r>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Phase One prevalence among controls (&#964;<sup>0</sup><sub>j</sub>):</p>
                        </c>
                        <c ca="left">
                           <p>&#964;<sup>0</sup><sub>1 </sub>= Pr(E<sup>-</sup>)Pr(S<sub>G</sub><sup>-</sup>) = (1 - P<sub>E</sub>)[(1-Se)P<sub>G</sub>+Sp(1-P<sub>G</sub>)]</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>P<sub>E </sub>= 20%</p>
                        </c>
                        <c ca="left">
                           <p>&#964;<sup>0</sup><sub>2 </sub>= Pr(E<sup>-</sup>)Pr(S<sub>G</sub><sup>+</sup>) = (1 - P<sub>E</sub>)[SeP<sub>G</sub>+(1-Sp).(1-P<sub>G</sub>)]</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>P<sub>G </sub>= 1%</p>
                        </c>
                        <c ca="left">
                           <p>&#964;<sup>0</sup><sub>3 </sub>= Pr(E<sup>+</sup>)Pr(S<sub>G</sub><sup>-</sup>) = P<sub>E</sub>[(1-Se)P<sub>G</sub>+Sp(1-P<sub>G</sub>)]</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>&#964;<sup>0</sup><sub>4 </sub>= Pr(E<sup>+</sup>)Pr(S<sub>G</sub><sup>+</sup>) = P<sub>E</sub>[SeP<sub>G</sub>+(1-Sp).(1-P<sub>G</sub>)]</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Risk factor X (with K outcomes)</p>
                        </c>
                        <c ca="left">
                           <p>Exposure to E and exposure to G: K = 4</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>X = 1: E<sup>- </sup>G<sup>-</sup>, X = 2: E<sup>- </sup>G<sup>+</sup>, X = 3: E<sup>+ </sup>G<sup>-</sup>, X = 4: E<sup>+ </sup>G<sup>+</sup></p>
                        </c>
                     </r>
                     <r>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Disease Model (Odds Ratios &#968;<sub>k</sub>)</p>
                        </c>
                        <c ca="left">
                           <p>&#968;<sub>1 </sub>= 1, &#968;<sub>2 </sub>= 3, &#968;<sub>3 </sub>= 2, &#968;<sub>4 </sub>= &#968;<sub>2 </sub>&#215; &#968;<sub>3 </sub>&#215; OR<sub>I </sub>= 30</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Phase Two prevalence of X among controls by stratum (&#960;<sup>0</sup><sub>jk</sub>)</p>
                        </c>
                        <c ca="left">
                           <p>Z = 1: &#960;<sup>0</sup><sub>11 </sub>= (1 - P<sub>E</sub>)Sp(1-P<sub>G</sub>)/&#964;<sup>0</sup><sub>1</sub>,</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>&#960;<sup>0</sup><sub>12 </sub>= 1 - &#960;<sup>0</sup><sub>11</sub>, &#960;<sup>0</sup><sub>13 </sub>= &#960;<sup>0</sup><sub>14 </sub>= 0</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>Z = 2: &#960;<sup>0</sup><sub>21 </sub>= (1 - P<sub>E</sub>)(1 - Sp)(1-P<sub>G</sub>)/&#964;<sup>0</sup><sub>2</sub>,</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>&#960;<sup>0</sup><sub>22 </sub>= 1 - &#960;<sup>0</sup><sub>21</sub>, &#960;<sup>0</sup><sub>23 </sub>= &#960;<sup>0</sup><sub>24 </sub>= 0</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>Z = 3: &#960;<sup>0</sup><sub>31 </sub>= &#960;<sup>0</sup><sub>32 </sub>= 0, &#960;<sup>0</sup><sub>33 </sub>= P<sub>E </sub>Sp(1-P<sub>G</sub>)/&#964;<sup>0</sup><sub>3</sub>,</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>&#960;<sup>0</sup><sub>34 </sub>= 1 - &#960;<sup>0</sup><sub>33</sub></p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>Z = 4: &#960;<sup>0</sup><sub>41 </sub>= &#960;<sup>0</sup><sub>42 </sub>= 0, &#960;<sup>0</sup><sub>43 </sub>= P<sub>E </sub>(1 - Sp)(1-P<sub>G</sub>)/&#964;<sup>0</sup><sub>4</sub>,</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="left">
                           <p>&#960;<sup>0</sup><sub>44 </sub>= 1 - &#960;<sup>0</sup><sub>43</sub></p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>*Se = sensitivity; Sp = specificity</p>
                  </tblfn>
               </tbl>
               <p>We further assume that the proxy of the susceptibility gene (S<sub>G</sub>) and the environmental exposure (E) are available at Phase One for an unlimited number of controls. However, we restrict the number of cases available in Phase One to a maximum of 2000. We further assume that genotyping capacity restricts the total number of subjects (cases + controls) that can be included in Phase Two to a maximum of 1200. We assume that the cost of genotyping is 20 times the cost of screening. Such a cost ratio would arise if, for example, a SNP array costs $100 and the 15 minutes a trained interviewer needs to screen a subject for E and S<sub>G</sub> cost $5. We repeat the design search for each combination of sensitivity (Se) and specificity (Sp), each taking the values 0.6, 0.7, 0.8 and 0.9.</p>
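               <p>The Phase One stratum prevalences among controls defined in Table 3 can be evaluated for any pair of sensitivity and specificity. The sketch below is a direct transcription of those four formulas under the independence of E and S<sub>G</sub> among controls that they imply; the function name and the dictionary layout are ours.</p>

```python
def phase_one_prevalences(se, sp, p_e=0.20, p_g=0.01):
    """Stratum prevalences tau0_j among controls, following Table 3.

    Strata: 1 = E- SG-, 2 = E- SG+, 3 = E+ SG-, 4 = E+ SG+.
    """
    proxy_neg = (1.0 - se) * p_g + sp * (1.0 - p_g)   # Pr(SG-)
    proxy_pos = se * p_g + (1.0 - sp) * (1.0 - p_g)   # Pr(SG+)
    return {
        1: (1.0 - p_e) * proxy_neg,
        2: (1.0 - p_e) * proxy_pos,
        3: p_e * proxy_neg,
        4: p_e * proxy_pos,
    }


# Example: proxy with 70% sensitivity and 80% specificity.
tau = phase_one_prevalences(se=0.7, sp=0.8)
for stratum, prevalence in tau.items():
    print("Z =", stratum, "prevalence:", round(prevalence, 4))
```

               <p>With these inputs the doubly positive stratum (Z = 4) contains only about 4% of controls, which illustrates why this stratum drives the screening effort.</p>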
            </sec>
            <sec>
               <st>
                  <p>Planning the design</p>
               </st>
               <p>The aim of the flexible two-phase approach is to choose subjects for genotyping to optimize the study power for given costs. This is achieved by oversampling subjects with positive gene proxy and environmental exposure.</p>
               <p>In practice, such oversampling could be done during case/control recruitment using a short interview that allows assessment of the environmental exposure and the gene proxy (e.g., a family history of disease) and getting a blood/buccal sample (for genotyping) only for the subjects sampled for Phase Two based on the results of this first interview.</p>
               <p>Step 1: The stratification is by gene surrogate and environmental exposure (Table <tblr tid="T3">3</tblr>, line 1).</p>
               <p>Step 2 entails choosing the stratum-wise numbers of cases and controls to be included in Phase Two. Following our general heuristic rule with respect to E, we fix the target shares of E<sup>+</sup> and E<sup>-</sup> subjects in Phase Two at 50% each, among both cases and controls. The amount by which we oversample S<sub>G</sub><sup>+</sup> is controlled by two additional parameters: &#961;<sub>0</sub>, the proportion of Phase Two controls with S<sub>G</sub><sup>+</sup>, and &#961;<sub>1</sub>, the proportion of Phase Two cases with S<sub>G</sub><sup>+</sup>. For example, if we selected 800 controls and 400 cases with proportions &#961;<sub>0</sub> = 80% and &#961;<sub>1</sub> = 60%, this would correspond to 800*50%*80% = 320 E<sup>+</sup> S<sub>G</sub><sup>+</sup> controls, 400*50%*60% = 120 E<sup>+</sup> S<sub>G</sub><sup>+</sup> cases, 800*50%*20% = 80 E<sup>+</sup> S<sub>G</sub><sup>-</sup> controls and so on.</p>
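               <p>The allocation rule of Step 2 can be written out for all four strata at once. In the following sketch, the function name and the stratum labels are ours; the 50% share of exposed subjects and the proxy-positive proportion are the two design parameters described above.</p>

```python
def phase_two_targets(n, rho, share_exposed=0.5):
    """Target Phase Two counts by stratum for n subjects.

    share_exposed: share of E+ subjects (fixed at 50% in the text).
    rho: proportion of proxy-positive (SG+) subjects within each E group.
    """
    e_pos = n * share_exposed
    e_neg = n - e_pos
    return {
        "E-SG-": e_neg * (1.0 - rho),
        "E-SG+": e_neg * rho,
        "E+SG-": e_pos * (1.0 - rho),
        "E+SG+": e_pos * rho,
    }


controls = phase_two_targets(800, rho=0.80)
cases = phase_two_targets(400, rho=0.60)

print({k: round(v) for k, v in controls.items()})
print({k: round(v) for k, v in cases.items()})
```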
            </sec>
            <sec>
               <st>
                  <p>Comparing designs</p>
               </st>
               <p>We now consider a series of design options for this example, for which we compare power and cost. To meet the constraints on availability and capacity fixed above, the designs considered have numbers of cases ranging from 100 to 600 and numbers of controls ranging from 400 to 1100, both in steps of 100, with a maximum of 1200 subjects to be included in Phase Two. For each of these combinations, &#961;<sub>0</sub> and &#961;<sub>1</sub> are varied from 40% to 90%. This corresponds to several hundred possible designs for each combination of sensitivity and specificity of S<sub>G</sub>.</p>
               <p>Table <tblr tid="T4">4</tblr> shows, for each combination of sensitivity and specificity, the design which achieves the maximal power to detect OR<sub>I </sub>= 5. Only designs achieving at least 80% power are shown. For example, if S<sub>G</sub> has 80% specificity and 70% sensitivity, the design with the highest power would include 400 cases and 800 controls with &#961;<sub>1 </sub>= &#961;<sub>0 </sub>= 90% S<sub>G</sub><sup>+</sup> (Table <tblr tid="T4">4</tblr>, line 4). We would, thus, include 90%*400 = 360 S<sub>G</sub><sup>+</sup> cases and 90%*800 = 720 S<sub>G</sub><sup>+</sup> controls for genotyping. The expected number of cases to be screened would be 1889 and the expected number of controls would be 8780.</p>
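               <p>The expected number of controls to be screened can be approximated from the Phase One prevalences of Table 3. The sketch below assumes, as our own simplification, that screening continues until the quota of the slowest-filling stratum is reached in expectation, so that N<sub>0</sub> is the largest ratio of target count to stratum prevalence. Under this assumption the control counts N<sub>0</sub> of Table 4 are recovered; the corresponding computation for cases requires the case distribution derived in Appendix 2 and is not attempted here. All names are ours.</p>

```python
def expected_controls_screened(n0, rho0, se, sp, p_e=0.20, p_g=0.01):
    """Expected Phase One controls needed to fill every Phase Two stratum.

    Assumption (ours): screening stops once the slowest-filling stratum
    reaches its target in expectation.
    """
    proxy_neg = (1.0 - se) * p_g + sp * (1.0 - p_g)   # Pr(SG-)
    proxy_pos = se * p_g + (1.0 - sp) * (1.0 - p_g)   # Pr(SG+)
    tau = {
        "E-SG-": (1.0 - p_e) * proxy_neg,
        "E-SG+": (1.0 - p_e) * proxy_pos,
        "E+SG-": p_e * proxy_neg,
        "E+SG+": p_e * proxy_pos,
    }
    half = 0.5 * n0  # 50% E+ and 50% E- among Phase Two controls
    target = {
        "E-SG-": half * (1.0 - rho0),
        "E-SG+": half * rho0,
        "E+SG-": half * (1.0 - rho0),
        "E+SG+": half * rho0,
    }
    return max(target[z] / tau[z] for z in tau)


# Design of Table 4, line 4: Sp = 80%, Se = 70%, 800 controls, rho0 = 90%.
n_screen = expected_controls_screened(800, 0.90, se=0.7, sp=0.8)
print("expected controls screened:", round(n_screen))
```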
               <tbl id="T4">
                  <title>
                     <p>Table 4</p>
                  </title>
                  <caption>
                     <p>Designs with maximal power of detecting the interaction, according to sensitivity and specificity</p>
                  </caption>
                  <tblbdy cols="10">
                     <r>
                        <c ca="left" cspan="2">
                           <p>Gene-surrogate</p>
                        </c>
                        <c ca="center" cspan="4">
                           <p>Flexible two-phase design options</p>
                        </c>
                        <c ca="center" cspan="2">
                           <p>Expected Phase One counts</p>
                        </c>
                        <c ca="center">
                           <p>Power#</p>
                        </c>
                        <c ca="center">
                           <p>Cost*</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="10">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Spec</p>
                        </c>
                        <c ca="left">
                           <p>Sens</p>
                        </c>
                        <c ca="left">
                           <p>n<sub>0</sub></p>
                        </c>
                        <c ca="left">
                           <p>n<sub>1</sub></p>
                        </c>
                        <c ca="left">
                           <p>&#961;<sub>0</sub>&#8224;</p>
                        </c>
                        <c ca="center">
                           <p>&#961;<sub>1</sub>&#8225;</p>
                        </c>
                        <c ca="center">
                           <p>N<sub>0</sub></p>
                        </c>
                        <c ca="center">
                           <p>N<sub>1</sub></p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="10">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>70%</p>
                        </c>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>800</p>
                        </c>
                        <c ca="left">
                           <p>400</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>5902</p>
                        </c>
                        <c ca="center">
                           <p>1373</p>
                        </c>
                        <c ca="center">
                           <p>83%</p>
                        </c>
                        <c ca="center">
                           <p>1564</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>70%</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>800</p>
                        </c>
                        <c ca="left">
                           <p>400</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>5882</p>
                        </c>
                        <c ca="center">
                           <p>1325</p>
                        </c>
                        <c ca="center">
                           <p>87%</p>
                        </c>
                        <c ca="center">
                           <p>1560</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>60%</p>
                        </c>
                        <c ca="left">
                           <p>800</p>
                        </c>
                        <c ca="left">
                           <p>400</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>8824</p>
                        </c>
                        <c ca="center">
                           <p>1988</p>
                        </c>
                        <c ca="center">
                           <p>87%</p>
                        </c>
                        <c ca="center">
                           <p>1741</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>70%</p>
                        </c>
                        <c ca="left">
                           <p>800</p>
                        </c>
                        <c ca="left">
                           <p>400</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>8780</p>
                        </c>
                        <c ca="center">
                           <p>1889</p>
                        </c>
                        <c ca="center">
                           <p>91%</p>
                        </c>
                        <c ca="center">
                           <p>1733</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>800</p>
                        </c>
                        <c ca="left">
                           <p>400</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>8738</p>
                        </c>
                        <c ca="center">
                           <p>1800</p>
                        </c>
                        <c ca="center">
                           <p>94%</p>
                        </c>
                        <c ca="center">
                           <p>1727</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>800</p>
                        </c>
                        <c ca="left">
                           <p>400</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>8696</p>
                        </c>
                        <c ca="center">
                           <p>1718</p>
                        </c>
                        <c ca="center">
                           <p>96%</p>
                        </c>
                        <c ca="center">
                           <p>1720</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>60%</p>
                        </c>
                        <c ca="left">
                           <p>900</p>
                        </c>
                        <c ca="left">
                           <p>300</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>19286</p>
                        </c>
                        <c ca="center">
                           <p>2000</p>
                        </c>
                        <c ca="center">
                           <p>98%</p>
                        </c>
                        <c ca="center">
                           <p>2264</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>70%</p>
                        </c>
                        <c ca="left">
                           <p>900</p>
                        </c>
                        <c ca="left">
                           <p>300</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>19104</p>
                        </c>
                        <c ca="center">
                           <p>2000</p>
                        </c>
                        <c ca="center">
                           <p>99%</p>
                        </c>
                        <c ca="center">
                           <p>2255</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>900</p>
                        </c>
                        <c ca="left">
                           <p>300</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>18925</p>
                        </c>
                        <c ca="center">
                           <p>1960</p>
                        </c>
                        <c ca="center">
                           <p>99.6%</p>
                        </c>
                        <c ca="center">
                           <p>2244</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>900</p>
                        </c>
                        <c ca="left">
                           <p>300</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>90%</p>
                        </c>
                        <c ca="center">
                           <p>18750</p>
                        </c>
                        <c ca="center">
                           <p>1835</p>
                        </c>
                        <c ca="center">
                           <p>99.8%</p>
                        </c>
                        <c ca="center">
                           <p>2229</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
               <p>Table <tblr tid="T5">5</tblr> shows, for each combination of sensitivity and specificity, the design with minimal cost among those achieving 80% power to detect OR<sub>I </sub>= 5. Using the same example as above, this design would include 300 cases with 80% S<sub>G</sub><sup>+</sup> and 600 controls with 90% S<sub>G</sub><sup>+</sup>. This would imply screening 1259 cases and 6585 controls and would correspond to a 25% cost decrease compared to the most powerful design (1292 vs. 1733) (Table <tblr tid="T5">5</tblr>, line 4).</p>
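               <p>The cost comparison can be retraced with the 20:1 ratio of genotyping to screening cost stated in the scenario. The sketch below assumes, as our reading of the tabulated values, that cost is expressed in units of one genotyping, i.e. the number of genotyped subjects plus one twentieth of the number of screened subjects; the function and argument names are ours.</p>

```python
def design_cost(n_controls, n_cases, screened_controls, screened_cases,
                cost_ratio=20.0):
    """Total cost in units of one genotyping (assumed cost definition).

    Each Phase Two subject costs 1 unit; each screened subject costs
    1 / cost_ratio units.
    """
    genotyping = n_controls + n_cases
    screening = (screened_controls + screened_cases) / cost_ratio
    return genotyping + screening


# Sp = 80%, Se = 70%: most powerful design versus minimal-cost design.
cost_max_power = design_cost(800, 400, 8780, 1889)
cost_min_cost = design_cost(600, 300, 6585, 1259)
relative_saving = 1.0 - cost_min_cost / cost_max_power

print("cost, most powerful design:", round(cost_max_power))
print("cost, minimal-cost design:", round(cost_min_cost))
print("relative saving:", round(relative_saving, 3))
```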
               <tbl id="T5">
                  <title>
                     <p>Table 5</p>
                  </title>
                  <caption>
                     <p>Designs with minimum cost among designs with 80% power of detecting the interaction</p>
                  </caption>
                  <tblbdy cols="10">
                     <r>
                        <c ca="left" cspan="2">
                           <p>Gene-surrogate</p>
                        </c>
                        <c ca="center" cspan="4">
                           <p>Flexible two-phase design options</p>
                        </c>
                        <c ca="center" cspan="2">
                           <p>Expected Phase One counts</p>
                        </c>
                        <c ca="center">
                           <p>Power#</p>
                        </c>
                        <c ca="center">
                           <p>Cost*</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="10">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Spec</p>
                        </c>
                        <c ca="left">
                           <p>Sens</p>
                        </c>
                        <c ca="left">
                           <p>n<sub>0</sub></p>
                        </c>
                        <c ca="left">
                           <p>n<sub>1</sub></p>
                        </c>
                        <c ca="left">
                           <p>&#961;<sub>0</sub>&#8224;</p>
                        </c>
                        <c ca="left">
                           <p>&#961;<sub>1</sub>&#8225;</p>
                        </c>
                        <c ca="right">
                           <p>N<sub>0</sub></p>
                        </c>
                        <c ca="right">
                           <p>N<sub>1</sub></p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="10">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>70%</p>
                        </c>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>700</p>
                        </c>
                        <c ca="left">
                           <p>500</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="right">
                           <p>5163</p>
                        </c>
                        <c ca="right">
                           <p>1525</p>
                        </c>
                        <c ca="center">
                           <p>81%</p>
                        </c>
                        <c ca="center">
                           <p>1534</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>70%</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>600</p>
                        </c>
                        <c ca="left">
                           <p>500</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="right">
                           <p>4412</p>
                        </c>
                        <c ca="right">
                           <p>1472</p>
                        </c>
                        <c ca="center">
                           <p>80%</p>
                        </c>
                        <c ca="center">
                           <p>1394</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>60%</p>
                        </c>
                        <c ca="left">
                           <p>700</p>
                        </c>
                        <c ca="left">
                           <p>300</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="right">
                           <p>7721</p>
                        </c>
                        <c ca="right">
                           <p>1491</p>
                        </c>
                        <c ca="center">
                           <p>80%</p>
                        </c>
                        <c ca="center">
                           <p>1461</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>70%</p>
                        </c>
                        <c ca="left">
                           <p>600</p>
                        </c>
                        <c ca="left">
                           <p>300</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="right">
                           <p>6585</p>
                        </c>
                        <c ca="right">
                           <p>1259</p>
                        </c>
                        <c ca="center">
                           <p>80%</p>
                        </c>
                        <c ca="center">
                           <p>1292</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>500</p>
                        </c>
                        <c ca="left">
                           <p>300</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="right">
                           <p>5461</p>
                        </c>
                        <c ca="right">
                           <p>1200</p>
                        </c>
                        <c ca="center">
                           <p>80%</p>
                        </c>
                        <c ca="center">
                           <p>1133</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>400</p>
                        </c>
                        <c ca="left">
                           <p>400</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="right">
                           <p>4348</p>
                        </c>
                        <c ca="right">
                           <p>1528</p>
                        </c>
                        <c ca="center">
                           <p>81%</p>
                        </c>
                        <c ca="center">
                           <p>1094</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>60%</p>
                        </c>
                        <c ca="left">
                           <p>400</p>
                        </c>
                        <c ca="left">
                           <p>400</p>
                        </c>
                        <c ca="left">
                           <p>70%</p>
                        </c>
                        <c ca="left">
                           <p>50%</p>
                        </c>
                        <c ca="right">
                           <p>6667</p>
                        </c>
                        <c ca="right">
                           <p>1683</p>
                        </c>
                        <c ca="center">
                           <p>81%</p>
                        </c>
                        <c ca="center">
                           <p>1217</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>70%</p>
                        </c>
                        <c ca="left">
                           <p>500</p>
                        </c>
                        <c ca="left">
                           <p>300</p>
                        </c>
                        <c ca="left">
                           <p>50%</p>
                        </c>
                        <c ca="left">
                           <p>50%</p>
                        </c>
                        <c ca="right">
                           <p>5896</p>
                        </c>
                        <c ca="right">
                           <p>1169</p>
                        </c>
                        <c ca="center">
                           <p>80%</p>
                        </c>
                        <c ca="center">
                           <p>1153</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>80%</p>
                        </c>
                        <c ca="left">
                           <p>500</p>
                        </c>
                        <c ca="left">
                           <p>300</p>
                        </c>
                        <c ca="left">
                           <p>40%</p>
                        </c>
                        <c ca="left">
                           <p>60%</p>
                        </c>
                        <c ca="right">
                           <p>4673</p>
                        </c>
                        <c ca="right">
                           <p>1307</p>
                        </c>
                        <c ca="center">
                           <p>80%</p>
                        </c>
                        <c ca="center">
                           <p>1099</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>500</p>
                        </c>
                        <c ca="left">
                           <p>300</p>
                        </c>
                        <c ca="left">
                           <p>40%</p>
                        </c>
                        <c ca="left">
                           <p>50%</p>
                        </c>
                        <c ca="right">
                           <p>4630</p>
                        </c>
                        <c ca="right">
                           <p>1019</p>
                        </c>
                        <c ca="center">
                           <p>82%</p>
                        </c>
                        <c ca="center">
                           <p>1082</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p># Analysis approach: Maximum likelihood</p>
                     <p>* The study cost is computed as the number of screened subjects divided by 20, plus the number of subjects included in Phase Two.</p>
                     <p>&#8224; &#961;<sub>0 </sub>is the proportion of S<sub>G</sub><sup>+ </sup>controls included in Phase Two</p>
                     <p>&#8225; &#961;<sub>1 </sub>is the proportion of S<sub>G</sub><sup>+ </sup>cases included in Phase Two</p>
                  </tblfn>
               </tbl>
               <p>Note that the better the proxy, the more effective the flexible two-phase approach. For example, for a gene proxy with 70% specificity and 80% sensitivity, the most cost-effective design costs 1534 units, whereas the most cost-effective design for a gene proxy with 90% specificity and 90% sensitivity costs 1082 units.</p>
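               <p>As an illustration, the cost definition given in the footnote of Table 5 can be sketched in Python as follows. The function name study_cost and its parameter names are hypothetical and are not part of the original analysis; the inputs are the expected Phase One counts and Phase Two sample sizes read from Table 5.</p>
```python
def study_cost(screened_controls, screened_cases,
               phase_two_controls, phase_two_cases,
               screening_ratio=20):
    """Study cost in units of one Phase Two inclusion.

    Screening one subject is assumed to cost 1/screening_ratio of a
    Phase Two inclusion, as stated in the footnote of Table 5.
    """
    screened = screened_controls + screened_cases
    included = phase_two_controls + phase_two_cases
    return screened / screening_ratio + included


# Line 4 of Table 5 (80% specificity, 70% sensitivity):
# N0 = 6585, N1 = 1259, n0 = 600, n1 = 300.
minimum_cost = study_cost(6585, 1259, 600, 300)
print(round(minimum_cost))  # 1292

# Relative saving compared with the cost of 1733 units quoted in the
# text for the most powerful design (about 25%).
saving = 1 - minimum_cost / 1733
print(f"{saving:.1%}")
```
               <p>Applied to the other lines of Table 5, the same function should reproduce the tabulated costs up to rounding of the expected Phase One counts.</p>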
            </sec>
            <sec>
               <st>
                  <p>Comparison with standard case-control studies</p>
               </st>
               <p>For the scenario considered, the most powerful standard case-control study with 1200 genotyped subjects would include 300 cases and 900 controls with an expected var(&#946;<sub>I</sub>) = 0.96, corresponding to a statistical power of 37%. Achieving 80% power would require var(&#946;<sub>I</sub>) = 0.33. Because the variance is inversely proportional to the sample size when the case:control ratio is held fixed, a standard case-control study would need to genotype approximately 870 cases (i.e. 300 &#215; 0.96/0.33) and 2610 controls (i.e. 900 &#215; 0.96/0.33) to attain 80% power, for a total cost of 3480 units. This compares to 1534 units for the most cost-effective flexible two-phase design assuming 70% specificity and 80% sensitivity.</p>
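               <p>The link between the expected variance and the power quoted above can be sketched as follows. This is a minimal illustration that assumes a two-sided Wald test at the 5% level for an interaction odds ratio of 5; the test and the function names wald_power and scaled_size are assumptions made for this example, not a description of the software used for the published computations. Because the quoted variances are rounded, the results agree with the published figures only approximately.</p>
```python
import math

Z_ALPHA = 1.959964  # two-sided 5% critical value (assumed test level)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def wald_power(odds_ratio, variance):
    """Approximate power of a two-sided Wald test of log(odds_ratio) = 0."""
    z = abs(math.log(odds_ratio)) / math.sqrt(variance)
    return normal_cdf(z - Z_ALPHA) + normal_cdf(-z - Z_ALPHA)


def scaled_size(n, variance_now, variance_target):
    """Sample size needed to reach variance_target, assuming the variance
    is inversely proportional to n at a fixed case:control ratio."""
    return n * variance_now / variance_target


print(wald_power(5, 0.96))          # close to the 37% quoted in the text
print(wald_power(5, 0.33))          # close to 80%
print(scaled_size(300, 0.96, 0.33)) # cases; the text rounds this to 870
print(scaled_size(900, 0.96, 0.33)) # controls; the text rounds this to 2610
```
               <p>Under the same assumptions, a variance of 0.47 gives a power of about 65%.</p>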
            </sec>
            <sec>
               <st>
                  <p>Comparison with balanced two-phase studies</p>
               </st>
               <p>A second comparison of interest is with balanced two-phase studies, the design generally recommended in papers on two-phase studies (see <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B12">12</abbr></abbrgrp>). As mentioned in the introduction, these studies start from a fixed Phase One sample and draw equal numbers from each stratum for Phase Two data collection. To be comparable to our flexible design, we considered a design in which 8000 controls and 2000 cases were assessed in Phase One and 800 controls and 400 cases were included in Phase Two. As the design is balanced, we selected equal numbers, i.e., 200 controls and 100 cases, from each of the four strata defined by SG &#215; E.</p>
               <p>This balanced Two-Phase design is always less efficient than the Flexible Two-Phase design, although more efficient than the standard case-control design. For instance, in the preceding example with 70% specificity and 80% sensitivity, the expected variance is var(&#946;<sub>I</sub>) = 0.47, corresponding to a statistical power of 65%. The corresponding cost is 1200 + 10000/20 = 1700 units.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Two-phase studies are efficient compared to standard case-control designs. The variant design presented in this paper improves on some aspects of standard two-phase studies. Specifically, all data are collected at a single time of contact. At a time when studies are struggling with decreasing response rates, this may result in improved overall participation rates. Moreover, for rare exposures, minimum numbers of exposed subjects can be guaranteed in this design, thus increasing the power, even compared with standard balanced Two-Phase designs. The disadvantage of the flexible two-phase design compared to other designs, including the standard two-phase design, is the additional complexity in design planning. Another possible disadvantage is that the categories that are relatively easy to fill will be filled quickly during recruitment, while the hard-to-fill categories will take longer to reach their sampling targets. This can produce complex relationships between covariates and recruitment times. This could be alleviated by the randomized recruitment approach proposed by Weinberg and Sandler <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, in which subjects in the most common Phase One category would be included in Phase Two with a given probability, chosen so that all categories are filled at about the same time.</p>
         <p>In the examples presented, we focused on rare exposures for which one could identify inexpensive proxies. Using our proposed heuristic rule, this allows oversampling the rare exposure and thus increasing power. This approach is efficient provided that the analysis method is maximum likelihood, which implicitly assumes non-differential misclassification, i.e., that the proxy is not a confounder. In practical terms, this means that the disease risk, given exposure, is the same in all strata. If the disease risk varies across strata, the effect of exposure may have to be assessed separately in each stratum, resulting in reduced power to detect the effect of exposure in the underrepresented strata.</p>
         <p>One major consideration for the flexible two-phase design is the availability of an adequate proxy for Phase One screening. The proxy must be easily obtained for all screened subjects but must also have high sensitivity and specificity. For a study focused on occupational exposures, as in example 1, a question about working in the industry of interest is easily collected and should yield a reasonable proxy for exposure. This binary stratification for the proxy may be extended to increase sensitivity and specificity. For example, one could ask about duration of work in a particular industry, thereby obtaining a proxy of the actual cumulative dose. Similarly, a positive family history was previously shown <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> to be a good proxy for a rare gene with a strong effect. However, as the effect of the allele decreases and its frequency increases (as would be the situation for a low-risk gene), the sensitivity and specificity of family history decrease. In such situations, an alternative proxy for G may need to be considered, such as age at diagnosis, or a quick inexpensive physiologic test during the in-person interview at Phase One. Of course, the more information obtained at Phase One, the more expensive Phase One becomes.</p>
         <p>We acknowledge that a gene-environment interaction odds ratio of 5 may be rather extreme for most diseases, particularly given some recent findings, as in <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. We are currently working on a more topic-oriented comparison of different study designs for detecting gene-environment interactions using a wider range of scenarios and including the Flexible Two-Phase design and the case-only design (under the assumption of independence of genetic and environmental factors in the population).</p>
         <p>In the present paper, we focused on the estimation of a single odds ratio. However, dose-response estimation is possible, as long as detailed data are available at Phase Two. Similarly, it is possible to adjust for confounders as long as the relevant data are available in Phase Two. However, since the flexible two-phase design is mostly targeted at predefined hypotheses, especially if one oversamples some strata, there may be limited power to test other hypotheses or perform exploratory analyses. For example, exposure to some aromatic amines increases risk for bladder cancer, but this exposure is rare in the metal industry. Thus, the design we considered would have low power for detecting this risk. Many epidemiologic studies are exploratory in that they assess the effects of a large spectrum of factors without focusing on predefined hypotheses. The Flexible Two-Phase design is not suited to this situation and necessarily focuses on a restricted number of explicitly stated hypotheses. We are, however, convinced that in many circumstances, only studies with predefined hypotheses will allow progress in understanding disease etiology.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In conclusion, the flexible two-phase design extends the advantages of two-phase designs to substantially increase power for studies of rare disease-exposure combinations. The flexible two-phase design may be the design of choice in well-targeted studies of the effect of rare exposures for which inexpensive proxies are available.</p>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>MWF: metal working fluid; SG: the surrogate of the gene G considered as a risk factor.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>The idea of this new method originated from discussions between PW and WS. PW wrote the first draft, carried out the computations and prepared the tables and figure. NA and AMG contributed the gene-environment example and parts of the discussion. All authors participated substantially in the writing of the submitted manuscript and approved the submitted version.</p>
      </sec>
      <sec>
         <st>
            <p>Endnotes</p>
         </st>
         <sec>
            <st>
               <p>Appendix 1: Computation of expected numbers for a given design and scenario</p>
            </st>
            <p>Let Z denote the proxy variable for X, the exposure of interest, and define</p>
            <p>- <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i1"><m:semantics><m:mrow><m:msubsup><m:mi>&#964;</m:mi><m:mi>j</m:mi><m:mn>0</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiXdq3aa0baaSqaaiabdQgaQbqaaiabicdaWaaaaaa@3012@</m:annotation></m:semantics></m:math></inline-formula> the Phase One proportion of the j<sup>th </sup>stratum within controls.</p>
            <p>- <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i2"><m:semantics><m:mrow><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mn>0</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiWda3aa0baaSqaaiabdQgaQjabdUgaRbqaaiabicdaWaaaaaa@3169@</m:annotation></m:semantics></m:math></inline-formula> the Phase Two proportion of the k<sup>th </sup>outcome of X within stratum j of controls.</p>
            <p>The proportion of cases in each stratum depends on the corresponding proportion of controls and the assumed odds ratios (<it>&#968;</it><sub><it>k</it></sub>). Let us denote by</p>
            <p>q<sub>j </sub>the stratum-specific weighted odds ratio, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i3"><m:semantics><m:mrow><m:msub><m:mi>q</m:mi><m:mi>j</m:mi></m:msub><m:mo>=</m:mo><m:mstyle displaystyle="true"><m:munder><m:mo>&#8721;</m:mo><m:mi>k</m:mi></m:munder><m:mrow><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mn>0</m:mn></m:msubsup><m:mo>&#215;</m:mo><m:msub><m:mi>&#968;</m:mi><m:mi>k</m:mi></m:msub></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aaSbaaSqaaiabdQgaQbqabaGccqGH9aqpdaaeqbqaaiabec8aWnaaDaaaleaacqWGQbGAcqWGRbWAaeaacqaIWaamaaGccqGHxdaTcqaHipqEdaWgaaWcbaGaem4AaSgabeaaaeaacqWGRbWAaeqaniabggHiLdaaaa@3E5E@</m:annotation></m:semantics></m:math></inline-formula></p>
            <p><inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i4"><m:semantics><m:mrow><m:msubsup><m:mi>&#964;</m:mi><m:mi>j</m:mi><m:mn>1</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiXdq3aa0baaSqaaiabdQgaQbqaaiabigdaXaaaaaa@3014@</m:annotation></m:semantics></m:math></inline-formula> the Phase One proportion of the j<sup>th </sup>stratum within cases, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i5"><m:semantics><m:mrow><m:msubsup><m:mi>&#964;</m:mi><m:mi>j</m:mi><m:mn>1</m:mn></m:msubsup><m:mo>=</m:mo><m:mfrac><m:mrow><m:msubsup><m:mi>&#964;</m:mi><m:mi>j</m:mi><m:mn>0</m:mn></m:msubsup><m:mo>&#215;</m:mo><m:msub><m:mi>q</m:mi><m:mi>j</m:mi></m:msub></m:mrow><m:mrow><m:mstyle displaystyle="true"><m:munder><m:mo>&#8721;</m:mo><m:mi>j</m:mi></m:munder><m:mrow><m:msubsup><m:mi>&#964;</m:mi><m:mi>j</m:mi><m:mn>0</m:mn></m:msubsup><m:mo>&#215;</m:mo><m:msub><m:mi>q</m:mi><m:mi>j</m:mi></m:msub></m:mrow></m:mstyle></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiXdq3aa0baaSqaaiabdQgaQbqaaiabigdaXaaakiabg2da9KqbaoaalaaabaGaeqiXdq3aa0baaeaacqWGQbGAaeaacqaIWaamaaGaey41aqRaemyCae3aaSbaaeaacqWGQbGAaeqaaaqaamaaqafabaGaeqiXdq3aa0baaeaacqWGQbGAaeaacqaIWaamaaGaey41aqRaemyCae3aaSbaaeaacqWGQbGAaeqaaaqaaiabdQgaQbqabiabggHiLdaaaaaa@478E@</m:annotation></m:semantics></m:math></inline-formula></p>
            <p><inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i6"><m:semantics><m:mrow><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mn>1</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiWda3aa0baaSqaaiabdQgaQjabdUgaRbqaaiabigdaXaaaaaa@316B@</m:annotation></m:semantics></m:math></inline-formula> the Phase Two proportion of the k<sup>th </sup>outcome of X within stratum j of cases, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i7"><m:semantics><m:mrow><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mn>1</m:mn></m:msubsup><m:mo>=</m:mo><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mn>0</m:mn></m:msubsup><m:mo>&#215;</m:mo><m:mfrac><m:mrow><m:msub><m:mi>&#968;</m:mi><m:mi>k</m:mi></m:msub></m:mrow><m:mrow><m:msub><m:mi>q</m:mi><m:mi>j</m:mi></m:msub></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiWda3aa0baaSqaaiabdQgaQjabdUgaRbqaaiabigdaXaaakiabg2da9iabec8aWnaaDaaaleaacqWGQbGAcqWGRbWAaeaacqaIWaamaaGccqGHxdaTjuaGdaWcaaqaaiabeI8a5naaBaaabaGaem4AaSgabeaaaeaacqWGXbqCdaWgaaqaaiabdQgaQbqabaaaaaaa@4105@</m:annotation></m:semantics></m:math></inline-formula></p>
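            <p>The definitions above can be sketched in Python as follows. The function name case_distribution and the numerical inputs are hypothetical and chosen only for illustration: two proxy strata (proxy negative, proxy positive), two exposure categories (unexposed, exposed) and an assumed odds ratio of 2 for the exposed category.</p>
```python
def case_distribution(tau0, pi0, psi):
    """Derive the stratum and exposure distributions among cases.

    tau0[j]   : Phase One proportion of stratum j within controls
    pi0[j][k] : proportion of exposure category k within stratum j of controls
    psi[k]    : assumed odds ratio for exposure category k
    Returns (q, tau1, pi1) following the formulas of Appendix 1.
    """
    # q_j: stratum-specific weighted odds ratio
    q = [sum(p * o for p, o in zip(row, psi)) for row in pi0]
    # tau1_j: Phase One proportion of stratum j within cases
    total = sum(t * qj for t, qj in zip(tau0, q))
    tau1 = [t * qj / total for t, qj in zip(tau0, q)]
    # pi1_jk: proportion of category k within stratum j of cases
    pi1 = [[p * o / qj for p, o in zip(row, psi)]
           for row, qj in zip(pi0, q)]
    return q, tau1, pi1


# Hypothetical scenario (illustrative values, not taken from the paper)
tau0 = [0.9, 0.1]                   # proxy negative, proxy positive
pi0 = [[0.95, 0.05], [0.40, 0.60]]  # unexposed, exposed within each stratum
psi = [1.0, 2.0]                    # reference category, exposed

q, tau1, pi1 = case_distribution(tau0, pi0, psi)
print(q)     # weighted odds ratios per stratum
print(tau1)  # sums to 1 across strata
print(pi1)   # each row sums to 1
```
            <p>By construction, the case proportions sum to one across strata and, within each stratum, across exposure categories.</p>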
            <p>The flexible two-phase approach starts with fixed numbers of controls (n<sub>0<it>j</it></sub>) and cases (n<sub>1<it>j</it></sub>), from which one computes</p>
            <p>- <it>N</it><sub><it>ij</it></sub>, the expected Phase One numbers of cases and controls to be screened in each stratum j,</p>
            <p>- <it>n</it><sup><it>i</it></sup><sub><it>jk</it></sub>, the expected Phase Two stratum-wise numbers in each exposure category k</p>
            <p>Phase One:</p>
            <p>The overall expected number of Phase One controls <it>N</it><sub><it>0 </it></sub>and cases <it>N</it><sub><it>1 </it></sub>to be screened are <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i8"><m:semantics><m:mrow><m:msub><m:mi>N</m:mi><m:mn>0</m:mn></m:msub><m:mo>=</m:mo><m:mi>max</m:mi><m:mo>&#8289;</m:mo><m:mrow><m:mo>(</m:mo><m:mrow><m:mfrac><m:mrow><m:msub><m:mi>n</m:mi><m:mrow><m:mn>0</m:mn><m:mi>j</m:mi></m:mrow></m:msub></m:mrow><m:mrow><m:msubsup><m:mi>&#964;</m:mi><m:mi>j</m:mi><m:mn>0</m:mn></m:msubsup></m:mrow></m:mfrac></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOta40aaSbaaSqaaiabicdaWaqabaGccqGH9aqpcyGGTbqBcqGGHbqycqGG4baEdaqadaqcfayaamaalaaabaGaemOBa42aaSbaaeaacqaIWaamcqWGQbGAaeqaaaqaaiabes8a0naaDaaabaGaemOAaOgabaGaeGimaadaaaaaaOGaayjkaiaawMcaaaaa@3D7E@</m:annotation></m:semantics></m:math></inline-formula> and <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i9"><m:semantics><m:mrow><m:msub><m:mi>N</m:mi><m:mn>1</m:mn></m:msub><m:mo>=</m:mo><m:mi>max</m:mi><m:mo>&#8289;</m:mo><m:mrow><m:mo>(</m:mo><m:mrow><m:mfrac><m:mrow><m:msub><m:mi>n</m:mi><m:mrow><m:mn>1</m:mn><m:mi>j</m:mi></m:mrow></m:msub></m:mrow><m:mrow><m:msubsup><m:mi>&#964;</m:mi><m:mi>j</m:mi><m:mn>1</m:mn></m:msubsup></m:mrow></m:mfrac></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOta40aaSbaaSqaaiabigdaXaqabaGccqGH9aqpcyGGTbqBcqGGHbqycqGG4baEdaqadaqcfayaamaalaaabaGaemOBa42aaSbaaeaacqaIXaqmcqWGQbGAaeqaaaqaaiabes8a0naaDaaabaGaemOAaOgabaGaeGymaedaaaaaaOGaayjkaiaawMcaaaaa@3D84@</m:annotation></m:semantics></m:math></inline-formula></p>
            <p>From these, one obtains the stratum-specific expected Phase One numbers</p>
            <p><it>N</it><sub>0<it>j </it></sub>= <it>N</it><sub>0 </sub>&#215; <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i1"><m:semantics><m:mrow><m:msubsup><m:mi>&#964;</m:mi><m:mi>j</m:mi><m:mn>0</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiXdq3aa0baaSqaaiabdQgaQbqaaiabicdaWaaaaaa@3012@</m:annotation></m:semantics></m:math></inline-formula> and <it>N</it><sub>1<it>j </it></sub>= <it>N</it><sub>1 </sub>&#215; <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i4"><m:semantics><m:mrow><m:msubsup><m:mi>&#964;</m:mi><m:mi>j</m:mi><m:mn>1</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiXdq3aa0baaSqaaiabdQgaQbqaaiabigdaXaaaaaa@3014@</m:annotation></m:semantics></m:math></inline-formula></p>
            <p>Phase Two:</p>
            <p>The expected numbers in each Phase Two exposure category are computed as <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i10"><m:semantics><m:mrow><m:msubsup><m:mi>n</m:mi><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mn>0</m:mn></m:msubsup><m:mo>=</m:mo><m:msub><m:mi>n</m:mi><m:mrow><m:mn>0</m:mn><m:mi>j</m:mi></m:mrow></m:msub><m:mo>&#215;</m:mo><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mn>0</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOBa42aa0baaSqaaiabdQgaQjabdUgaRbqaaiabicdaWaaakiabg2da9iabd6gaUnaaBaaaleaacqaIWaamcqWGQbGAaeqaaOGaey41aqRaeqiWda3aa0baaSqaaiabdQgaQjabdUgaRbqaaiabicdaWaaaaaa@3DB2@</m:annotation></m:semantics></m:math></inline-formula> and <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i11"><m:semantics><m:mrow><m:msubsup><m:mi>n</m:mi><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mn>1</m:mn></m:msubsup><m:mo>=</m:mo><m:msub><m:mi>n</m:mi><m:mrow><m:mn>1</m:mn><m:mi>j</m:mi></m:mrow></m:msub><m:mo>&#215;</m:mo><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mn>1</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOBa42aa0baaSqaaiabdQgaQjabdUgaRbqaaiabigdaXaaakiabg2da9iabd6gaUnaaBaaaleaacqaIXaqmcqWGQbGAaeqaaOGaey41aqRaeqiWda3aa0baaSqaaiabdQgaQjabdUgaRbqaaiabigdaXaaaaaa@3DB8@</m:annotation></m:semantics></m:math></inline-formula></p>
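            <p>As an illustration only, the Phase One and Phase Two formulae above can be sketched in a few lines of Python. The function name, its arguments and the choice of 200 Phase Two controls per stratum are assumptions made for this sketch, and n_j is read here as the planned Phase Two number in stratum j; the proportions are those used for controls in Example 1.</p>

```python
def phase_one_two_expected(n_j, tau_j, pi_jk):
    """Expected numbers for one group (cases or controls).

    n_j   : planned Phase Two numbers per stratum j (assumed input)
    tau_j : Phase One proportion of the group falling in stratum j
    pi_jk : Phase Two proportion in exposure category k within stratum j
    """
    # Overall Phase One number: the most demanding stratum sets the screening size
    N = max(n / tau for n, tau in zip(n_j, tau_j))
    # Stratum-specific Phase One numbers, N_j = N * tau_j
    N_j = [N * tau for tau in tau_j]
    # Phase Two numbers per exposure category, n_jk = n_j * pi_jk
    n_jk = [[n * p for p in row] for n, row in zip(n_j, pi_jk)]
    return N, N_j, n_jk


# Hypothetical design: 200 controls per stratum in Phase Two
N0, N0_j, n0_jk = phase_one_two_expected(
    n_j=[200, 200],
    tau_j=[0.80, 0.20],
    pi_jk=[[0.975, 0.025], [0.75, 0.25]],
)
print(N0, N0_j, n0_jk)
```

            <p>In this sketch the rarer stratum determines how many subjects must be screened in Phase One, which is the role of the maximum in the formula; the same function can be called with the case proportions to obtain the numbers for cases.</p>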
         </sec>
         <sec>
            <st>
               <p>Appendix 2: Expected numbers for Example 1</p>
            </st>
            <p>Using the notation of Appendix 1, let J = 2 and K = 2.</p>
            <p><it>&#964;</it><sup><it>0</it></sup><sub><it>j </it></sub>(the Phase One proportions), <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i12"><m:semantics><m:mrow><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mn>0</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbqee0evGueE0jxyaibaieYdOi=BI8qipeYdI8qiW7rqqrFfpeea0xe9LqFf0xc9q8qqaqFn0dXdHiVcFbIOFHK8Feei0lXdar=Jb9qqFfeaYRXxe9vr0=vr0=LqpWqaaeaabiGaciaacaqabeaabeqacmaaaOqaaiabec8aWnaaDaaaleaacaWGQbGaam4Aaaqaaiaaicdaaaaaaa@31AF@</m:annotation></m:semantics></m:math></inline-formula> (the stratum-wise Phase Two proportions) among controls and <it>&#968;</it><sub>2 </sub>the odds ratio with MWF exposure take the values presented in Table <tblr tid="T1">1</tblr>.</p>
            <p>Then, following the formulae given in Appendix 1,</p>
            <p>the weighted odds-ratio in stratum 1 of non metal-workers is <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i13"><m:semantics><m:mrow><m:msub><m:mi>q</m:mi><m:mn>1</m:mn></m:msub><m:mo>=</m:mo><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mn>11</m:mn></m:mrow><m:mn>0</m:mn></m:msubsup><m:mo>&#215;</m:mo><m:msub><m:mi>&#968;</m:mi><m:mn>1</m:mn></m:msub><m:mo>+</m:mo><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mn>12</m:mn></m:mrow><m:mn>0</m:mn></m:msubsup><m:mo>&#215;</m:mo><m:msub><m:mi>&#968;</m:mi><m:mn>2</m:mn></m:msub><m:mo>=</m:mo><m:mn>0.975</m:mn><m:mo>&#215;</m:mo><m:mn>1</m:mn><m:mo>+</m:mo><m:mn>0.025</m:mn><m:mo>&#215;</m:mo><m:mn>2</m:mn><m:mo>=</m:mo><m:mn>1.025</m:mn></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aaSbaaSqaaiabigdaXaqabaGccqGH9aqpcqaHapaCdaqhaaWcbaGaeGymaeJaeGymaedabaGaeGimaadaaOGaey41aqRaeqiYdK3aaSbaaSqaaiabigdaXaqabaGccqGHRaWkcqaHapaCdaqhaaWcbaGaeGymaeJaeGOmaidabaGaeGimaadaaOGaey41aqRaeqiYdK3aaSbaaSqaaiabikdaYaqabaGccqGH9aqpcqaIWaamcqGGUaGlcqaI5aqocqaI3aWncqaI1aqncqGHxdaTcqaIXaqmcqGHRaWkcqaIWaamcqGGUaGlcqaIWaamcqaIYaGmcqaI1aqncqGHxdaTcqaIYaGmcqGH9aqpcqaIXaqmcqGGUaGlcqaIWaamcqaIYaGmcqaI1aqnaaa@5B06@</m:annotation></m:semantics></m:math></inline-formula></p>
            <p>the weighted odds-ratio in stratum 2 of metal-workers is <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i14"><m:semantics><m:mrow><m:msub><m:mi>q</m:mi><m:mn>2</m:mn></m:msub><m:mo>=</m:mo><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mn>21</m:mn></m:mrow><m:mn>0</m:mn></m:msubsup><m:mo>&#215;</m:mo><m:msub><m:mi>&#968;</m:mi><m:mn>1</m:mn></m:msub><m:mo>+</m:mo><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mn>22</m:mn></m:mrow><m:mn>0</m:mn></m:msubsup><m:mo>&#215;</m:mo><m:msub><m:mi>&#968;</m:mi><m:mn>2</m:mn></m:msub><m:mo>=</m:mo><m:mn>0.75</m:mn><m:mo>&#215;</m:mo><m:mn>1</m:mn><m:mo>+</m:mo><m:mn>0.25</m:mn><m:mo>&#215;</m:mo><m:mn>2</m:mn><m:mo>=</m:mo><m:mn>1.25</m:mn></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aaSbaaSqaaiabikdaYaqabaGccqGH9aqpcqaHapaCdaqhaaWcbaGaeGOmaiJaeGymaedabaGaeGimaadaaOGaey41aqRaeqiYdK3aaSbaaSqaaiabigdaXaqabaGccqGHRaWkcqaHapaCdaqhaaWcbaGaeGOmaiJaeGOmaidabaGaeGimaadaaOGaey41aqRaeqiYdK3aaSbaaSqaaiabikdaYaqabaGccqGH9aqpcqaIWaamcqGGUaGlcqaI3aWncqaI1aqncqGHxdaTcqaIXaqmcqGHRaWkcqaIWaamcqGGUaGlcqaIYaGmcqaI1aqncqGHxdaTcqaIYaGmcqGH9aqpcqaIXaqmcqGGUaGlcqaIYaGmcqaI1aqnaaa@5830@</m:annotation></m:semantics></m:math></inline-formula></p>
            <p>From these, we obtain the Phase Two proportions of MWF exposure (k = 2) among cases:</p>
            <p>In stratum 1 of non-metal working cases: <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i15"><m:semantics><m:mrow><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mn>12</m:mn></m:mrow><m:mn>1</m:mn></m:msubsup><m:mo>=</m:mo><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mn>12</m:mn></m:mrow><m:mn>0</m:mn></m:msubsup><m:mo>&#215;</m:mo><m:mfrac><m:mrow><m:msub><m:mi>&#968;</m:mi><m:mn>2</m:mn></m:msub></m:mrow><m:mrow><m:msub><m:mi>q</m:mi><m:mn>1</m:mn></m:msub></m:mrow></m:mfrac><m:mo>=</m:mo><m:mn>0.025</m:mn><m:mo>&#215;</m:mo><m:mfrac><m:mn>2</m:mn><m:mrow><m:mn>1.025</m:mn></m:mrow></m:mfrac><m:mo>=</m:mo><m:mn>4.9</m:mn><m:mi>%</m:mi></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiWda3aa0baaSqaaiabigdaXiabikdaYaqaaiabigdaXaaakiabg2da9iabec8aWnaaDaaaleaacqaIXaqmcqaIYaGmaeaacqaIWaamaaGccqGHxdaTjuaGdaWcaaqaaiabeI8a5naaBaaabaGaeGymaedabeaaaeaacqWGXbqCdaWgaaqaaiabigdaXaqabaaaaOGaeyypa0JaeGimaaJaeiOla4IaeGimaaJaeGOmaiJaeGynauJaey41aqBcfa4aaSaaaeaacqaIYaGmaeaacqaIXaqmcqGGUaGlcqaIWaamcqaIYaGmcqaI1aqnaaGccqGH9aqpcqaI0aancqGGUaGlcqaI5aqocqGGLaqjaaa@513E@</m:annotation></m:semantics></m:math></inline-formula></p>
            <p>In stratum 2 of metal-working cases: <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i16"><m:semantics><m:mrow><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mn>22</m:mn></m:mrow><m:mn>1</m:mn></m:msubsup><m:mo>=</m:mo><m:msubsup><m:mi>&#960;</m:mi><m:mrow><m:mn>22</m:mn></m:mrow><m:mn>0</m:mn></m:msubsup><m:mo>&#215;</m:mo><m:mfrac><m:mrow><m:msub><m:mi>&#968;</m:mi><m:mn>2</m:mn></m:msub></m:mrow><m:mrow><m:msub><m:mi>q</m:mi><m:mn>2</m:mn></m:msub></m:mrow></m:mfrac><m:mo>=</m:mo><m:mn>0.25</m:mn><m:mo>&#215;</m:mo><m:mfrac><m:mn>2</m:mn><m:mrow><m:mn>1.25</m:mn></m:mrow></m:mfrac><m:mo>=</m:mo><m:mn>40</m:mn><m:mi>%</m:mi></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiWda3aa0baaSqaaiabikdaYiabikdaYaqaaiabigdaXaaakiabg2da9iabec8aWnaaDaaaleaacqaIYaGmcqaIYaGmaeaacqaIWaamaaGccqGHxdaTjuaGdaWcaaqaaiabeI8a5naaBaaabaGaeGOmaidabeaaaeaacqWGXbqCdaWgaaqaaiabikdaYaqabaaaaOGaeyypa0JaeGimaaJaeiOla4IaeGOmaiJaeGynauJaey41aqBcfa4aaSaaaeaacqaIYaGmaeaacqaIXaqmcqGGUaGlcqaIYaGmcqaI1aqnaaGccqGH9aqpcqaI0aancqaIWaamcqGGLaqjaaa@4E74@</m:annotation></m:semantics></m:math></inline-formula></p>
            <p>The Phase One proportion of metal-workers among cases is <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1742-5573-5-4-i17"><m:semantics><m:mrow><m:msubsup><m:mi>&#964;</m:mi><m:mn>2</m:mn><m:mn>1</m:mn></m:msubsup><m:mo>=</m:mo><m:mfrac><m:mrow><m:msubsup><m:mi>&#964;</m:mi><m:mn>2</m:mn><m:mn>0</m:mn></m:msubsup><m:mo>&#215;</m:mo><m:msub><m:mi>q</m:mi><m:mn>2</m:mn></m:msub></m:mrow><m:mrow><m:mstyle displaystyle="true"><m:munder><m:mo>&#8721;</m:mo><m:mi>j</m:mi></m:munder><m:mrow><m:msubsup><m:mi>&#964;</m:mi><m:mi>j</m:mi><m:mn>0</m:mn></m:msubsup><m:mo>&#215;</m:mo><m:msub><m:mi>q</m:mi><m:mi>j</m:mi></m:msub></m:mrow></m:mstyle></m:mrow></m:mfrac><m:mo>=</m:mo><m:mfrac><m:mrow><m:mn>0.20</m:mn><m:mo>&#215;</m:mo><m:mn>1.25</m:mn></m:mrow><m:mrow><m:mn>0.20</m:mn><m:mo>&#215;</m:mo><m:mn>1.25</m:mn><m:mo>+</m:mo><m:mn>0.80</m:mn><m:mo>&#215;</m:mo><m:mn>1.025</m:mn></m:mrow></m:mfrac><m:mo>=</m:mo><m:mn>0.234</m:mn></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiXdq3aa0baaSqaaiabikdaYaqaaiabigdaXaaakiabg2da9KqbaoaalaaabaGaeqiXdq3aa0baaeaacqaIYaGmaeaacqaIWaamaaGaey41aqRaemyCae3aaSbaaeaacqaIYaGmaeqaaaqaamaaqafabaGaeqiXdq3aa0baaeaacqWGQbGAaeaacqaIWaamaaGaey41aqRaemyCae3aaSbaaeaacqWGQbGAaeqaaaqaaiabdQgaQbqabiabggHiLdaaaOGaeyypa0tcfa4aaSaaaeaacqaIWaamcqGGUaGlcqaIYaGmcqaIWaamcqGHxdaTcqaIXaqmcqGGUaGlcqaIYaGmcqaI1aqnaeaacqaIWaamcqGGUaGlcqaIYaGmcqaIWaamcqGHxdaTcqaIXaqmcqGGUaGlcqaIYaGmcqaI1aqncqGHRaWkcqaIWaamcqGGUaGlcqaI4aaocqaIWaamcqGHxdaTcqaIXaqmcqGGUaGlcqaIWaamcqaIYaGmcqaI1aqnaaGccqGH9aqpcqaIWaamcqGGUaGlcqaIYaGmcqaIZaWmcqaI0aanaaa@6C2A@</m:annotation></m:semantics></m:math></inline-formula></p>
            <p>From these quantities, the expected numbers can be derived for a given design, as illustrated in Table <tblr tid="T2">2</tblr>.</p>
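            <p>The arithmetic of this appendix can be checked with a short Python sketch. The variable names are chosen for illustration and are not part of the original analysis; the input values are those appearing in the calculations above, with the unexposed category taken as reference (odds ratio 1) and an odds ratio of 2 for MWF exposure.</p>

```python
# Values for controls as used in the worked example
tau0 = [0.80, 0.20]                    # Phase One proportions (non metal-workers, metal-workers)
pi0 = [[0.975, 0.025], [0.75, 0.25]]   # Phase Two proportions (unexposed, MWF-exposed)
psi = [1.0, 2.0]                       # odds ratios by exposure category

# Weighted odds ratio per stratum: q_j = sum_k pi0_jk * psi_k
q = [sum(p * o for p, o in zip(row, psi)) for row in pi0]

# Exposure proportions among cases: pi1_jk = pi0_jk * psi_k / q_j
pi1 = [[p * o / qj for p, o in zip(row, psi)] for row, qj in zip(pi0, q)]

# Phase One proportions among cases: tau1_j = tau0_j * q_j / sum_j(tau0_j * q_j)
denom = sum(t * qj for t, qj in zip(tau0, q))
tau1 = [t * qj / denom for t, qj in zip(tau0, q)]

print([round(x, 3) for x in q])            # weighted odds ratios q_1, q_2
print([round(row[1], 3) for row in pi1])   # MWF exposure proportions among cases
print(round(tau1[1], 3))                   # proportion of metal-workers among cases
```

            <p>Rounded to three decimals, the sketch gives 1.025 and 1.25 for the weighted odds ratios, 0.049 and 0.4 for the exposure proportions among cases, and 0.234 for the proportion of metal-workers among cases, in agreement with the values derived above.</p>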
         </sec>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was funded by the National Cancer Institute, NIH (Intramural Research Program to A.G.) and the Deutsche Forschungsgemeinschaft (PI 345/1&#8211;2 to W.S.).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Logistic regression analysis and efficient design for two-stage studies</p>
            </title>
            <aug>
               <au>
                  <snm>Cain</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Breslow</snm>
                  <fnm>NE</fnm>
               </au>
            </aug>
            <source>Am J Epidemiol</source>
            <pubdate>1988</pubdate>
            <volume>128</volume>
            <issue>6</issue>
            <fpage>1198</fpage>
            <lpage>1206</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">3195561</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Logistic regression for two-stage case-control data</p>
            </title>
            <aug>
               <au>
                  <snm>Breslow</snm>
                  <fnm>NE</fnm>
               </au>
               <au>
                  <snm>Cain</snm>
                  <fnm>KC</fnm>
               </au>
            </aug>
            <source>Biometrika</source>
            <pubdate>1988</pubdate>
            <volume>75</volume>
            <fpage>11</fpage>
            <lpage>20</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1093/biomet/75.1.11</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>A two stage design for the study of the relationship between a rare exposure and a rare disease</p>
            </title>
            <aug>
               <au>
                  <snm>White</snm>
                  <fnm>JE</fnm>
               </au>
            </aug>
            <source>Am J Epidemiol</source>
            <pubdate>1982</pubdate>
            <volume>115</volume>
            <fpage>119</fpage>
            <lpage>128</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7055123</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Anamorphic analysis: Sampling and estimation for covariate effect when both exposure and disease are known</p>
            </title>
            <aug>
               <au>
                  <snm>Walker</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Biometrics</source>
            <pubdate>1982</pubdate>
            <volume>38</volume>
            <fpage>1025</fpage>
            <lpage>1032</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.2307/2529883</pubid>
                  <pubid idtype="pmpid">7168792</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Logistic analysis of studies with two-stage sampling: a comparison of four approaches</p>
            </title>
            <aug>
               <au>
                  <snm>Schill</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Drescher</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Stat Med</source>
            <pubdate>1997</pubdate>
            <volume>16</volume>
            <fpage>117</fpage>
            <lpage>132</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/(SICI)1097-0258(19970130)16:2&lt;117::AID-SIM475>3.0.CO;2-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">9004387</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Asbestos fibreyears and lung cancer: a two phase case-control study with expert exposure assessment</p>
            </title>
            <aug>
               <au>
                  <snm>Pohlabeln</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wild</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Schill</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Ahrens</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Jahn</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Bolm-Audorff</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>J&#246;ckel</snm>
                  <fnm>KH</fnm>
               </au>
            </aug>
            <source>Occup Environ Med</source>
            <pubdate>2002</pubdate>
            <volume>59</volume>
            <fpage>410</fpage>
            <lpage>414</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1740301</pubid>
                  <pubid idtype="pmpid">12040118</pubid>
                  <pubid idtype="doi">10.1136/oem.59.6.410</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Optimal sampling strategies for two-stage studies</p>
            </title>
            <aug>
               <au>
                  <snm>Reilly</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Am J Epidemiol</source>
            <pubdate>1996</pubdate>
            <volume>143</volume>
            <fpage>92</fpage>
            <lpage>100</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8533752</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Optimal design and efficiency of two-phase case-control studies with error-prone and error-free exposure measures</p>
            </title>
            <aug>
               <au>
                  <snm>McNamee</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Biostatistics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>590</fpage>
            <lpage>603</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/biostatistics/kxi029</pubid>
                  <pubid idtype="pmpid" link="fulltext">15860543</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Minmax designs for planning the second Phase of a two-phase case-control study</p>
            </title>
            <aug>
               <au>
                  <snm>Schill</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Wild</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Stat Med</source>
            <pubdate>2006</pubdate>
            <volume>25</volume>
            <fpage>1646</fpage>
            <lpage>1659</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/sim.2307</pubid>
                  <pubid idtype="pmpid" link="fulltext">16158403</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Re: "flexible matching strategies to increase power and efficiency to detect and estimate gene-environment interactions in case-control studies"</p>
            </title>
            <aug>
               <au>
                  <snm>Schill</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Wild</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Am J Epidemiol</source>
            <pubdate>2004</pubdate>
            <volume>159</volume>
            <fpage>1107</fpage>
            <lpage>1108</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/aje/kwh144</pubid>
                  <pubid idtype="pmpid" link="fulltext">15155296</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Cancer risks among workers exposed to metalworking fluids: a systematic review</p>
            </title>
            <aug>
               <au>
                  <snm>Calvert</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Ward</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Schnorr</snm>
                  <fnm>TM</fnm>
               </au>
               <etal/>
            </aug>
            <source>Am J Ind Med</source>
            <pubdate>1998</pubdate>
            <volume>33</volume>
            <fpage>282</fpage>
            <lpage>292</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/(SICI)1097-0274(199803)33:3&lt;282::AID-AJIM10>3.0.CO;2-W</pubid>
                  <pubid idtype="pmpid" link="fulltext">9481427</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Design and analysis of two-phase studies with binary outcome applied to Wilms tumour prognosis</p>
            </title>
            <aug>
               <au>
                  <snm>Breslow</snm>
                  <fnm>NE</fnm>
               </au>
               <au>
                  <snm>Chatterjee</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Applied Statistics</source>
            <pubdate>1999</pubdate>
            <volume>48</volume>
            <fpage>457</fpage>
            <lpage>468</lpage>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Randomized recruitment in case-control studies</p>
            </title>
            <aug>
               <au>
                  <snm>Weinberg</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Sandler</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Am J Epidemiol</source>
            <pubdate>1991</pubdate>
            <volume>134</volume>
            <fpage>421</fpage>
            <lpage>432</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">1877602</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Counter-matching in studies of gene-environment interaction: efficiency and feasibility</p>
            </title>
            <aug>
               <au>
                  <snm>Andrieu</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Goldstein</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>DC</fnm>
               </au>
               <au>
                  <snm>Langholz</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Am J Epidemiol</source>
            <pubdate>2001</pubdate>
            <volume>153</volume>
            <fpage>265</fpage>
            <lpage>274</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/aje/153.3.265</pubid>
                  <pubid idtype="pmpid" link="fulltext">11157414</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>A prospective study of genetic polymorphism in MPO, antioxidant status, and breast cancer risk</p>
            </title>
            <aug>
               <au>
                  <snm>He</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tamimi</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Hankinson</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Hunter</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Breast Cancer Res Treat</source>
            <pubdate>2008</pubdate>
            <inpress/>
            <note>2008 Mar 14</note>
         </bibl>
      </refgrp>
   </bm>
</art>
