Abstract
Background
A discrete choice experiment (DCE) is a preference survey that asks participants to choose among product portfolios that differ on key product characteristics, by completing several choice tasks. The analysis of DCE data needs to account for within-participant correlation because choices from the same participant are likely to be similar. In this study, we empirically compared some commonly used statistical methods for analyzing DCE data while accounting for within-participant correlation, based on a survey of patient preferences for colorectal cancer (CRC) screening tests conducted in Hamilton, Ontario, Canada in 2002.
Methods
A two-stage DCE design was used to investigate the impact of six attributes on participants' preferences for CRC screening tests and their willingness to undertake the test. We compared six models for clustered binary outcomes (logistic and probit regressions using cluster-robust standard error (SE), random-effects and generalized estimating equation approaches) and three models for clustered nominal outcomes (multinomial logistic and probit regressions with cluster-robust SE and a random-effects multinomial logistic model). We also fitted a bivariate probit model with cluster-robust SE, treating the choices from the two stages as two correlated binary outcomes. The rank of relative importance between attributes and the estimates of the β coefficients within attributes were used to assess model robustness.
Results
In total, 468 participants, each completing 10 choices, were analyzed. Similar results were reported for the rank of relative importance and β coefficients across models for stage-one data on evaluating participants' preferences for the test. The six attributes ranked from high to low as follows: cost, specificity, process, sensitivity, preparation and pain. However, the results differed across models for stage-two data on evaluating participants' willingness to undertake the tests. Little within-patient correlation (ICC ≈ 0) was found in stage-one data, but substantial within-patient correlation existed (ICC = 0.659) in stage-two data.
Conclusions
When the clustering effect in DCE data was small, results remained robust across statistical models. However, results varied when the clustering effect was larger. Therefore, it is important to assess the robustness of the estimates via sensitivity analysis using different models for analyzing clustered data from DCE studies.
Keywords:
Discrete choice experiment; Intraclass correlation; Statistical model; Patient preference
Background
With increased emphasis on the role of patients in healthcare decision making, discrete choice experimental (DCE) designs are more often used to elicit patient preferences among proposed health services programs [1,2]. DCE is an attribute-based design drawn from Lancaster's economic theory of consumer behaviour [3] and the statistical principles of the design of experiments [4]. This method measures consumer preference according to McFadden's random utility (benefit) maximisation (RUM) framework amongst a choice set which contains two or more alternatives of products or goods varying along several characteristics (attributes) of interest. In the early 1980s, Louviere, Hensher and Woodworth [5,6] introduced DCE into marketing research, and since then DCE has been rapidly adopted by researchers in other areas such as transportation, environment and social science. Its applications in health research emerged in the early 1990s, and it has been increasingly used to evaluate patient preferences for currently available and newly proposed health services or programs in health economics and policy-making related topics. For example, in the health economics related research area, 34 published studies used a DCE design in the period from 1990 to 2000, and 114 DCE design studies were published in the period from 2001 to 2008 [7].
In the short history of using DCE in health research, there have been several reviews [7-9] and debates about methodological and design issues, challenges and future development [10-12]. In generating a DCE study, three major formats of the choice design have frequently been used: i) a forced choice between two alternatives, ii) a choice among three or more alternatives with an opt-out option, and iii) a two-stage choice process which forces participants to choose one of the alternatives and then provides an opt-out choice that allows participants to say no to all proposed products [13]. Despite the rapid developments in design aspects [12,14], less attention has been paid to statistical analysis and model selection issues. Lancaster and Louviere [15] and Ryan et al. [13] discussed several statistical models used for DCE, including the multinomial logistic model (MNL), multinomial probit model (MNP), and mixed logit model (MIXL). However, these studies did not provide detailed comparisons amongst competing models, or a clear indication of how best to deal with model selection issues. Another aspect of the analysis of DCE data is adjustment for clustering effects. For example, in a DCE survey, it is common to ask participants to respond to several choice tasks. Each choice task has the same format but different attribute combinations. Naturally, the choices made by the same person would be expected to be more similar than the choices of other persons, leading to within-patient correlation of responses. This within-subject correlation caused by clustering or repeated observations needs to be accounted for in the analysis [16]. It is often measured using the intraclass correlation coefficient (ICC), where ICC = 0 indicates no intra-person correlation and ICC = 1 indicates perfect intra-person correlation. In this paper, we empirically compared some commonly used statistical models which also account for clustering effects in DCE analysis.
We assessed the robustness (consistency and discrepancy) of the models on ranking of the relative importance between the attributes and the estimates of the β coefficients within each level of the attributes.
The data we used were taken from the preference survey on colorectal cancer (CRC) screening tests conducted in Hamilton, Ontario, Canada in 2002 [17]. This project used a two-level choice design. Thus, the data structure allowed us to investigate the statistical models for analyzing binary, nominal and bivariate outcomes for DCE data.
Methods
Overview of the CRC screening project
The Canadian Cancer Society reported in 2011 that CRC is the fourth most commonly diagnosed cancer and the second leading cause of cancer death in Canada [18]. According to the same report, the estimated numbers of new CRC cases and CRC-related deaths in 2011 were 22,200 (50 per 100,000 persons) and 8,900 (20 per 100,000 persons), respectively. Although CRC has a high incidence rate, patients have a better chance of successful treatment if the diagnosis can be made earlier. Although a population-based CRC screening program is highly recommended for people over 50 years of age [19,20], the uptake rate in North America is only about 50% [21]. Therefore, better understanding of patient preferences for screening tests may be the key to the successful implementation and uptake of CRC screening programs. This survey was the first conducted in Canada to evaluate patient preferences for various CRC screening tests to identify the key attributes and levels that may influence CRC screening test uptake.
Traditional CRC screening modalities such as fecal occult blood testing (FOBT), flexible sigmoidoscopy (SIG), colonoscopy (COL) and double-contrast barium enema (DCBE) vary in their process, accuracy, comfort and cost [22]. In this survey, five important attributes (features) of the screening tests were identified through review of the literature, consultation with clinical specialists and patient focus groups. They were: process (4 levels), pain (2 levels), preparation (3 levels), specificity (3 levels) and sensitivity (3 levels). In addition, cost (4 levels) was included due to its potential influence on uptake (Table 1). To reduce the burden on respondents of making choices among the 864 (4 × 2 × 3 × 3 × 3 × 4) unique combinations from a full factorial design, we used a fractional factorial design. In this design, 40 choice tasks were divided into four blocks to create a subset of 10 choice tasks of the attribute combinations for each survey participant to evaluate. The original design was developed using the SAS Optex procedure and optimized several measures of efficiency: 1) level balance; 2) orthogonality; and 3) D-efficiency [17,23]. This design ensured the ability to estimate the main effects of the attributes while minimizing the number of combinations. No prior information on the ranking of attributes from the literature was available at the time of the design of the study. The survey used the pairwise binary two-stage response design [24] with the choice between two choice sets of the attributes at different levels as the first step and the addition of an opt-out option as the second step (Table 2). This design maximized the information gained through the questionnaire to understand patient preferences on the CRC screening tests and the factors affecting the uptake rate. However, the analysis presented challenges. First, the answers were likely to cluster within subjects because each subject made two sequential choices for ten choice tasks.
Therefore, a statistical model adjusting for within-subject correlation for repeated measurements was needed. Second, in the original paper, the analysis was done using the bivariate probit model, but the analysis could be approached using different methods: treating the responses at the two stages as independent responses, as sequential and correlated bivariate responses, or as a single response with three levels (Test A, Test B or No screening).
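The size of the full factorial design quoted above can be checked by direct enumeration. The following is a minimal Python sketch for illustration only; the dictionary and variable names are hypothetical, and it does not reproduce the SAS Optex procedure used to build the fractional design.

```python
from itertools import product

# Number of levels per attribute, as listed in the text.
levels = {
    "process": 4,
    "pain": 2,
    "preparation": 3,
    "specificity": 3,
    "sensitivity": 3,
    "cost": 4,
}

# Enumerate every combination of level indices (the full factorial design).
full_factorial = list(product(*(range(n) for n in levels.values())))
n_profiles = len(full_factorial)

# The fractional design: 40 choice tasks split evenly into four blocks.
n_tasks_total = 40
n_blocks = 4
tasks_per_block = n_tasks_total // n_blocks

print(n_profiles, tasks_per_block)
```

Enumerating the profiles makes it clear why a fractional design was needed: each respondent saw only a small block of tasks rather than comparisons drawn from the whole set of profiles.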
Outcomes
Given the unique data structure of the two-stage design, we used three analytic approaches. 1) Analyze the two-stage sequential choices of each choice task separately, i.e. binary outcomes: a) subject preferences on the screening modalities, which only included patient responses at the first stage, and b) subject willingness to participate in the screening program, which only included subjects' responses at the second stage. 2) Treat the two-stage data as three parallel choice options including Test A, Test B and "opt-out", i.e. nominal data. 3) Treat the two-stage data as two correlated binary choice sets, i.e. bivariate outcomes. Figure 1 presents the data structure of the original design and these three analysis approaches.
Figure 1. Two-stage design and outcomes for analysis.
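The three outcome codings described above can be derived from a single two-stage response. The sketch below is illustrative only; the function name, argument names and dictionary keys are hypothetical and are not taken from the original study's data files.

```python
def recode_two_stage(stage1_choice, stage2_participate):
    """Return the three outcome codings for one choice task.

    stage1_choice: "A" or "B" (forced choice between the two tests).
    stage2_participate: True if the respondent would undertake the chosen
        test, False if the respondent opts out.
    """
    if stage1_choice not in ("A", "B"):
        raise ValueError("stage1_choice must be 'A' or 'B'")
    prefers_a = 1 if stage1_choice == "A" else 0
    participates = 1 if stage2_participate else 0
    # Nominal coding: the stage-one choice is kept only if the respondent
    # participates; otherwise the response collapses to "opt-out".
    nominal = stage1_choice if stage2_participate else "opt-out"
    return {
        "binary_stage1": prefers_a,                # approach 1a
        "binary_stage2": participates,             # approach 1b
        "nominal": nominal,                        # approach 2
        "bivariate": (prefers_a, participates),    # approach 3
    }


example = recode_two_stage("B", False)
print(example)
```

Note that the nominal coding discards the stage-one preference of respondents who opt out, whereas the bivariate coding keeps both answers.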
Random utility theory
As mentioned above, the DCE design is generally based on random utility theory [25], which expresses the utility (benefit) U_{in} of an alternative i in a choice set C_{n} (as perceived by individual n) as the sum of two parts: 1) an explainable component specified as a function of the attributes of the alternatives, V(X_{in}, β); and 2) an unexplainable component (random variation), ε_{in}.
Individual n will choose alternative i over the other alternatives if and only if this alternative gives the maximum utility. The relationship between the utility function and the k observed attributes of the alternatives can be assumed to follow a linear-in-parameters function.
Depending on the assumed distribution of the error term ε_{in}, the model specification for DCE data can vary.
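The display equations for this section did not survive in this copy. A standard textbook form that is consistent with the definitions above is given here as a reconstruction under that assumption, not as the authors' exact notation; the probability P_n(i) and the attribute values x_{in1}, ..., x_{ink} are symbols introduced for this illustration.

```latex
U_{in} = V(X_{in}, \beta) + \varepsilon_{in},
\qquad
P_n(i) = \Pr\left( U_{in} \ge U_{jn} \ \text{for all } j \in C_n \right),
\qquad
V(X_{in}, \beta) = \beta_1 x_{in1} + \beta_2 x_{in2} + \dots + \beta_k x_{ink}.
```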
Statistical methods
The statistical models discussed in this paper were organized according to the type of outcomes: i) logistic and probit models for binary outcomes, ii) multinomial logistic and probit models for nominal outcomes, and iii) a bivariate probit model for bivariate binary outcomes. We provide some details on how the different statistical techniques account for the within-cluster correlation in analyzing clustered DCE data.
For the binary type of outcomes, we examined six statistical models which have the capacity to account for the within-patient correlations [26,27]: logistic regression with clustered robust standard error, random-effects logistic regression, logistic regression using generalized estimating equations (GEE), probit regression with clustered robust standard error, random-effects probit regression, and probit regression using GEE. Below are some brief descriptions of the methods.
Standard logistic regression and standard probit regression
Both standard logistic and probit regressions assume that the observations are independent. However, in our dataset, each subject completed ten choice tasks, i.e. each subject had ten observations (choice tasks) which formed a cluster or can be considered repeated measurements. Normally, the observations in the same cluster are more similar (correlated) than observations from different clusters. Therefore, adjusting for the correlation within the cluster is necessary. We used three methods to adjust for the within-cluster correlation.
Clustered robust standard error
In this method the independence assumptions are relaxed among all observations, but it is assumed that the observations across clusters are independent. The total variance is empirically estimated using the Huber-White (also called sandwich) standard error [28]. This method takes only the intraclass correlation into account, but the degrees of freedom are still based on the number of observations, not the number of clusters [29]. Therefore, this method only adjusts the standard error related to the confidence interval, but the point estimates are left unchanged.
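To illustrate why the clustered robust approach changes the standard errors but not the point estimates, the following Python sketch fits a logistic regression with one covariate and then computes both the model-based and the cluster-robust (sandwich) standard errors. It is a toy implementation under stated assumptions: the data are made up, the function names are hypothetical, and no small-sample correction (such as the one statistical packages usually apply) is included.

```python
import math


def _prob(b0, b1, xi):
    """Logistic probability P(y = 1 | x) for intercept b0 and slope b1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


def fit_logit(x, y, iters=25):
    """Newton-Raphson maximum likelihood fit of a two-parameter logit."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _prob(b0, b1, xi)
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def standard_errors(x, y, cluster, b0, b1):
    """Return (model-based SEs, cluster-robust SEs) for (b0, b1)."""
    h00 = h01 = h11 = 0.0
    scores = {}  # per-cluster sums of the score contributions
    for xi, yi, ci in zip(x, y, cluster):
        p = _prob(b0, b1, xi)
        w = p * (1.0 - p)
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
        s = scores.setdefault(ci, [0.0, 0.0])
        s[0] += yi - p
        s[1] += (yi - p) * xi
    det = h00 * h11 - h01 * h01
    # "Bread": inverse information matrix (the model-based covariance).
    a00, a01, a11 = h11 / det, -h01 / det, h00 / det
    # "Meat": sum over clusters of outer products of the cluster scores.
    m00 = sum(s[0] * s[0] for s in scores.values())
    m01 = sum(s[0] * s[1] for s in scores.values())
    m11 = sum(s[1] * s[1] for s in scores.values())
    # Diagonal of the sandwich A * M * A.
    v00 = a00 * (m00 * a00 + m01 * a01) + a01 * (m01 * a00 + m11 * a01)
    v11 = a01 * (m00 * a01 + m01 * a11) + a11 * (m01 * a01 + m11 * a11)
    model_se = (math.sqrt(a00), math.sqrt(a11))
    robust_se = (math.sqrt(v00), math.sqrt(v11))
    return model_se, robust_se


# Made-up data: four respondents (clusters) with four choice tasks each.
cluster = [c for c in (1, 2, 3, 4) for _ in range(4)]
x = [0, 1, 0, 1] * 4
y = [1, 1, 1, 1,  0, 0, 0, 1,  0, 1, 0, 1,  0, 0, 1, 0]

b0, b1 = fit_logit(x, y)
model_se, robust_se = standard_errors(x, y, cluster, b0, b1)
print("coefficients:", b0, b1)
print("model-based SE:", model_se)
print("cluster-robust SE:", robust_se)
```

The coefficients are estimated once, ignoring the clustering; only the variance calculation uses the cluster identifiers, which is the behaviour described in the paragraph above.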
Random-effects method
In this method, the total variance has two components: between-cluster variance and within-cluster variance. We assume that, at the cluster level, data follow a normal distribution with mean zero and between-cluster variance τ^{2}; and that within each cluster, data vary according to some within-cluster variance [30]. This method takes two types of variance into account when estimating the total variance, and the degrees of freedom are calculated based on the number of clusters [31]. Therefore, the point estimates and their corresponding variances are adjusted for intracluster correlation. For the covariance structure, we assumed equal variances for the random effects and a common pairwise covariance [32]. This structure corresponds to the exchangeable correlation structure specified for the GEE method, which we describe below. The key difference between the random-effects method and the other methods discussed here is that the random-effects method estimates the parameters for each subject within a cluster, or for clusters sharing the same random effects. Therefore, the random effect is also often called the subject-specific effect [33].
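The link between the between-cluster variance τ^{2} and the ICC can be made concrete. The sketch below assumes the common latent-variable definition of the ICC for random-intercept models, in which the within-cluster (residual) variance is fixed at π^{2}/3 for the logistic link and at 1 for the probit link. The text does not state which ICC estimator was used in the study, so this is an illustration of the concept rather than a reproduction of the reported values; the function names are hypothetical.

```python
import math

LOGISTIC_RESIDUAL_VAR = math.pi ** 2 / 3  # variance of a standard logistic error
PROBIT_RESIDUAL_VAR = 1.0                 # variance of a standard normal error


def latent_icc(tau2, residual_var):
    """ICC on the latent scale: between-cluster share of the total variance."""
    if tau2 < 0:
        raise ValueError("tau2 must be non-negative")
    return tau2 / (tau2 + residual_var)


def tau2_from_icc(icc, residual_var):
    """Invert latent_icc: between-cluster variance implied by a given ICC."""
    if not 0 <= icc < 1:
        raise ValueError("icc must lie in [0, 1)")
    return icc * residual_var / (1 - icc)


# Between-cluster variance implied by an ICC of 0.659 under a logistic link.
implied_tau2 = tau2_from_icc(0.659, LOGISTIC_RESIDUAL_VAR)
print(implied_tau2, latent_icc(implied_tau2, LOGISTIC_RESIDUAL_VAR))
```

Under this definition, an ICC near zero corresponds to a between-cluster variance near zero, in which case the random-effects model collapses towards the standard model.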
GEE method
This method allows a working correlation matrix to be specified to adjust for the within-cluster correlation. We assumed that there was no ordering effect among the observations in each cluster, allowing us to use an exchangeable correlation matrix [34]. As in the random-effects method, the degrees of freedom are based on the number of clusters, which in turn adjusts the estimate of the confidence interval [35]. Unlike the random-effects method, the GEE approach estimates the regression parameters averaging over the clusters (the so-called population-average model) [36].
For the nominal type of outcomes, we used three statistical models [37]: multinomial logistic model with clustered robust standard error, random-effects multinomial logistic model, and multinomial probit model with clustered robust standard error. We also fitted a bivariate probit model in which the choices from the two stages were treated as two binary outcomes [38].
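An exchangeable working correlation matrix assumes the same correlation between every pair of observations from one respondent. The sketch below builds such a matrix for a cluster of ten choice tasks; the function name and the value of the correlation are hypothetical and chosen only for illustration.

```python
def exchangeable_corr(cluster_size, rho):
    """Build an exchangeable working correlation matrix as nested lists.

    Every off-diagonal entry equals rho and every diagonal entry equals 1.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    # Range of rho for which the matrix is positive definite.
    lower = -1.0 / (cluster_size - 1) if cluster_size > 1 else -1.0
    if not lower < rho < 1.0:
        raise ValueError("rho outside the valid range for this cluster size")
    return [
        [1.0 if i == j else rho for j in range(cluster_size)]
        for i in range(cluster_size)
    ]


# Ten choice tasks per respondent with an illustrative correlation of 0.3.
R = exchangeable_corr(10, 0.3)
print(R[0][:3])
```

Because the matrix has a single parameter, it does not depend on the order in which the choice tasks were presented, which matches the no-ordering-effect assumption stated above.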
Multinomial logistic model
McFadden's conditional logit model (CLM), also called the multinomial logistic (MNL) model, was the pioneering and most commonly used model in the early DCE studies [39]. The key assumption of this model is that the error terms ε_{in} are independent and identically distributed (IID) [13], which leads to the independence of irrelevant alternatives (IIA) property [40]. Another assumption for this model is that the error term has an extreme value distribution with mean 0 and variance π^{2}/6 [37]. To take the intraclass correlation into account, the clustered robust SE was used.
Random-effects multinomial logistic model
Similar to the random-effects models used for analyzing binary outcomes, this model takes two levels of variance, between-cluster variance and within-cluster variance, into account for clustered or longitudinal nominal responses [41,42].
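The IIA property can be seen directly from the closed-form MNL choice probabilities. In the sketch below, the systematic utilities are made-up numbers and the function name is hypothetical; the point is only that the odds of choosing Test A over Test B do not change when the opt-out alternative is removed from the choice set.

```python
import math


def mnl_probabilities(utilities):
    """Map systematic utilities V_i to MNL choice probabilities."""
    m = max(utilities.values())  # subtract the maximum for numerical stability
    expv = {k: math.exp(v - m) for k, v in utilities.items()}
    total = sum(expv.values())
    return {k: e / total for k, e in expv.items()}


# Hypothetical systematic utilities for one choice task.
v = {"Test A": 0.8, "Test B": 0.2, "opt-out": -0.5}

p_full = mnl_probabilities(v)
p_pair = mnl_probabilities({k: v[k] for k in ("Test A", "Test B")})

# IIA: the A-versus-B odds depend only on the utilities of A and B.
odds_full = p_full["Test A"] / p_full["Test B"]
odds_pair = p_pair["Test A"] / p_pair["Test B"]
print(odds_full, odds_pair)
```

Both odds equal the exponential of the utility difference between the two tests, whatever other alternatives are offered.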
Multinomial probit model
The multinomial probit model (MNP) (heteroscedastic models) is considered to be one of the most robust, flexible and general models in DCE, especially when correlation (heteroscedasticity) between alternatives is present [43]. The model is assumed to have a normally distributed error term. The benefit of using the MNP model is that the IIA assumption, which is a strict requirement for the MNL model, can be somewhat relaxed [37]. The main concern in using this model is that its maximization involves Monte Carlo simulation rather than analytical maximization, which could lead to a computational burden. Again, the clustered robust SE was used to incorporate the intraclass correlation.
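The reliance on simulation can be illustrated with a simple frequency simulator for probit choice probabilities. This sketch is not the estimator used by any statistical package; it only shows why the probabilities have no closed form and how correlation between alternatives changes them. The function name, the utilities and the correlation structure (Test A and Test B sharing an error component, opt-out independent) are assumptions made for this illustration.

```python
import random


def simulate_mnp_probs(v, rho_ab, n_draws=20000, seed=1):
    """Estimate probit choice probabilities by simulation.

    v: systematic utilities keyed by "Test A", "Test B" and "opt-out".
    rho_ab: assumed correlation (0 to 1) between the errors of the two tests.
    """
    if not 0.0 <= rho_ab <= 1.0:
        raise ValueError("rho_ab must lie in [0, 1]")
    rng = random.Random(seed)
    counts = {k: 0 for k in v}
    a, b = rho_ab ** 0.5, (1.0 - rho_ab) ** 0.5
    for _ in range(n_draws):
        shared = rng.gauss(0, 1)  # component common to both screening tests
        eps = {
            "Test A": a * shared + b * rng.gauss(0, 1),
            "Test B": a * shared + b * rng.gauss(0, 1),
            "opt-out": rng.gauss(0, 1),
        }
        # Each simulated respondent picks the alternative with maximum utility.
        best = max(v, key=lambda k: v[k] + eps[k])
        counts[best] += 1
    return {k: c / n_draws for k, c in counts.items()}


# Equal systematic utilities, with and without correlated test errors.
v_equal = {"Test A": 0.0, "Test B": 0.0, "opt-out": 0.0}
p_independent = simulate_mnp_probs(v_equal, rho_ab=0.0)
p_correlated = simulate_mnp_probs(v_equal, rho_ab=0.9)
print(p_independent)
print(p_correlated)
```

With independent errors the three alternatives are chosen about equally often; when the two tests share most of their error, they behave more like a single alternative and the opt-out share rises, a pattern the MNL model cannot represent because of IIA.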
Bivariate probit model
In this model, we assume that the choices between the two stages (stage 1: choice between screening tests; stage 2: choice between participation and opt-out) are not independent. That is, a subject's choice as to whether or not to participate in the screening program was conditional on the subject's preference for the screening modalities [44]. By fitting this model, two types of correlation can be taken into account: the correlation between the outcomes from stage 1 and stage 2, incorporated through the bivariate nature of the model itself, and the intraclass correlation, incorporated through use of the cluster robust SE.
To assess the necessity of accounting for the intraclass correlation when analyzing clustered correlated DCE data, we also presented the results from the above models using the simple standard error (SE), which does not take clustering into account. They are the standard logistic, probit, multinomial logistic, multinomial probit and bivariate probit models.
We compared results from the above models on the following criteria: rank on the relative importance of the attributes, and magnitude, direction and significance of the estimates of the β coefficients within each level of the attributes, which were obtained by regressing preference onto the difference in attributes between the two choices. The ranking criterion was measured by the percent change between the log-likelihood value of the full model and the value after removing one specific attribute from the model [45]. To evaluate the significance of the estimates of the β coefficients within each attribute, the criterion for statistical significance was set at alpha = 0.05. All statistical models were fitted using STATA 10.2 (College Station, TX) and the figures were plotted using PASW Statistics 19 (SPSS: An IBM Company).
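The ranking criterion described above can be sketched as follows. The log-likelihood values are made up for illustration and are not results from this study, and the function name is hypothetical; the exact formula in [45] may differ in detail from the percent change computed here.

```python
def rank_attributes(ll_full, ll_without):
    """Rank attributes by percent change in log-likelihood when removed.

    ll_full: log-likelihood of the model containing all attributes.
    ll_without: mapping from attribute name to the log-likelihood of the
        model refitted without that attribute.
    """
    if ll_full == 0:
        raise ValueError("ll_full must be non-zero")
    pct_change = {
        name: 100.0 * abs(ll - ll_full) / abs(ll_full)
        for name, ll in ll_without.items()
    }
    # A larger change means the attribute contributes more to model fit.
    return sorted(pct_change.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical log-likelihoods (not taken from the study).
ranking = rank_attributes(
    ll_full=-3000.0,
    ll_without={"cost": -3300.0, "pain": -3003.0, "specificity": -3150.0},
)
for name, pct in ranking:
    print(name, round(pct, 2))
```

Because clustered robust standard errors leave the point estimates and the log-likelihood unchanged, a criterion of this kind gives the same ranking with simple and with clustered robust SEs.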
Results
A random sample of 1,170 patients was selected from a roster of 9,959 patients aged 40-60 years from the Hamilton Primary Care Network. After excluding the patients who did not meet the inclusion criteria, questionnaires were mailed to 1,049 patients. Of these, 547 were returned and 485 had complete data. Among the patients with complete data, we excluded 17 patients who did not pass the rationality test, which consisted of two warm-up choice tasks. For these warm-up tasks, one alternative was dominant over the other, possessing all favourable attribute levels, and the respondents who did not choose the dominant alternative were considered to have failed the rationality test. Finally, we analyzed the data for 468 patients (Figure 2) from four blocks with block sizes of 105, 124, 120 and 119, respectively.
Figure 2. Flow chart of sample selection.
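The sample flow reported above can be cross-checked with a few lines of arithmetic. The variable names are hypothetical; the numbers are those given in the text.

```python
mailed = 1049
returned = 547
complete = 485
failed_rationality = 17
block_sizes = [105, 124, 120, 119]
choices_per_participant = 10

# Participants retained for analysis after the rationality test.
analyzed = complete - failed_rationality

# Share of mailed questionnaires that were returned.
response_rate = returned / mailed

# Choice-task observations available at each stage.
n_observations = analyzed * choices_per_participant

print(analyzed, sum(block_sizes), n_observations, round(response_rate, 3))
```

The block sizes sum to the number of analyzed participants, so every retained respondent is assigned to exactly one block.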
The mean age of the subjects was 50.8 years (standard deviation, 5.95 years), which was similar to the recommended age to start CRC screening [46]. Of the 468 included subjects, about 48% were female, 12% had a family history of CRC and two patients (0.2%) had been diagnosed with CRC. The detailed demographic characteristics are presented in Table 3.
Table 3. Demographic characteristics of respondents
For the two-point outcomes (binary), the rank of the attributes on the choice of Test A and Test B was consistent across models. From most important to least important, they ranked as follows: cost, specificity, process, sensitivity, preparation and pain (Figure 3). With the exception of the random-effects logistic and probit models, the ranking (from most important to least important) of the six attributes for assessing participation or opt-out (stage-two) was as follows: cost, sensitivity, preparation, process, specificity and pain. The ranking from the random-effects models was: cost, sensitivity, process, specificity, preparation and pain (Figure 4). For the three-point outcomes (nominal and bivariate), in which the choices of Test A, Test B and opt-out were estimated simultaneously, the attributes were ranked consistently: cost, sensitivity, specificity, process, preparation and pain (Figure 5). Compared with the models using simple SE, using clustered robust SE to incorporate intraclass correlation did not have any effect on the calculated relative importance of attributes.
Figure 3. Relative importance of choice between Test A and Test B (stage-one).
Figure 4. Relative importance of choice between participation and opt-out (stage-two).
Figure 5. Relative importance of choice between Test A, Test B and opt-out (combined stage one and two).
When looking at how certain levels of each attribute affected the choice between Test A and Test B (stage-one), the estimates of the β coefficients were similar in magnitude and direction across different statistical models. The most preferred screening test had the following features: stool sample, no preparation, 100% specificity, 70% sensitivity, without pain and with an associated cost of $50. The least preferred screening test had the combination of colonoscopy, special diet for preparation, 80% specificity, 90% sensitivity, with mild pain and no associated cost (Table 4 and Table 5).
Table 4. Estimates of coefficients of patient choice between Test A and Test B (Two-point outcome from stage-one)
Table 5. Estimates of coefficients of patient choice of Test A and Test B (Stage-one from three-point outcome)
When assessing the impact of certain levels of each attribute on patient choice of participating or opting out (stage-two), the β coefficient estimates for 90% sensitivity and no preparation had a significantly positive effect on uptake and this was consistent across all models. For other attributes and levels, results appeared similar across all three global analysis approaches: the random-effects and GEE logistic models and the random-effects and GEE probit models (Table 6); MNL with clustered robust SE, MNL random-effects and MNP with clustered robust SE (Table 7); and logistic with clustered robust SE, probit with clustered robust SE and bivariate probit (Table 6 and Table 7). The following two examples show that the estimates across models could differ in magnitude and direction. The magnitude of the estimates of the effect of 90% sensitivity varied by model, but the direction was similar across all models. When comparing the cost of $50 to no cost, the logistic and probit random-effects and GEE models reported that participants preferred no cost. MNL with clustered robust SE, MNL random-effects and MNP with clustered robust SE reported that participants preferred the $50 cost. For the other models, no statistically significant differences were found (Figure 6). We also found that, unlike the results from the stage-one data (Table 4 and Table 5), for the stage-two data there was a noticeable difference between the β coefficient estimates from the models with and without incorporating the intraclass correlation (Table 6 and Table 7).
Table 6. Estimates of coefficients of patient choice of participation or opt-out (Two-point outcome from stage-two)
Table 7. Estimates of coefficients of patient choice of participation or opt-out (Stage-two from three-point outcome)
Figure 6. β Coefficients with 95% CI (Cost: $50 vs. No cost) of patient choice between participation and opt-out.
When assessing the clustering effect, we found that intraclass correlation was small among the stage-one data (ICC ≈ 0) and relatively large among the stage-two data (ICC = 0.659). For this survey, it appears as though many patients had predetermined their participation in CRC screening. For example, among the 468 participants included in the analyses, 48% always chose to undertake the screening program and 15% always chose no participation regardless of how the screening modalities varied at the first stage. Although Test A and Test B were generic terms for the combinations of the different levels of the six attributes and they were randomly assigned to appear first or second in a choice task, we found that 24% more participants chose Test A over Test B. All of these design limitations had some impact on our interpretation of the analysis results.
Discussion
We applied six statistical models to binary outcomes, three models to nominal (multinomial) outcomes and one model to bivariate binary outcomes to estimate the ranking of key attributes of CRC screening tests using data from a DCE survey conducted in Hamilton, Ontario, Canada in 2002. We used three methods to adjust for the within-cluster correlations: clustered robust standard error, random-effects, and GEE methods. The results showed consistent answers for estimating subject preference for CRC screening tests, both on ranking the importance of the attributes and on identifying the significant factors influencing subject choice between testing modalities. For estimating subject willingness to participate in or undertake CRC screening (i.e. incorporating the "opt-out" option), models disagreed both on ranking the importance of the attributes and on identifying the significant factors (i.e. attributes and levels) affecting whether or not subjects would participate.
Overall, our analyses showed that participants preferred a CRC screening test with the following characteristics: stool sample, no preparation, 100% specificity, 70% sensitivity and without pain. The CRC test with such a combination of attribute levels would be the FOBT test [18]. Thus, our findings appear to be consistent with the results from Nelson and Schwartz's survey in 2004 [47] which showed FOBT to be the most preferred option for CRC screening. In that survey, they also reviewed 12 previous studies, all of which showed FOBT to be a preferred choice by most patients.
The consistency in estimating the choice between screening tests and the discrepancy in estimating the choice between participation and "opt-out" might be due to the models' differing ability to adjust for the within-participant (cluster) correlation. When the within-cluster correlation is small (choice between Test A and Test B), the assumption of an independently and identically distributed error term ε_{in} holds. Therefore, it might not be necessary to take the clustering effects into account and thus the estimates are similar across statistical models. However, when intraclass correlation is present, the analysis needs to account for both the within-cluster variance and the between-cluster variance [48].
To the best of our knowledge, this is the first empirical study to compare different methods to address the within-participant correlation in the analysis of DCE data. However, many authors have emphasized the importance of adjusting for clustering in the analysis of clustered data or repeated measurements for binary outcomes [49,50]. When intraclass correlations are present in clustered or longitudinal data, the random-effects and GEE models are two commonly recommended approaches. Although they estimate different parameters (the estimates from the random-effects model are interpreted for the observations in the same cluster; the estimates from the GEE model are interpreted as the mean across the entire sample), the results from these two models are similar most of the time [41,51]. Some researchers generally prefer the random-effects model when the results from these two approaches disagree. However, some researchers argue that the random-effects model could provide biased results due to unverifiable assumptions about the data distribution [52].
Compared with the models for analyzing correlated binary data, statistical software seldom has ready-to-use statistical models developed for multinomial outcomes or multivariate outcomes. The multinomial probit model is routinely used to deal with correlation between alternatives [53], but it does not take intraclass or intra-respondent correlation into account. Robust standard error can be specified for multinomial logistic or probit and bivariate logistic models to adjust the estimate of standard error, but this would not correct the bias related to point estimates (coefficients). A simulation study has shown that the bias and the inconsistency for estimating the within-cluster correlation increase with the size of the cluster [54]. The newly developed generalized linear latent and mixed model (gllamm) procedure in STATA has the ability to run a random-effects multinomial logistic model [55] to address the intraclass correlation issue, but this model has yet to be evaluated for performance (i.e. whether or not it yields unbiased estimates). Some researchers have suggested using Bayesian hierarchical random-effects logistic and probit regression for clustered or panel data [56]. Although the Bayesian approach allows the flexibility to specify random effects, it requires considerable skill in programming.
This study has some limitations. First, this study is an empirical comparison of the analytic models and therefore we cannot know which model performs the best. Such an analysis would require simulation studies to assess the performance of the models in terms of bias, precision, and coverage. Second, some estimates of the cost attribute in our study were inexplicable. For the test-associated cost, participants' preference had a non-linear order: $50, $0, $500 and $250. This could be a result of a violation of the model assumptions or of model misspecification. Most DCE analyses assume a linear utility function, but some recent studies have shown that this assumption may not be true for price-related attributes. A study of MP3 players found that the utility functions of price and storage size had W-shaped curves rather than smooth linear trends [57]. A local travel mode study also found that the preference for time savings followed a non-linear utility function [58]. Another reason which may cause inaccurate results in our study is the use of the two-stage design. The two-stage design had the advantage of maximizing the information gained by forcing participants to make a choice at the first stage, but it also gave us some artificial information. Third, many respondents in this survey seemed to have predetermined their participation in CRC screening before seeing the questionnaire. This may have caused an unusually high within-cluster correlation when choosing between participation and opt-out. We also suspect that the predetermination might cause the ordering effect [59] when choosing the preferred screening tests. When individuals are forced to make a choice between products which they have decided that they do not want, the answer might not resemble the truth.
Therefore, the results need to be interpreted cautiously; replication in similar studies is needed to better understand participant preferences for CRC screening and the willingness to undertake the screening program.
Conclusion
Responses from the same participant are likely to be more similar than the responses between participants in DCE data leading to possible intraclass or intraparticipant correlation. Therefore, it is important to investigate the size of intraclass correlation before fitting any statistical model. We found that when withincluster correlation is very small, all models gave consistent results both on the estimates ranking and coefficients. Therefore, the simplest logistic regression and multinomial logistic regression are recommended for the computation advantage being ease. Multinomial probit model may be a preferred choice method of analysis if we assume the existence of the correlation between alternatives.
When within-cluster correlation is high, sensitivity analyses are needed to examine the consistency of the results. Instead of making generalized inferences from the estimates of any single statistical model, results from sensitivity analyses based on different models can provide insight into the robustness of the findings.
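A sensitivity analysis of this kind can be sketched as a comparison of attribute rankings across fitted models. The Python sketch below assumes that relative importance is measured by the range of the level coefficients within each attribute, a common convention that may differ from the definition used in this study. The model names and all coefficient values are hypothetical placeholders, not results from this survey.

```python
from typing import Dict, List

# model name -> attribute -> coefficients of the attribute levels
Coefs = Dict[str, Dict[str, List[float]]]


def importance_ranking(levels: Dict[str, List[float]]) -> List[str]:
    """Rank attributes by the range of their level coefficients, largest first."""
    spread = {attr: max(b) - min(b) for attr, b in levels.items()}
    return sorted(spread, key=lambda a: spread[a], reverse=True)


def rankings_agree(results: Coefs) -> bool:
    """Return True when every model yields the same attribute ranking."""
    rankings = [importance_ranking(levels) for levels in results.values()]
    return all(r == rankings[0] for r in rankings)


# Hypothetical placeholder coefficients for illustration only.
toy_results: Coefs = {
    "logit_cluster_se": {
        "cost": [0.0, -0.9], "pain": [0.0, -0.1], "process": [0.0, 0.4]},
    "random_effects_logit": {
        "cost": [0.0, -1.1], "pain": [0.0, -0.2], "process": [0.0, 0.5]},
    "gee_logit": {
        "cost": [0.0, -0.3], "pain": [0.0, -0.6], "process": [0.0, 0.4]},
}

for model, levels in toy_results.items():
    print(model, importance_ranking(levels))
print("all rankings agree:", rankings_agree(toy_results))
```

In this toy input the third model ranks the attributes differently from the other two, so the check reports disagreement. In practice, such disagreement would be a signal to interpret any single model's ranking cautiously.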
Our study empirically compared some commonly used statistical models that account for intraclass correlation when analyzing DCE data. Simulation studies are needed to fully understand the necessity of accounting for intraclass correlation in DCE data, particularly when analyzing nominal outcomes.
Conflict of interest
The authors declare that they have no competing interests.
Authors' contributions
JC (chengj2@mcmaster.ca) conducted the literature review, performed the statistical analyses and drafted the manuscript. LT (thabanl@mcmaster.ca) designed the original study, oversaw the statistical analysis and revised the manuscript. EP (pullena@mcmaster.ca) assisted in planning the statistical analyses and revised the manuscript. DAM (damarsha@ucalgary.ca) and JKM (marshllj@mcmaster.ca) designed the original study and revised the manuscript. All authors read and approved the final manuscript.
Acknowledgements
The original study was funded by a research grant from the Canadian Institutes of Health Research (MOB53116) and the Cancer Research Foundation of America. We thank the reviewers for their insightful comments and suggestions that led to improvements in the manuscript.
References

Longo MF, Cohen DR, Hood K, Edwards A, Robling M, Elwyn G, Russell IT: Involving patients in primary care consultations: assessing preferences using discrete choice experiments.
Br J Gen Pract 2006, 56(522):35-42.

Ryan M, Major K, Skatun D: Using discrete choice experiments to go beyond clinical outcomes when evaluating clinical practice.
J Eval Clin Pract 2005, 11(4):328-338.

Lancaster KJ: A new approach to consumer theory.
J Polit Econ 1966, 74(2):132-157.

Montgomery DC: Design and analysis of experiments. 5th edition. New York: Wiley; 2000.

Louviere J, Hensher D: On the design and analysis of simulated choice or allocation experiments in travel choice modelling.

Louviere J, Woodworth G: Design and analysis of simulated consumer choice or allocation experiments: an approach based on aggregate data.
J Mark Res 1983, 20:350-367.

de Bekker-Grob EW, Ryan M, Gerard K: Discrete choice experiments in health economics: a review of the literature.
Health Econ 2010.
doi:10.1002/hec.1697

Ryan M, Gerard K: Using discrete choice experiments to value health care programs: current practice and future research reflections.
Appl Health Econ Health Policy 2003, 2(1):55-64.

Marshall DA, Bridges JFP, Hauber B, Cameron RA, Donnalley L, Fyie KA, Johnson FR: Conjoint analysis applications in health - how are studies being designed and reported? An update on current practice in the published literature between 2005 and 2008.
The Patient: Patient-Centered Outcomes Research 2010, 3:249-256.

Louviere JJ, Lancsar E: Choice experiments in health: the good, the bad, the ugly and toward a brighter future.
Health Econ Policy Law 2009, 4(Pt 4):527-546.

Bryan S, Dolan P: Discrete choice experiments in health economics. For better or for worse?
Eur J Health Econ 2004, 5(3):199-202.

Louviere JJ, Pihlens D, Carson R: Design of discrete choice experiments: a discussion of issues that matter in future applied research.

Ryan M, Gerard K, Amaya-Amaya M: Using discrete choice experiments to value health and health care. Dordrecht, The Netherlands: Springer; 2008.

Bridges JFP, Hauber B, Marshall DA, Lloyd A, Prosser LA, Regier DA, Johnson FR, Mauskopf J: Conjoint analysis applications in health - a checklist: a report of the ISPOR good research practices for conjoint analysis task force.
Value Health 2011, 14:403-413.

Lancsar E, Louviere J: Conducting discrete choice experiments to inform healthcare decision making: a user's guide.
PharmacoEconomics 2008, 26(8):661-677.

Mehndiratta SR, Hansen M: Analysis of discrete choice data with repeated observations: comparison of three techniques in intercity travel case.
Transp Res Rec 1997, 1607:69.

Marshall DA, Johnson FR, Phillips KA, Marshall JK, Thabane L, Kulin NA: Measuring patient preferences for colorectal cancer screening using a choice-format survey.
Value Health 2007, 10(5):415-430.

Colorectal Cancer Association of Canada [http://www.colorectalcancer.ca/en/justthefacts/colorectal/] and [http://www.colorectalcancer.ca/en/screening/fobtandfit/]

Anonymous: From the Centers for Disease Control and Prevention: Colorectal cancer test use among persons aged > or = 50 years - United States, 2001.
JAMA 2003, 289(19):2492-2493.

Walsh JM, Terdiman JP: Colorectal cancer screening: scientific review.
JAMA 2003, 289(10):1288-1296.

Slomski A: Expert panel offers advice to improve screening rates for colorectal cancer.
JAMA 2010, 303(14):1356-1357.

Labianca R, Merelli B: Screening and diagnosis for colorectal cancer: present and future.
Tumori 2010, 96(6):889-901.

Kuhfeld WF: Discrete choice (SAS Technical Papers: Marketing research, MR2010F). [http://support.sas.com/techsup/technote/mr2010f.pdf]
(Date of last access: January 7, 2012)

Street DJ, Burgess L: Optimal and near-optimal pairs for the estimation of effects in 2-level choice experiments.
Journal of Statistical Planning and Inference 2004, 118:185-199.

Louviere J, Flynn F, Carson R: Discrete choice experiments are not conjoint analysis.

Neuhaus JM: Statistical methods for longitudinal and clustered designs with binary responses.
Stat Methods Med Res 1992, 1(3):249-273.

Pendergast JF, Gange SJ, Newton MA, Lindstrom MJ, Palta M, Fisher MR: A survey of methods for analyzing clustered binary response data.

Huber JS, Ervin LH: Using heteroscedastic consistent standard errors in the linear regression model.

Larsen K, Petersen JH, Budtz-Jørgensen E, Endahl L: Interpreting parameters in the logistic regression model with random effects.
Biometrics 2000, 56:909-914.

Hedeker D, Gibbons RD, Flay BR: Random-effects regression models for clustered data with an example from smoking prevention research.
J Consult Clin Psychol 1994, 62(4):757-765.

Stata online help [http://www.stata.com/help.cgi?xtmelogit]

Rabe-Hesketh S, Skrondal A: Multilevel and longitudinal modeling using Stata. 2nd edition. USA: A Stata Press Publication; 2008.

Shults J, Sun W, Tu X, Kim H, Amsterdam J, Hilbe JM, TenHave T: A comparison of several approaches for choosing between working correlation structures in generalized estimating equation analysis of longitudinal binary data.
Stat Med 2009, 28(18):2338-2355.

Hanley JA, Negassa A, Edwardes MD, Forrester JE: Statistical analysis of correlated data using generalized estimating equations: an orientation.
Am J Epidemiol 2003, 157(4):364-375.

Ballinger GA: Using generalized estimating equations for longitudinal data analysis.
Organ Res Methods 2004, 7:127-150.

Long JS, Freese J: Regression models for categorical dependent variables using STATA. 2nd edition. Texas, USA: Stata Press; 2006.

Chib S, Greenberg E: Analysis of multivariate probit models.
Biometrika 1998, 85:347-361.

Pizzo E, Pezzoli A, Stockbrugger R, Bracci E, Vagnoni E, Gullini S: Screenee perception and health-related quality of life in colorectal cancer screening: a review.
Value Health 2011, 14(1):152-159.

Cheng S, Long J: Testing for IIA in the multinomial logit model.
Sociological Methods & Research 2007, 35(4):583-600.

Crouchley R, Ganjali M: A comparison of GEE and random effects models for distinguishing heterogeneity, nonstationarity and state dependence in a collection of short binary event series.
Stat Model 2002, 2:39-62.

Hedeker D: A mixed-effects multinomial logistic regression model.
Stat Med 2003, 22(9):1433-1446.

Daganzo C: Multinomial probit: the theory and its application to demand forecasting. New York: Academic; 1979.

Kaplan D, Venezky RL: Literacy and voting behavior: a bivariate probit model with sample selection.
Soc Sci Res 1994, 23:350-367.

Watt DJ, Kayis B, Willey K: The relative importance of tender evaluation and contractor selection criteria.
International Journal of Project Management 2010, 28:51-60.

Heitman SJ, Hilsden RJ, Au F, Dowden S, Manns BJ: Colorectal cancer screening for average-risk North Americans: an economic evaluation.
PLoS Med 2010, 7(11):e1000370.

Nelson RL, Schwartz A: A survey of individual preference for colorectal cancer screening technique.
BMC Cancer 2004, 4:76.

Campbell MJ, Donner A, Klar N: Developments in cluster randomized trials and Statistics in Medicine.
Stat Med 2007, 26(1):2-19.

Ma J, Thabane L, Kaczorowski J, Chambers L, Dolovich L, Karwalajtys T, Levitt C: Comparison of Bayesian and classical methods in the analysis of cluster randomized controlled trials with a binary outcome: the Community Hypertension Assessment Trial (CHAT).
BMC Med Res Methodol 2009, 9:37.

Schukken YH, Grohn YT, McDermott B, McDermott JJ: Analysis of correlated discrete observations: background, examples and solutions.
Prev Vet Med 2003, 59(4):223-240.

Neuhaus JM, Kalbfleisch JD, Hauck WW: A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data.
Int Stat Rev 1991, 59:25-35.

Hubbard AE, Ahern J, Fleischer NL, Van der Laan M, Lippman SA, Jewell N, Bruckner T, Satariano WA: To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health.
Epidemiology 2010, 21(4):467-474.

Munizaga MA, Heydecker BG, de Dios Ortúzar J: Representation of heteroskedasticity in discrete choice models.
Transp Res 2000, 34:219-240.

Peters TJ, Richards SH, Bankhead CR, Ades AE, Sterne JA: Comparison of methods for analysing cluster randomized trials: an example involving a factorial design.
Int J Epidemiol 2003, 32(5):840-846.

Heil SF: A review of multilevel and longitudinal modeling using stata.
J Educ Behav Stat 2009, 34:559-560.

Burda M, Harding M, Hausman J: A Bayesian mixed logit-probit model for multinomial choice.

Ferguson S, Olewnik A, Cormier P: Proceedings of the exploring marketing to engineering information mapping in mass customization: a presentation of ideas, challenges and resulting questions: August 28-31; Washington, DC, USA. USA: ASME; 2011.

Kato H: Proceedings of the nonlinearity of utility function and value of travel time savings: empirical analysis of interregional nonbusiness travel mode choice of Japan: September 18-20; Strasbourg. European Transport Conference: France; 2006.

Kjaer T, Bech M, Gyrd-Hansen D, Hart-Hansen K: Ordering effect and price sensitivity in discrete choice experiments: need we worry?
Health Econ 2006, 15(11):1217-1228.