Vignette studies of medical choice and judgement to study caregivers' medical decision behaviour: systematic review

1 Horten Centre, University of Zurich, Bolleystrasse 40, CH-8091 Zurich, Switzerland
2 Division of Epidemiology and Biostatistics, Department of Social and Preventive Medicine, University of Bern, Switzerland
3 Department of General Practice, Academic Medical Center, Amsterdam, The Netherlands
4 Department of Clinical Epidemiology and Medical Technology Assessment, Maastricht University Hospital, Maastricht, The Netherlands
BMC Medical Research Methodology 2008, 8:50 doi:10.1186/1471-2288-8-50
The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2288/8/50
© 2008 Bachmann et al; licensee BioMed Central Ltd.

Abstract

Background
Vignette studies of medical choice and judgement have gained popularity in the medical literature. Originally developed in mathematical psychology, they can be used to evaluate physicians' behaviour in the setting of diagnostic testing or treatment decisions. We provide an overview of the use, objectives and methodology of these studies in the medical field.

Methods
Systematic review. We searched electronic databases and the reference lists of included studies. We included studies that examined medical decisions of physicians, nurses or medical students using cue weightings from answers to structured vignettes. Two reviewers scrutinized abstracts and examined full text copies of potentially eligible studies. The aim of the included studies, the type of clinical decision, the number of participants, some technical aspects, and the type of statistical analysis were extracted in duplicate and discrepancies were resolved by consensus.

Results
30 reports published between 1983 and 2005 fulfilled the inclusion criteria. 22 studies (73%) reported on treatment decisions and 27 (90%) explored the variation of decisions among experts. Nine studies (30%) described differences in decisions between groups of caregivers and ten studies (33%) described the decision behaviour of only one group. Only six studies (20%) compared decision behaviour against an empirical reference of a correct decision. The median number of considered attributes was 6.5 (IQR 4–9), the median number of vignettes was 27 (IQR 16–40). In 17 studies, decision makers had to rate the relative importance of a given vignette; in six studies they had to assign a probability to each vignette. Only ten studies (33%) applied a statistical procedure to account for correlated data.

Conclusion
Various studies of medical choice and judgement have been performed to derive caregivers' weightings of the value of clinical information from their answers to structured vignettes. We found that the design and analysis methods used in current applications vary considerably and could be improved in a large number of cases.

Background
Preferences and perceived similarities or differences between choice alternatives can be evaluated using structured vignettes. There are two prominent methods of constructing such models of medical judgements, each with its own literature and set of advocates. These are conjoint analysis, developed in the 1970s to study preference and choice [1], and judgement analysis, also called social judgement theory, developed in the 1950s from Brunswik's lens model [2,3]. The two have developed along very different theoretical lines and use somewhat different methodology, although there is considerable overlap. Today there is a large number of marketing applications, in which the joint effects of multiple product attributes on product choice have been studied. The types of choices include 'ranking', 'rating', and 'discrete choice'. These methods can be carried forward to the analysis of medical decision making, as medical decisions require judgement under uncertainty. This uncertainty may concern a state, such as the presence of illness; the likelihood of future events, such as those in the natural course of an illness; or the likelihood with which such events may be averted, that is, treatment effects. For many years decision-making research has explored physicians' estimation of probabilities given clinical scenarios [4]. However, there have been concerns about whether physicians' probability setting leads to consistent ratings [5]. Moreover, cognitive psychological research shows that physicians do not apply probabilities as suggested by decision-making theory but use their own heuristics to decide [6-8].

Studies of medical choice and judgement offer a way to elicit the public's, patients' and caregivers' views on healthcare that circumvents probability statements [9-11]. The technique is gaining widespread use in healthcare and has been applied in different areas, for example to establish patients' preferences in the doctor-patient relationship [12] or to determine optimal treatments for patients [13]. Increasingly, discrete choice analyses are being employed to study how physicians weigh clinical information in the diagnostic work-up. In particular, respondents are asked to rank, rate, or choose between simulated clinical cases that vary in the values of different symptoms, according to the likelihood that the case has a certain illness or needs a certain treatment. Comparison with the results of clinical studies allows an analysis of potential discrepancies (e.g. undervaluation of signs and symptoms, overvaluation of test results). Moreover, such comparisons with reference data from clinical studies allow physicians' behaviour to be linked to illness probabilities and therefore allow (implicit) decision thresholds to be examined. A considerable number of studies have been published recently. We provide an overview of existing reports, present an inventory of their objectives and methods, and evaluate them using systematic review methodology.

Methods
We defined a study of medical choice and judgement as an investigation in which preferences were elicited from physicians, nurse practitioners or medical students and that allowed the estimation of the relative importance of different characteristics.

Search strategy
We performed electronic searches in Medline, PsycINFO and CINAHL (Ovid® version). Web of Science (ISI Web of Science®) was used to locate studies that cited four key papers [14-17]. The last update search was performed on 25 March 2005. The exact search strategy may be obtained from the authors.
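
As an illustration of what "estimation of the relative importance of different characteristics" can mean in practice, the following minimal sketch derives cue weights from one respondent's ratings of a full set of vignettes. Everything in it is an assumption made for illustration: the three cue names, the made-up ratings, and the choice of a balanced full factorial design with -1/+1 coding, in which the least-squares weight of each cue reduces to the mean of cue value times rating. It is not the method of any particular study included in this review.

```python
from itertools import product

# Hypothetical binary cues (not taken from any reviewed study).
# Coding: +1 = cue present, -1 = cue absent.
CUES = ("fever", "abnormal_lab_test", "positive_examination_sign")


def full_factorial(n_cues):
    """All combinations of n_cues binary cues, coded as -1/+1."""
    return list(product((-1, 1), repeat=n_cues))


def cue_weights(vignettes, ratings):
    """Least-squares cue weights for a balanced full factorial with -1/+1 coding.

    In such a design the cue columns are mutually orthogonal and each sums
    to zero, so every regression coefficient reduces to mean(cue * rating).
    """
    n = len(vignettes)
    n_cues = len(vignettes[0])
    return [sum(v[k] * r for v, r in zip(vignettes, ratings)) / n
            for k in range(n_cues)]


def relative_importance(weights):
    """Absolute weights rescaled so that they sum to 1."""
    total = sum(abs(w) for w in weights)
    return [abs(w) / total for w in weights]


vignettes = full_factorial(len(CUES))
# Made-up ratings from one imaginary respondent who relies mostly on the lab test.
ratings = [50 + 5 * fever + 20 * lab + 10 * exam for fever, lab, exam in vignettes]
weights = cue_weights(vignettes, ratings)

for name, w, share in zip(CUES, weights, relative_importance(weights)):
    print(f"{name}: weight={w:.1f}, relative importance={share:.2f}")
```

With unbalanced or fractional designs, or with several respondents, a regression routine would be needed instead of this shortcut.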

Inclusion criteria
Eligible articles for this review had to infer cue or attribute weighting from answers to structured vignettes and had to report on caregivers' decision making.

Data extraction strategy
We developed a data extraction form based on the assessment of three articles [17-19]. The form contained twelve items describing a study's salient features of context, design and analysis (for details see Table 1).

Table 1. Salient features of studies included in the systematic review.

Besides some study descriptors such as first author and year of publication, we extracted information on the studies' objectives, the clinical problem, who the decision-maker was, the type of decision/preference (diagnosis, treatment, risk, prognosis, diagnosis & treatment, and other), the number of participants and the authors' aims. The objectives were extracted into five categories: description of preferences in one group of caregivers (1), comparison of two or more groups such as different professions or different levels of competence (2), assessment of the consistency within caregivers with their actual decisions or their direct rating of the attributes (3), assessment of changes in preferences over time, e.g. after attending a course (4), and comparison of caregivers with guidelines (5a), actual patients' preferences (5b), or the findings of one or more clinical studies (5c). We also registered the number of vignettes, the number of attributes of each vignette and the rationale behind the selection of the attributes. Finally, we documented how participants were asked to respond to the vignettes (rating (yes/no, otherwise), ranking, probability estimates, or discrete choice) and the way, if any, in which authors accounted for correlated data in the analysis. We extracted this item because observations resulting from these experiments are typically not independent. Each respondent evaluates each of the vignettes.
This makes the data from one respondent more alike than one would expect under the assumption of independence, and therefore the standard errors of the attribute weights could be underestimated. We searched for any statistical method that allows the standard errors to be adjusted for the intra-group correlation. All studies were assessed in duplicate. Discordant scores based on reading errors were corrected. Discordant scores based on real differences in interpretation were discussed and resolved through consensus.

Results
The searches retrieved 2001 records. Full papers of 81 potentially relevant studies were obtained. In total, 51 articles did not meet the inclusion criteria and were excluded after reading the full reports, leaving 30 reports published between 1983 and 2005 for evaluation (see flowchart in Figure 1). The salient features of included studies are shown in Table 1.
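
The concern about correlated responses described in the Methods can be illustrated with a small simulation. The sketch below is an assumption-laden toy example, not a re-analysis of any included study: the data are simulated, there is a single binary cue, and respondents are assumed to differ in how strongly they weigh that cue. It compares the usual ("naive") standard error of the cue weight, which treats all ratings as independent, with a cluster-robust (sandwich) standard error that groups ratings by respondent. Mixed models or generalised estimating equations are other common ways to handle the same problem.

```python
import math
import random


def simulate(n_resp=40, n_vig=20, seed=1):
    """Simulated data: each respondent rates n_vig vignettes with one binary cue.

    Respondents differ in how strongly they weigh the cue (a respondent-specific
    slope), which makes ratings from the same respondent correlated.
    """
    rng = random.Random(seed)
    rows = []  # tuples of (respondent id, cue value, rating)
    for resp in range(n_resp):
        slope = rng.gauss(1.0, 1.0)  # respondent-specific cue weight (assumed)
        for v in range(n_vig):
            x = v % 2  # cue absent (0) or present (1), balanced
            rows.append((resp, x, slope * x + rng.gauss(0.0, 0.5)))
    return rows


def slope_standard_errors(rows):
    """OLS slope with a naive and a cluster-robust standard error."""
    n = len(rows)
    x_bar = sum(x for _, x, _ in rows) / n
    y_bar = sum(y for _, _, y in rows) / n
    sxx = sum((x - x_bar) ** 2 for _, x, _ in rows)
    slope = sum((x - x_bar) * (y - y_bar) for _, x, y in rows) / sxx
    intercept = y_bar - slope * x_bar

    rss = 0.0
    cluster_scores = {}  # per respondent: sum of (x - x_bar) * residual
    for resp, x, y in rows:
        resid = y - intercept - slope * x
        rss += resid ** 2
        cluster_scores[resp] = cluster_scores.get(resp, 0.0) + (x - x_bar) * resid

    # Naive: assumes all n ratings are independent.
    naive_se = math.sqrt(rss / (n - 2) / sxx)
    # Cluster-robust: sums scores within respondent before squaring
    # (no small-sample correction applied).
    robust_se = math.sqrt(sum(s ** 2 for s in cluster_scores.values())) / sxx
    return slope, naive_se, robust_se


slope, naive_se, robust_se = slope_standard_errors(simulate())
print(f"cue weight={slope:.3f}, naive SE={naive_se:.3f}, cluster-robust SE={robust_se:.3f}")
```

Under these assumptions the naive standard error comes out smaller than the cluster-robust one, which is the pattern that can make an association look statistically significant when it is not.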

General aspects
Although the first study was published in 1983, 24 studies (84%) were published after 1995. Twenty-seven out of thirty studies examined decision behaviour of medical experts [15,17-42]. In half of the studies more than one type of respondent was surveyed. Twenty-eight different medical problems were addressed. Twenty-two (73%) studies examined treatment decisions. Eleven studies (37%) asked for a preferred diagnostic decision, sometimes (6 studies) in combination with a treatment decision.

Objectives
Ten studies (33%) aimed at describing decision preferences of specific groups of participants [20,24,26,27,29,30,32,38,43,44] and nine studies (30%) described differences in decision preferences between groups [18,25,28,31,34,36,37,40,41]. Three studies explored the consistency of decisions between groups of experts [15,23,45] and two studies examined change of preferences after an intervention [22,35]. Only six studies (20%) compared decision behaviour against some sort of empirical reference such as a guideline [21,39,42] (n = 3), actual patient data [17,19] (n = 2) or the result of a clinical study [33] (n = 1).

Design
The median number of attributes was 6.5 (interquartile range [IQR] 4–9, range 2–15). In 20 studies (67%) the selection of attributes was based on information such as the literature [17,20,27,28,31,32,34,35,37,40,43,45] (12 studies), expert opinion (7 studies) or guidelines (1 study). In five studies patient files [15,21,24,33,41] were used to construct the vignettes. The median number of vignettes was 25 (IQR 16–32), ranging from 3 to 130. Authors used several response modes for the vignettes. In eight cases they used more than one response mode. In 23 cases authors used a rating procedure [15,18,20-25,27-31,34-37,40-45], where respondents had to rate the relative importance of a given vignette or assign a probability (n = 6) to a diagnosis or outcome [31-33,35-37].
One study used a ranking design, where respondents had to arrange each of the attributes in descending order of importance [24]. In six studies respondents could reply with a yes/no choice [27,30,31,35,38,39]. One study used a conventional discrete choice mode, where respondents, given two or more vignettes, had to select the one with the highest likelihood of postoperative recovery [26].

Analysis
Twenty (67%) studies did not correct for correlated data. Consequently, only ten studies applied some statistical procedure to account for this correlation within the data [15,22,26,32-37,41].

Discussion
This review has two main findings. First, studies of medical choice and judgement are regularly used in the medical field to explore healthcare providers' decision behaviour or preferences. Second, we found a broad spectrum of different methods, and both design and analysis were suboptimal in some cases.

Cognitive burden/complexity
One quarter of the included studies either contained vignettes with more than nine attributes or compiled sets of over forty vignettes in the same experiment. Empirical evidence showing that these figures are too high is scarce and there is much controversy, particularly about the number of vignettes [46]. From a cognitive psychological point of view both figures appear to be very high and could bias the results. This bias typically occurs because respondents are unable to integrate and process large quantities of information provided simultaneously, or because respondents lose attention when sifting through too many vignettes. However, evidence suggests that more attributes, more choice options and more vignettes decrease response reliability, but do not bias mean responses [46]. As a rule of thumb, the number of attributes per vignette should not exceed six to eight [47-49]. There is much opinion and controversy about the maximum acceptable number of vignettes, but little rigorous evidence [46].
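
The tension between the number of attributes and the number of vignettes can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that vignettes are built as a full factorial (every combination of attribute levels) and that each attribute has two levels; the attribute names in the usage note are hypothetical. Under these assumptions the number of distinct vignettes doubles with every added attribute, which is one reason why designers often present only a subset of combinations, for example a fractional factorial design.

```python
from itertools import product


def full_factorial_size(levels_per_attribute):
    """Number of distinct vignettes when every combination of levels is used."""
    size = 1
    for n_levels in levels_per_attribute:
        size *= n_levels
    return size


def enumerate_vignettes(attributes):
    """Yield each vignette as a dict; `attributes` maps name -> list of levels."""
    names = list(attributes)
    for combo in product(*(attributes[name] for name in names)):
        yield dict(zip(names, combo))


# Two-level attributes: the full factorial has 2**k vignettes for k attributes.
for k in (4, 6, 8):
    print(k, "attributes ->", full_factorial_size([2] * k), "vignettes")
```

For example, `list(enumerate_vignettes({"chest_pain": ["no", "yes"], "age_group": ["<50", "50-70", ">70"]}))` returns six vignettes, one per combination of the two hypothetical attributes.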
A re-analysis of 21 commercial studies suggests a maximum of 20 vignettes [48] and a review of discrete choice experiments evaluating healthcare shows that the number of vignettes seldom exceeds 16 [49]. Furthermore, the majority of studies used either a ranking or a rating response mode. These two modes imply very strong assumptions about human cognitive abilities, making it more likely that measures will be biased and invalid [50]. We therefore recommend the choice-based approach.

Validity, usefulness of study objectives
In contrast to applications in marketing research, where the main aim of a study is to identify opinions regarding a new product, we would be particularly interested to learn about the correctness of caregivers' weighting of the value of clinical information in decisions. While there is no normative benchmark for a "correct" product, there is usually one in medical judgement if clinical studies are available. For example, if the results of a study on medical choice and judgement showed that physicians consistently attribute high weights to relatively uninformative lab tests but undervalue the informativeness of cues from clinical examination, they would hint at something that needs to be improved, perhaps with an educational intervention. The method would also allow the change in preferences after an educational intervention to be assessed. Most studies did not compare the attributed weights to some sort of normative benchmark such as the results of a clinical study. We found only one out of 30 studies that actually examined this and another five that used a further normative reference (guidelines or patient files). In the absence of a normative benchmark these studies leave it to the reader to approve or disapprove of the results.
Moreover, assessment of discrepancies between different groups of participants has the problem that these could be explained by different clinical circumstances or other factors rather than group-specific differences. On the other hand, there are medical situations in which views about optimal choices are controversial. In these situations studies that do not compare caregivers' decision behaviour (or preferences) to some norm may still be useful in that they allow the examination of present opinions.

Statistical model
The majority of studies did not account for correlated data in the analysis. Correlated data occur because each respondent assesses several vignettes. Not accounting for this leads to standard errors that are too small for an attribute's weight and can mimic a statistically significant association where in fact there is none. Unfortunately, guidelines on the conduct of conjoint analyses have not yet reached consensus about the optimal way to analyse correlated data.

Limitations
What are the limitations of this review? We think that the search and appraisal procedures were reliable. However, classifications were sometimes difficult to make because of unclear descriptions in the article. We did not contact authors to clarify these uncertainties. Second, there have been two prominent methods of constructing linear models of medical judgements, each with its own literature and set of advocates. These are conjoint analysis, developed in the 1970s to study preference and choice [1], and judgement analysis, also called social judgement theory, developed in the 1950s from Brunswik's lens model [2,3]. In this review we did not make a distinction between the two methods because there is substantial overlap in methodology. Arguably this is a weakness of our study.
However, since we were interested in providing an overview of all studies that examined medical decisions of caregivers using cue weightings from answers to structured vignettes, whatever the method applied, we feel that our approach has its own merit.

Future research
Our review indicates that current applications of conjoint and judgement analysis in the medical field remain suboptimal in some instances. We think that researchers should consider our propositions to ensure internal validity. Moreover, we believe that studies investigating caregivers' judgements are most valuable if they allow comparisons with some norm and if they include an assessment of deviations from that norm. Our review found only a few such investigations. From a more methodological point of view, we agree with a statement in a recent editorial that research is required to learn whether individuals behave in reality as they state in a hypothetical context [51].

Conclusion
We believe that studies of medical choice and judgement offer many attractive and new insights into medical action. Provided that both methods and application evolve, they offer a unique opportunity to improve quality of care.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
AGHK conceived of the study and LMB obtained funding. AGHK and LMB designed the study, supervised the work and drafted the manuscript. AM and AB carried out the data extraction. AM, AB, UH and GtR participated in the design of the study and gave important conceptual input. All authors read and approved the final manuscript.

Acknowledgements
We thank Dr. Pius Estermann (information specialist, University Hospital Zurich) for doing the literature searches. This work was supported by the Swiss National Science Foundation (grants no. 3233B0-103182 and 3200B0-103183).
The funding body played no role in study design; in the collection, analysis, and interpretation of data; in the writing of the manuscript; and in the decision to submit the manuscript for publication.

References

Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/8/50/prepub




Figure 1.