Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Harvard Radiation Oncology Program, Harvard Medical School, Boston, MA, USA
Abstract
Truth claims in the medical literature rely heavily on statistical significance testing. Unfortunately, most physicians misunderstand the underlying probabilistic logic of significance tests and consequently often misinterpret their results. This near-universal misunderstanding is highlighted by means of a simple quiz that we administered to 246 physicians at two major academic hospitals, on which the proportion of incorrect responses exceeded 90%. A solid understanding of the fundamental concepts of probability theory is becoming essential to the rational interpretation of medical information. This essay provides a technically sound review of these concepts that is accessible to a medical audience. We also briefly review the debate in the cognitive sciences regarding physicians' aptitude for probabilistic inference.
Background
Medicine is a science of uncertainty and an art of probability. – Sir William Osler
While probabilistic considerations have always been fundamental to medical reasoning, formal probabilistic arguments have only become ubiquitous in the medical literature in recent decades.
Consider a typical medical research study, for example designed to test the efficacy of a drug, in which a null hypothesis
1.
2.
3.
4.
5. Both (1) and (2).
6. Both (3) and (4).
7. None of the above.
The answer profile for our participants is shown in Table
Quiz answer profile.

Answer  | (1) | (2) | (3)  | (4)  | (5) | (6)  | (7)
Number  | 8   | 0   | 58   | 37   | 6   | 69   | 12
Percent | 4.2 | 0   | 30.5 | 19.5 | 3.2 | 36.3 | 6.3

(190 responses in total; percentages are of respondents.)
Despite its central place in the theory of probabilistic inference, Bayes' rule has been largely displaced in the practice of quantitative medical reasoning (and indeed in the biological and social sciences generally) by a statistical procedure known as 'significance testing'. While significance testing can, when properly understood, be seen as an internally coherent aid to scientific data analysis
Discussion, Part I: probability in medicine
Reasoning under uncertainty
They say that Understanding ought to work by the rules of right reason. These rules are, or ought to be, contained in Logic; but the actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful, none of which (fortunately) we have to reason on. Therefore the true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man's mind. – James Clerk Maxwell
The inadequacy of deductive logic
Since Aristotle the mainstream Western view has been that rationality means reasoning according to the rules of deductive logic
Or, similarly:
These logical forms play a role in straightforward medical diagnostic scenarios like the following:
• 75-year-old man with fever, productive cough, chest x-ray showing consolidation of the right upper lobe, sputum culture positive for gram-positive cocci in clusters.
Diagnosis: Pneumonia.
• 50-year-old previously healthy man with sudden onset painful arthritis of the MTP joint of his right great toe, arthrocentesis positive for needle-shaped, negatively birefringent crystals.
Diagnosis: Gout.
The reasoning required to make these diagnoses is essentially syllogistic, that is a matter of checking that the definitions of the disorders are satisfied, then drawing the inevitable conclusion.
However, medical reasoning frequently requires going beyond syllogistic reasoning. For example, consider the following argument type:
Of course, given the premise (
• 45-year-old homeless alcoholic man brought in by police with confusion, disorderly behavior, and breath smelling of alcohol. Diagnosis: Ethanol intoxication.
• 75-year-old nursing home resident with known heart failure presents with confusion and shortness of breath. Physical examination reveals rales, 3+ lower extremity pitting edema, labored breathing. Diagnosis: CHF exacerbation.
• 55-year-old man presents to the ED with acute onset substernal chest pain. Diagnosis: Gastric reflux.
Most physicians quickly assign rough degrees of plausibility to these diagnoses. However, in these cases it is reasonable to entertain alternative diagnoses, for example in the first case other intoxicants, or meningitis; and in the second case pulmonary embolus, pneumonia, or myocardial infarction. In the third case the stated diagnosis is only weakly plausible, and most physicians would doubt it at least until other possibilities (for example myocardial ischemia) are ruled out. In each case, there is insufficient information to make a certain (that is logically deductive) diagnosis; nevertheless, we are accustomed to making judgements of plausibility.
Stepping back once more, we can add to the list of argument types frequently needed in medical reasoning the following additional examples of even weaker 'weak syllogisms':
and
As in syllogistic reasoning, weak syllogistic reasoning combines prior knowledge (for example knowledge of medicine and clinical experience) with new data (for example from seeing patients, lab tests, or new literature), but the knowledge, data, and conclusions involved lack the certainty required for deductive logical reasoning. The practice of formulating differential diagnoses, and the fact that physicians do not routinely test for every possibility in the differential, shows that physicians do in fact routinely assign degrees of plausibility. The same can be said of most situations in everyday life, in which the ability to judge which possibilities to ignore, which to entertain, and how much plausibility to assign to each constitutes 'common sense'. We now explore the rules that govern quantitative reasoning under uncertainty.
Cox's theorem and the laws of plausible reasoning
There is only one consistent model of common sense. – ET Jaynes
How might one go about making the 'weak syllogisms', introduced above, into precise quantitative statements? Let us attempt to replace the loose statement that '
From this it is apparent that what we are seeking is a formula that gives the strength of the conclusion as a function,
RT Cox (1898-1991)
where the numbers denoted by P(A|B) represent the plausibility of proposition A given that proposition B is known. Cox's theorem shows that any scheme of plausible reasoning that is internally consistent must obey the standard rules of probability theory:

• 0 ≤ P(A|B) ≤ 1, with P(A|B) = 1 when B entails A, and P(A|B) = 0 when B entails that A is false;
• the sum rule: P(A|B) + P(not-A|B) = 1;
• the product rule: P(A and B|C) = P(A|C) P(B|A and C);
• Bayes' rule, which follows directly from the product rule: P(A|B and C) = P(B|A and C) P(A|C) / P(B|C),

where P(A|B and C) is the plausibility of A after the new information B is learned, given prior information C.
In the rest of the paper, we will use the more common form for Bayes' rule, which is derived from the form given above by simple substitutions using the basic relations of probability just cited:
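Written out explicitly for a binary hypothesis, the common form is (our notation: H for the hypothesis, H̄ for its negation, D for the data):

```latex
P(H \mid D) \;=\; \frac{P(D \mid H)\,P(H)}{P(D \mid H)\,P(H) \;+\; P(D \mid \bar{H})\,P(\bar{H})}
```

Here P(H) is the prior probability of the hypothesis, P(D|H) is the probability of the data if the hypothesis is true, and P(D|H̄) is the probability of the data if it is false.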
This form is useful in that it makes explicit the fact that Bayes' rule involves three distinct ingredients: the prior probability, the sensitivity (the probability of the data if the hypothesis is true), and the false positive rate (the probability of the data if the hypothesis is false).
We pause before proceeding to comment on our focus in this essay on simple applications of Bayes' rule. Our aim is to explain the basic concepts governing probabilistic inference, a goal we believe is best served by using very simple applications of Bayes' rule to evaluating mutually exclusive truth claims (that is 'binary hypotheses'). We hasten to add that binary hypothesis comparison is not necessarily always the best approach. For instance, in the quiz beginning this essay, rather than pitting
Indeed, much real-world medical reasoning cannot be naturally reduced to evaluating simple 'true/false' judgements, but requires instead the simultaneous analysis of multiple data variables, which often take on multiple or a continuous range of values (not just binary). There are frequently not just two but many competing interpretations of medical data. Moreover, we are often more interested in inferring the magnitude of a quantity or strength of an effect rather than simply whether a statement is true or false. Similarly, evaluating medical research typically involves reasoning too rich to be naturally modeled as binary hypothesis testing (contrary to the spirit of Fisher's famous pronouncement that 'every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis').
Nevertheless, understanding the ongoing work at the frontiers of modern probability theory requires first a sound understanding of Bayes' rule in its most elementary form, the focus of this essay.
The 'subjective' interpretation of probability
It is important to appreciate that the interpretation of mathematical probability as a measure of plausibility, that is as a 'degree of belief', is not the only way of conceptualizing probability. Indeed, in mathematics probability theory is usually developed axiomatically, starting with the rules of probability as 'given'
The interpretation of probabilities as degrees of belief is often called the 'subjective interpretation of probability,' or more succinctly, 'Bayesian probability,' because Thomas Bayes is credited as the first to develop a coherent way to estimate probabilities of single events
The three ingredients of Bayes' rule
An intuition for why Bayes' rule has the form that it does can be gained by observing the effects produced by changing the values of each of its three variables. For concreteness, we frame our discussion in terms of the problem of distinguishing appendicitis from other causes of abdominal pain in a pediatric emergency department on the basis of the presence or absence of fever. In this example, fever is taken as evidence of appendicitis, so we have the following labels for the four possible combinations of fever (present or absent) and appendicitis (present or absent): true positive, false positive, false negative, and true negative.
Anatomy of Bayes' rule
The importance of each of the ingredients of Bayes' rule, the three arguments just discussed, can be appreciated by considering the following limiting cases.
1. Suppose that somehow we know, independent of fever status, that 100% of the patients have appendicitis,
2. Next, suppose every child with appendicitis has a fever,
3. To see that
4. Suppose that no one with appendicitis gets fevers,
These arguments show that the formula for the 'posterior probability', that is the probability of appendicitis given fever, must depend on all three ingredients, and that it does so in just the way prescribed by Bayes' rule.
Physiology of Bayes' rule
We now explore how the output of Bayes' rule varies with its three inputs. Interactive online computer programs may also be helpful for gaining intuition, and can be found using the following references:
Consider a hypothetical population of 1,000 patients evaluated for abdominal pain in the pediatric emergency room, some with fever, some with appendicitis, some with both, and some with neither. We will systematically vary the proportions of each subpopulation and observe the output of Bayes' rule. The numbers used in these examples are summarized in Table
Hypothetical statistics for fever and appendicitis.

Scenario              | TP  | FP  | FN  | TN  | Sensitivity | FPR | Prior | Posterior
1. Baseline           | 62  | 112 | 49  | 777 | 56%         | 13% | 11%   | 36%
2. Higher sensitivity | 79  | 112 | 32  | 777 | 71%         | 13% | 11%   | 41%
3. Lower sensitivity  | 45  | 112 | 66  | 777 | 40%         | 13% | 11%   | 29%
4. Higher FPR         | 62  | 136 | 49  | 753 | 56%         | 15% | 11%   | 31%
5. Lower FPR          | 62  | 88  | 49  | 801 | 56%         | 10% | 11%   | 41%
6. Higher prior       | 139 | 192 | 111 | 558 | 56%         | 26% | 45%   | 42%
7. Lower prior        | 22  | 121 | 18  | 839 | 56%         | 13% | 4%    | 6%

TP = true positives, FP = false positives, FN = false negatives, TN = true negatives; FPR = false positive rate (1 − specificity). Counts are out of 1,000 patients; percentages are rounded.
Initially, suppose that among our 1,000 patients, 111 are ultimately found to have appendicitis. Fever was present on initial presentation in 174 patients, of whom 62 are found to have appendicitis. The numbers of true positives, false positives, false negatives, and true negatives implied by these counts are listed in the first row of the table.
the false positive rate (also known as 1 − specificity) as:
and the prior probability (also known as prevalence) as:
This situation is shown schematically in Figure
Reference population of patients with appendicitis and fever, showing the result of conditioning on the presence of fever.
So, in a febrile child complaining of abdominal pain, what is the probability of appendicitis? Based on the information above, most physicians give an answer close to 56%, a conclusion reached apparently by mentally replacing the prior probability with the sensitivity. The correct answer, as the first row of the table shows, is only 36%: of the 174 febrile patients, just 62 have appendicitis.
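The baseline calculation can be verified directly from the counts in the first row of the table; a minimal sketch in Python (variable names are ours):

```python
# First row of the fever/appendicitis table (1,000 patients in all):
tp, fp, fn, tn = 62, 112, 49, 777

sensitivity = tp / (tp + fn)              # P(fever | appendicitis)
fpr = fp / (fp + tn)                      # P(fever | no appendicitis)
prior = (tp + fn) / (tp + fp + fn + tn)   # P(appendicitis)

# Bayes' rule for the posterior probability P(appendicitis | fever):
posterior = sensitivity * prior / (sensitivity * prior + fpr * (1 - prior))

# The formula reproduces the direct count-based answer, TP / (TP + FP):
assert abs(posterior - tp / (tp + fp)) < 1e-9

print(round(sensitivity, 2), round(fpr, 2),
      round(prior, 2), round(posterior, 2))   # 0.56 0.13 0.11 0.36
```

The assertion confirms that applying Bayes' rule to the rates reproduces the simple count-based answer, 62/174 ≈ 36%, not the 56% that the sensitivity alone would suggest.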
Varying the sensitivity
Suppose we increase the true positive rate
Effects on posterior probability of changes in sensitivity, while holding prior probability and false positive rate constant.
Varying the false positive rate
Next let us slightly increase the false positive rate
Effects on posterior probability of changes in false positive rate, while holding prior probability and sensitivity constant.
Conversely, a decrease in the false positive rate
Varying the prior probability
Finally, consider increasing the prior probability of appendicitis
Effects on posterior probability of changes in prior probability, while holding sensitivity and false positive rate constant.
Summary of the general rules
These examples illustrate the following general principles (assuming a 'positive' test result):
• Increasing the true positive rate (sensitivity) pushes the posterior probability upward, whereas decreasing the true positive rate pushes the posterior probability downward.
• Increasing the false positive rate (1 − specificity) pushes the posterior probability downward, whereas decreasing the false positive rate pushes the posterior probability upward.
• Increasing the prior probability pushes the posterior probability upward, whereas decreasing the prior probability pushes the posterior probability downward.
We emphasize again that in every case the posterior probability goes up or down from the prior probability, rather than being replaced by any of the three quantities. These general rules are illustrated in the graphs in Figure
Illustration of how the posterior probability depends on the three parameters of Bayes' rule. Each plot shows two curves for the posterior probability as a function of one of the three parameters, with the remaining two parameters held constant at one of two sets of values.
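The three general rules can also be checked numerically; the sketch below (parameter values are ours, chosen near those in the table) verifies each monotonic effect:

```python
def posterior(prior, sens, fpr):
    """P(disease | positive result) via Bayes' rule."""
    return sens * prior / (sens * prior + fpr * (1 - prior))

base = posterior(prior=0.11, sens=0.56, fpr=0.13)

# Raising sensitivity raises the posterior; lowering it lowers it.
assert posterior(0.11, 0.71, 0.13) > base > posterior(0.11, 0.40, 0.13)
# Raising the false positive rate lowers the posterior, and vice versa.
assert posterior(0.11, 0.56, 0.15) < base < posterior(0.11, 0.56, 0.10)
# Raising the prior raises the posterior; lowering it lowers it.
assert posterior(0.45, 0.56, 0.13) > base > posterior(0.04, 0.56, 0.13)

print(round(base, 2))   # about 0.35 with these rounded parameters
```

Each assertion passes: the posterior moves up or down from the prior exactly as the three rules describe.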
End of Part I
Uncertainty suffuses every aspect of the practice of medicine, hence any adequate model of medical reasoning, normative or descriptive, must extend beyond deductive logic. As long believed, and as proven by Cox and Jaynes, the proper extension of logic is in fact probability theory, with Bayes' rule as the central rule of inference. We have attempted to explain in an accessible way why Bayes' rule has its particular form, and how it behaves when its parameters vary. In Part II, we investigate ways in which probability theory is commonly misunderstood and abused in medical reasoning, especially in interpreting the results of medical research.
Discussion, Part II: significance testing
Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis. – RA Fisher
Armed with our understanding of the anatomy and physiology of Bayes' rule, we are prepared for pathophysiology. In Part II we explore common misinterpretations and misuses of elementary medical statistics that occur in the application of significance testing, and how these can be effectively treated by applying our understanding of Bayes' rule.
Before one can appreciate the problems with significance testing, one needs a clear understanding of a few concepts from 'classical statistics', namely binary hypothesis testing and P values.
Binary hypothesis testing
Binary hypothesis testing is familiar to most physicians as the central concept involved in judging the results of clinical trials. The basic setup was encountered in the quiz that began the paper. For any proposition
Our conclusions can be right or wrong in four ways (see Table
The analogy between diagnostic tests and clinical trials.

Diagnostic testing | Clinical trials
Absence of disease | Truth of null hypothesis
Presence of disease | Falsity of null hypothesis
Cutoff between positive and negative results | Significance level, α
Test result | P value
Negative result | Nonsignificant result (P > α)
Positive result | Significant result (P ≤ α)
Sensitivity | Power
False positive rate (1 − specificity) | Significance level, α
Prior probability of disease | Prior probability of a difference between groups
Posterior probability of disease, given test result | Posterior probability of a difference between groups, given study results
The null hypothesis significance testing procedure
Let us now consider the conventional statistical reasoning process followed in drawing conclusions about experiments. This reasoning is prescribed by a standardized statistical procedure, the 'null hypothesis significance testing procedure' (NHSTP), or simply 'significance testing', consisting of the following steps.
1. Specify mutually exclusive and jointly exhaustive hypotheses: the null hypothesis and its alternative.
2. Design an experiment to obtain data bearing on the hypotheses.
3. Choose a maximum acceptable level of Type I error, called the 'significance level', denoted α; by convention, usually α = 0.05.
4. Do the experiment, yielding data.
5. Compute the P value of the data.
6. Compare the P value to α: if P ≤ α, reject the null hypothesis; otherwise, fail to reject it.
In the customary statistical jargon, when the P value falls at or below α the result is declared 'statistically significant' and the null hypothesis is rejected; otherwise the result is declared 'nonsignificant'.
We now review what P values actually mean. A commonly offered definition is the following:
The probability that the observed result could have been produced by chance alone
This definition is vague, and tempts many users into confusing the probability of the hypothesis given the data with the probability of the data given the hypothesis. A more careful statement of the definition is:
the probability that the data (that is the value of the summary statistic for the data), or more extreme results, could have occurred if the intended experiment was replicated many, many times, assuming the null hypothesis is true.
The potential morass created by this definition can be illustrated by imagining that an experimenter submits a set of data, consisting, say, of 23 data samples, to a statistical computer program, which automatically computes a P value. The program cannot know the experimenter's intentions: whether the plan was to collect exactly 23 samples, or to keep collecting until some stopping criterion was met. Yet under the replication-based definition above, the P value depends on which hypothetical replications are contemplated, and hence on those intentions.
In what follows, we will avoid the 'constructional' objections raised above by using a mathematically explicit definition for the P value:
the probability under the null hypothesis of obtaining the same or even less likely data than that which is actually observed, that is the probability of obtaining values of the test statistic that are equal to or more extreme than the value of the statistic actually computed from the data, assuming that the null hypothesis is true.
Note that this definition does not include any reference to the 'intentions' under which the data were collected. To avoid any possible confusion, we emphasize that this definition requires that the null hypothesis specify a definite probability distribution for the test statistic.
We now turn to explaining our final, technical definition of the P value, approaching it from three angles.
Angle 1. P values as tail area(s)
Graphically, a P value is a tail area (or pair of tail areas) under the probability distribution that the null hypothesis implies for the test statistic.
Distribution of systolic blood pressures for a population of healthy 60-69 year old males (from data in
If instead the null hypothesis states that the patient is chronically normotensive,
Angle 2. P values for coin flipping experiments
Let us carry out the null hypothesis significance testing procedure explicitly for a simple coin flipping experiment.
1. Let the null hypothesis be that the coin is fair, that is, that heads and tails are equally probable.
2. The experiment will consist of flipping the coin a fixed number of times and recording the number of heads obtained.
3. We set the significance level to the conventional value, α = 0.05.
4. Having done the experiment, suppose we get data in which the observed number of heads is far from the expected half.
5. To calculate the P value, we sum the probabilities, under the null hypothesis, of all outcomes in which the number of heads is at least as far from the expected half as the number actually observed.
(See Additional file
6. Since the P value falls below α, we reject the null hypothesis and declare the coin 'significantly' biased.
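For readers who wish to reproduce such a calculation, here is a sketch in Python; since the actual flip counts are not reproduced above, the data (9 heads in 10 flips) are hypothetical:

```python
from math import comb

def two_sided_p(k, n):
    """Two-sided P value: probability, under the fair-coin null,
    of a head count at least as far from n/2 as the observed k."""
    obs = abs(k - n / 2)
    return sum(comb(n, j) for j in range(n + 1)
               if abs(j - n / 2) >= obs) / 2 ** n

p = two_sided_p(9, 10)   # hypothetical data: 9 heads in 10 flips
print(p)                 # 22/1024 ≈ 0.0215, below alpha = 0.05
```

With these hypothetical data the null would be rejected at α = 0.05; note that the function sums over outcomes, exactly as the definition above prescribes.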
Before leaving this example, it is instructive to examine its associated Type I and II error rates. The Type I error rate (false positive rate) in this case is the probability of incorrectly declaring the coin unfair (
Then:
Thus, we see that
Calculation of the false negative rate requires additional assumptions, because a coin can be biased in many (in fact, infinitely many) ways. Perhaps the least committed alternative hypothesis
Angle 3: Pvalues from ROC curves
To take a third angle, we consider an alternative definition of the P value:
the minimum false positive rate (Type I error rate) at which the NHSTP will reject the null hypothesis.
Though not obvious at first glance, this definition is mathematically equivalent to our previous definition of the P value.
Let us step back and consider the null hypothesis testing procedure from an abstract point of view. The NHSTP is one instance of a threshold-decision procedure, that is, a procedure that chooses between two alternatives by comparing a test statistic computed from the data against a fixed threshold.
The resulting ROC curve
ROC curve for the coin flipping experiment with
Key points on the ROC curve are marked by circles, and the corresponding threshold value for each is indicated.
Now for the point of this whole exercise: If we drop a vertical line from the point on the ROC curve
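This equivalence between the P value and the minimum attainable Type I error rate can be checked numerically for the coin example; the sketch below again uses hypothetical data of 9 heads in 10 flips:

```python
from math import comb

n = 10  # hypothetical number of flips

def alpha(c):
    """Type I error rate of the rule: reject if |heads - n/2| >= c."""
    return sum(comb(n, j) for j in range(n + 1)
               if abs(j - n / 2) >= c) / 2 ** n

k_obs = 9                    # hypothetical observed number of heads
t_obs = abs(k_obs - n / 2)   # observed value of the test statistic

# Thresholds c <= t_obs are exactly those at which this observation is
# rejected; the smallest Type I error rate among them is alpha(t_obs),
# which equals the two-sided P value of the observation.
min_fpr = min(alpha(c) for c in range(int(t_obs) + 1))
print(min_fpr)   # 22/1024 ≈ 0.0215, the two-sided P value
```

The minimum false positive rate at which the procedure rejects coincides with the tail-area P value computed earlier, as the ROC-curve definition claims.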
Is significance testing rational?
The null hypothesis significance test (NHST) should not even exist, much less thrive as the dominant method for presenting statistical evidence... It is intellectually bankrupt and deeply flawed on logical and practical grounds. – Jeff Gill
We are now in a position to answer the question: Is the null hypothesis significance testing procedure a rational method of inference? We will show momentarily that the answer is a resounding 'NO!', but first we briefly consider why, despite its faults, many find it intuitively plausible. Several books explore the reasons in detail
This argument is called 'proof by contradiction':
By analogy, this argument could be called 'probabilistic proof by contradiction'. However, this analogy quickly dissolves after a little reflection: The premise (that is the 'if, then' statement) leaves open the possibility that
Again, we have just seen that this is an invalid argument. One obvious 'fix' is to try softening the argument by making the conclusion probabilistic:
Unfortunately, any apparent validity this has is still an illusion. To see the problem with this argument, let us return to the mammography example. Is it rational to conclude that a positive mammogram implies that a woman probably has breast cancer? The correct answer, obvious to most physicians at an intuitive if not at a formal statistical level, is 'it depends on the patient's clinical characteristics, and on the quality of the test'. Very well, then let us give a bit more information: Suppose that mammography has a false positive rate of 20%, and sensitivity of 80%. Can we now assign a probable diagnosis of breast cancer? Interestingly, most physicians answer this question affirmatively, giving a probability of cancer of 80%, a conclusion apparently reached by erroneously reporting the sensitivity in place of the desired posterior probability of cancer.
It is like the experiment in which you ask a second-grader: 'If eighteen people get on a bus, and then seven more people get on the bus, how old is the bus driver?' Many second-graders will respond: 'Twenty-five.' ... Similarly, to find the probability that a woman with a positive mammography has breast cancer, it makes no sense whatsoever to replace the original probability that the woman has cancer with the probability that a woman with breast cancer gets a positive mammography. – Eliezer Yudkowsky
To calculate the desired probability
To put it as alarmingly as possible, the probability that she has breast cancer has increased by almost 8-fold! Nevertheless, she probably does not have cancer (7.8% is far short of 50%); the odds are better than nine to one against it, despite the positive mammogram. Thus, while further testing may be in order, a rational response is reassurance and perhaps further investigation rather than pronouncement of a cancer diagnosis. This and other examples familiar from everyday clinical experience make clear that the null hypothesis significance testing procedure cannot 'substitute' for Bayes' rule as a method of rational inference.
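The mammography arithmetic is easy to reproduce. In the sketch below, the sensitivity (80%) and false positive rate (20%) are those stated above, while the prior probabilities swept over are illustrative assumptions of ours:

```python
def posterior(prior, sens=0.80, fpr=0.20):
    """P(cancer | positive mammogram) via Bayes' rule."""
    return sens * prior / (sens * prior + fpr * (1 - prior))

# Sweep illustrative prior probabilities (values are assumptions):
for prior in (0.01, 0.02, 0.10, 0.50):
    print(f"prior {prior:5.0%} -> posterior {posterior(prior):6.1%}")
```

Only when the prior reaches 50% does the posterior coincide with the 80% figure many physicians report; at screening-level priors, cancer remains improbable despite the positive test.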
We have focused our criticism on what we consider to be the most fundamental and most common error in the interpretation of P values: mistaking them for posterior probabilities of the null hypothesis.
Answers to the quiz
The answer to the quiz at the beginning of this paper is plain from the preceding discussion. Given only a P value, none of the offered conclusions about the probability that the drug works (or that the null hypothesis is true) can be drawn; the correct answer is (7), 'none of the above'.
To determine the probability that the drug works, the P value would have to be combined, via Bayes' rule, with the power of the study and, crucially, with the prior probability of the hypothesis.
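Treating a clinical trial as a diagnostic test, with power in the role of sensitivity and α in the role of the false positive rate (as in the analogy drawn earlier), the required calculation can be sketched in Python. The power, significance level, and prior probabilities below are illustrative assumptions of ours:

```python
def prob_effect_given_significant(prior, power=0.80, alpha=0.05):
    """P(real effect | significant result), treating the trial as a
    diagnostic test: sensitivity = power, false positive rate = alpha."""
    return power * prior / (power * prior + alpha * (1 - prior))

# A long-shot hypothesis versus a toss-up hypothesis:
for prior in (0.05, 0.50):
    print(prior, round(prob_effect_given_significant(prior), 2))
```

Even with a 'significant' result, a hypothesis with a 5% prior probability remains less likely than not to be true (posterior ≈ 46%), whereas a 50% prior yields a posterior of ≈ 94%.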
Do prior probabilities exist in science?
Though most physicians are comfortable with the concept of prior probability in the context of diagnostic test interpretation, many are less comfortable thinking about prior probabilities in the context of interpreting medical research data. As one respondent to our quiz thoughtfully objected,
The big difference between a study and a clinical test is that there is no real way of knowing how likely or unlikely a hypothesis is a priori. In order to have a predictive value in a clinical test, you need a prevalence or pretest probability. This does not exist in science. It is the job of the scientist to convince us that the pretest probability is reasonably high so that a result will be accepted. They do this by laying the scientific groundwork (introduction), laying out careful methods, particularly avoiding bias and confounders (methods), and describing the results carefully. Thereafter, they use the discussion section to outright and unabashedly try to convince us their results are right. But in the end, we do the positive predictive value calculation in our head as we read a paper... As an example, one person reads the SPARCL study and says, 'I do not CARE that the
This response actually makes our point, perhaps inadvertently, about the necessity of prior probabilities. Nevertheless, several important points raised by this response warrant comment.
Do prior probabilities 'exist' in science?
First, to the philosophical question of whether prior probabilities 'exist' in science, the answer is 'yes and no'. On the one hand, probability theory is always used as a simplifying model rather than a literal description of reality, whether in science or in clinical testing (with the possible exception of probabilities in quantum mechanics). Thus, when one speaks of the probability that a coin flip will result in heads, that a drug will have the intended effect, or that a scientific theory is correct, one is not necessarily committing to the view that nature is truly random. In these cases the underlying reality may be deterministic (for example, a theory is either true or false), in which case probabilities represent merely a convenient simplification, and do not really 'exist' in the sense that they would not be needed in a detailed, fundamental description of reality. However, simplification is essentially always necessary in dealing with sufficiently complex phenomena. For example, while it might be possible to conceive of a supercomputer capable of predicting the effects of a drug by detailed modeling of the molecular interactions between the drug and the astronomical number of cells and molecules in an individual patient's body, in practice we must make predictions with much less complete information; hence we use probabilities. The use of such simplifications is no less important in scientific thinking than in medical diagnostic testing. Thus, insofar as probabilities 'exist' at all, they are not limited to the arena of diagnostic testing.
Are prior probabilities in science arbitrary?
Given that prior probabilities for hypotheses in science and medicine are often difficult to specify explicitly in precise numerical terms, does this mean that any prior probability for a hypothesis is as good as any other? There are at least two reasons that this is not the case. First, pragmatically, people do not treat prior probabilities regarding scientific or medical hypotheses as arbitrary. To the contrary, they go to great lengths to bring their probabilities into line with existing evidence, usually by integrating multiple information sources, including direct empirical experience, relevant theory (for example an understanding of physiology), and literature concerning prior work on the hypothesis or related hypotheses. These prior probability assignments help scientists and physicians choose which hypotheses deserve further investment of time and resources. Moreover, while these probability estimates are individualized, this does not imply that each person's 'subjective' estimate is equally valid. Generally, experts with greater knowledge and judgement can be expected to arrive at more intelligent prior probability assignments, that is their assignments can be expected to more closely approximate the probability an 'ideal observer' would arrive at based on optimally processing all of the existing evidence. Second, in a more technical vein, methods for estimating accurate prior probabilities from existing data are an active topic of research, and are likely to lead to increased and more explicit use of 'Bayesian statistics' in the medical literature
Taking responsibility for prior probabilities
Finally, regarding the responsibility of scientific authors and readers to take prior probabilities seriously: We emphatically agree that authors should strive to place their results in context, so as to give the firmest idea possible of how much plausibility one should afford a hypothesis, prior to seeing the new data being presented. Without this context, there is no way to appraise how likely a hypothesis is to actually be true, or how strong the evidence needs to be to be truly persuasive. The neglect of thorough introductory and discussion sections in scientific papers is decried by many as a natural side effect of reliance on significance testing arguments
Has significance testing been perverted?
Considering the criticisms we have reviewed, it is natural to ask whether significance testing is being used as its originators intended. Significance testing is actually an amalgam of two approaches to statistical inference, developed on the one hand by RA Fisher, who introduced the P value, and on the other by Jerzy Neyman and Egon Pearson, who introduced the concepts of Type I and Type II error rates and statistical power. As Neyman and Pearson themselves wrote:
...no test based upon a theory of probability can by itself provide any valuable evidence of the truth or falsehood of a hypothesis. . . But we may look at the purpose of tests from another viewpoint. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not often be wrong
Thus, Neyman and Pearson apparently did not intend hypothesis testing to be used as it usually is used nowadays, as a method for appraising the truth of individual hypotheses. Rather, their method was intended merely to be correct in an aggregate sense. While this may be acceptable, say, for deciding the fates of mass-produced objects in an industrial setting, it is unsatisfactory in medical situations involving individuals. There, it is imperative that we strive to be right in each case. Similarly, few researchers would be content to use a method of inference knowing that it cannot accurately appraise the truth of individual hypotheses. While significance testing does not provide a way to know 'whether each separate hypothesis is true or false', Bayes' rule fortunately does provide rational grounds for appraising the strength of evidence in favor of individual hypotheses.
How significant is a significant result?
If it is unjustified to regard a 'statistically significant' result as sufficient evidence for the truth of a hypothesis, then what can we conclude when we read 'P < 0.05' in a research report?
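One principled way to translate a reported P value into a statement about the hypothesis is Goodman's 'minimum Bayes factor', exp(−z²/2), which bounds how strongly a two-sided P value can shift the odds against the null; the sketch below applies that published calibration (the choice of example values is ours):

```python
from math import exp
from statistics import NormalDist

def min_bayes_factor(p):
    """Goodman's minimum Bayes factor for a two-sided P value: the most
    the observed data could shift the odds in favor of the null."""
    z = NormalDist().inv_cdf(1 - p / 2)   # z score of the P value
    return exp(-z * z / 2)

bf = min_bayes_factor(0.05)      # about 0.15
posterior_null = bf / (1 + bf)   # starting from even (1:1) prior odds
print(round(bf, 2), round(posterior_null, 2))
```

Even at P = 0.05, and even granting the null hypothesis only a 50% prior probability, the null retains roughly a 13% posterior probability: 'P < 0.05' is far weaker evidence than '95% certainty'.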
Are physicians good Bayesians?
Probability theory was regarded by its early architects as a model not only for how educated minds should work, but for how they do actually work. This 'probabilistic theory of mind' forms the basis for modern views on the nature of rationality in philosophy, economics, and more recently in neuroscience
Anti-Bayes
In his evaluation of the evidence, man is apparently not a conservative Bayesian: he is not a Bayesian at all. – Kahneman and Tversky
The most serious challenge to the probabilistic theory of mind is the 'heuristics and biases' movement of experimental psychology, started by a series of influential papers published in the late 1960s and early 1970s by Kahneman and Tversky
Prior (pretest) probabilities
Physicians' estimates of the prior probability of disease may vary wildly
Representativeness bias
This is the tendency to violate the old medical maxim, 'when you hear hoofbeats, think horses, not zebras.' That is, the tendency to set the prior probability inappropriately high for rare diseases whose typical clinical presentation matches the case at hand, and inappropriately low for common diseases for which the presentation is atypical. This bias leads to overdiagnosis of rare diseases.
Availability bias
Also called the 'last case bias' in the medical context, this is the tendency to overestimate the probability of diagnoses that easily come to mind, as when, having recently seen a case of Hashimoto's encephalopathy, one automatically suspects this first in the next patient who presents with confusion, a relatively nonspecific sign. Another example is doubting that smoking is harmful because one's grandmother was a smoker yet lived to age ninety.
Posterior (posttest) probabilities
Other studies have explored ways in which physicians deviate from Bayes' rule in updating prior probabilities in light of new data.
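The normative benchmark these studies use is the likelihood-ratio form of Bayes' rule: convert the pre-test probability to odds, multiply by the likelihood ratio of the test result, and convert back. A minimal sketch, with a hypothetical pre-test probability, sensitivity, and specificity:

```python
# Updating a pre-test probability with a test result, using the
# odds/likelihood-ratio form of Bayes' rule. The pre-test probability,
# sensitivity, and specificity below are illustrative assumptions.

def post_test_probability(pretest, sensitivity, specificity, positive=True):
    """Probability of disease after a positive (or negative) test result."""
    # LR+ = sens / (1 - spec); LR- = (1 - sens) / spec
    lr = sensitivity / (1 - specificity) if positive else (1 - sensitivity) / specificity
    pre_odds = pretest / (1 - pretest)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Pre-test probability 20%; test with 90% sensitivity, 85% specificity:
print(round(post_test_probability(0.20, 0.90, 0.85), 2))  # 0.6
```

A positive result here raises the probability of disease from 20% to 60%, not to the 90% that naive reading of the test's sensitivity might suggest; the biases below describe systematic departures from this kind of calculation.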
Anchoring bias
This is the tendency to set one's posterior probability estimate inappropriately close to a starting value, called an anchor. Errors can arise from anchoring to an irrelevant piece of information (as when patients who would have been admitted from the high-acuity part of the emergency department are sent home from the low-acuity part), or from generally undervaluing new information when it does not support one's initial impression.
Confirmation bias
Also known as belief preservation, hypothesis locking, and selective thinking, this is the tendency to maintain one's favored hypothesis by overvaluing and selectively searching for confirmatory evidence while undervaluing or ignoring contradictory evidence. Reasons for this bias include vested emotional interest, as when avoiding a potentially upsetting diagnosis, or inconvenience, as when downplaying medical symptoms in a patient with challenging psychiatric problems.
Premature closure bias
This is the tendency to make a diagnosis before sufficient evidence is available. Premature closure bias can arise from emotional factors such as discomfort over a patient's or the physician's own uncertainty, or because of time pressure.
ProBayes
[T]he theory of probability is at bottom nothing more than good sense reduced to a calculus which evaluates that which good minds know by a sort of instinct, without being able to explain how with precision.  Laplace
The heuristics and biases movement notwithstanding, the probabilistic theory of cognition has been resurrected in recent years in the fields of neuroscience, artificial intelligence, and human cognitive science. As mentioned earlier, Bayesian theories have provided successful explanations of sub- or preconscious mental phenomena, such as learning
There is also a growing consensus that many higher-level human cognitive processes also operate on Bayesian principles.
Instinctual Bayesianism?
How can the view that in many situations people perform Bayesian inference be reconciled with findings from the heuristics and biases movement (and our quiz results) showing that most people understand the elementary concepts of probability and statistics poorly at best? In large part, the answer is that poor fluency with statistics and probability theory at a formal level need not cast doubt on Laplace's claim that 'good minds' use probability theory by 'a sort of instinct'. Thus, although physicians are vulnerable to the traps of experimental psychologists in tests of formal verbal reasoning about probability and statistics, they are nevertheless adept at managing uncertainty. We suspect that studies similar to that of Tenenbaum
Summary
Until recently, the art of medical reasoning has arguably gotten along well enough with little formal understanding of mathematical probability. This has been possible largely because, as Laplace observed, at some informal, implicit level, the everyday reasoning of good minds conforms to the laws of probability. However, physicians can no longer afford the luxury of complete informality. Without a solid understanding of basic probability, one can no longer intelligently interpret the medical literature. The quiz results that began this essay are a sobering reminder that most physicians still lack understanding of elementary probability and statistics. In particular, it is worrisome that physicians seem to so easily fall prey to the illusion that significance testing allows one to evaluate the truth of a hypothesis without having to take into account contextual information like prior studies and biological plausibility.
Like others, we are concerned that the increasing use of statistics without a parallel increase in statistical literacy puts the medical literature at risk of becoming less scientific.
Abbreviations
AMI: acute myocardial infarction; NHST: null hypothesis significance test; NHSTP: null hypothesis significance testing procedure.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
A zero difference between the three authors' contributions to this work is among the credible values.
Acknowledgements
Thanks to the medical residents and faculty members at Brigham and Women's Hospital, Massachusetts General Hospital, and Barnes Jewish Hospital who participated in the quiz. The authors also gratefully acknowledge Emily J Westover, PhD, and Sydney Cash, MD, PhD, and the reviewers for critical comments on earlier versions of the manuscript.
Prepublication history
The prepublication history for this paper can be accessed here: