Abstract
Background
As an increasingly large number of metaanalyses are published, quantitative methods are needed to help clinicians and systematic review teams determine when metaanalyses are not up to date.
Methods
We propose new methods for determining when nonsignificant metaanalytic results might be overturned, based on a prediction of the number of participants required in new studies. To guide decision making, we introduce the "new participant ratio", the ratio of the actual number of participants in new studies to the predicted number required to obtain statistical significance. A simulation study was conducted to study the performance of our methods and a real metaanalysis provides further evidence.
Results
In our three simulation configurations, our diagnostic test for determining whether a metaanalysis is out of date had sensitivity of 55%, 62%, and 49% with corresponding specificity of 85%, 80%, and 90% respectively.
Conclusions
Simulations suggest that our methods are able to detect outofdate metaanalyses. These quick and approximate methods show promise for use by systematic review teams to help decide whether to commit the considerable resources required to update a metaanalysis. Further investigation and evaluation of the methods is required before they can be recommended for general use.
Background
Systematic reviews, particularly of randomized trials, have gained an important foothold within the evidencebased movement. One of their advantages is easy to understand: clinicians have little time to read individual reports of randomized trials. A more efficient use of their time is to review the totality of evidence pertaining to the question under consideration. A systematic review meets this need particularly if it is based on uptodate evidence. How can it be determined whether the evidence is up to date? Simply examining the publication date of a systematic review is not a dependable guide. Some fields evolve rapidly while others progress at a more gradual pace. Delays in the editorial process of traditional print journals can even render a systematic review outdated from the moment it is published.
While the advantage of a published systematic review clearly starts to diminish as new eligible studies are identified, the precise trajectory of decreasing utility is unknown [1]. We believe that quantitative methods are needed to help clinicians and systematic review teams determine when a systematic review can be considered up to date. In this paper, we address a narrower objective. First, we restrict our attention to systematic reviews that feature quantitative synthesis of evidence, or metaanalysis. Second, we focus only on metaanalyses whose pooled results are not statistically significant ("null" metaanalyses). Such results are not infrequent: estimates from two databases showed 38% [2] and 25% [3] of metaanalytic results were nonsignificant. Our goal is to identify metaanalyses whose pooled results would become statistically significant if they were updated.
As the body of published systematic reviews grows, the problem of outdatedness is attracting more attention. Several groups require that systematic reviews be kept up to date; for example reviews published in the Cochrane Library must be updated at least every 2 years[4]. However the Cochrane Reviewer's Handbook notes that "How often reviews need updating will vary depending on the production of valid new research evidence." Rather than prespecifying a fixed updating frequency, Ioannidis et al. [5] proposed "recalculating the results of a cumulative metaanalysis with each new or updated piece of information." However, in addition to raising statistical issues of multiple testing[6], this would require continuous monitoring and evaluation of the literature. The resource implications of either of these approaches could be substantial, and the Cochrane Reviewer's Handbook recommends the establishment, for each review, of "guides addressing when new research evidence is substantive enough to warrant a major update or amendment." A conceptual model for assessing the current validity of clinical practice guidelines was developed by Shekelle et al..[7] Their approach used focused literature searches together with expert input.
Here we propose a quantitative approach based on the notion that it may be possible to predict the amount of additional information required to change a nonsignificant metaanalytic result to a significant one. We measure this additional information in terms of the number of participants in primary studies. Our prediction can then be used to obtain a "diagnostic test" for outofdate metaanalyses.
This test presupposes that a primary metaanalytic result can be identified in each systematic review (analogous to a primary analysis in a clinical trial). Published systematic reviews often feature multiple metaanalyses and the authors do not always make it clear which one is of greatest interest.
We begin by deriving and explaining the prediction and then illustrating it using a metaanalysis of trials of intravenous streptokinase for acute myocardial infarction [8]. Next, we describe how the prediction is used to formulate a diagnostic test. A computer simulation is then used to study the performance of the test in three different configurations. Receiver operator characteristic (ROC) curves are derived to illustrate the results.
Methods
Predicting the number of additional participants required to obtain significance
Consider a metaanalysis involving a total of N participants that yields a pooled estimate with associated standard error SE_{N}. The zstatistic is defined to be the ratio
Z = /SE_{N}. (1)
When the absolute value of Z exceeds a critical value Z_{c }(for example Z_{c }= 1.96, corresponding to a twosided Type I error rate of 5%), the pooled estimate is judged to be statistically significant. Failure to obtain statistical significance may be due to the absence of any clinically meaningful effect or to a Type II error resulting from limited power. Here we focus on the latter case, namely pooled estimates that are nonsignificant due to limited power.
Although the power of a metaanalysis may depend on several factors, here we focus on sample size alone. Suppose that a total of N participants were included in the metaanalysis and that Z <Z_{c}. Now suppose that studies involving an additional n participants subsequently become available. Assuming there is no secular trend in the results, how large a value of n would be needed to obtain a statistically significant pooled estimate? In the appendix, we derive the following prediction, by extrapolating the results of the original metaanalysis:
n = N(/Z^{2 } 1). (2)
Note that the closer Z is to Z_{c}, the smaller the predicted n. As an illustration, consider Lau et al.'s cumulative metaanalysis of trials of intravenous streptokinase for acute myocardial infarction [8]. To definitively establish statistical significance, we will fix the probability of TypeI error (α) at 1%. If a metaanalysis of the first five trials (up to and including the second trial published in 1971) had been performed, the result would not have been statistically significant. Figure 1 illustrates the application of the prediction following this metaanalysis.
Figure 1. Cumulative metaanalysis of 9 trials of intravenous streptokinase for acute myocardial infarction [8]showing the application of the prediction (equation 2) using the results of the first 5 trials. The cumulative zstatistic is plotted versus the cumulative number of participants, shown on a square root scale (see Appendix). The horizontal dashed line indicates the twosided critical value of the zstatistic at α = 0.01. The diagonal dotted line emanates from the origin and passes through the point representing the metaanalysis of the first 5 trials (double circle). The location where this line intersects the horizontal dashed line is marked by a vertical dotted line projecting down to the horizontal axis and indicating the predicted number of participants required to obtain statistical significance.
One way to evaluate the quality of the prediction is to compare it with the total number of participants when statistical significance was actually obtained. This is not a perfect gauge because of the discrete sample sizes of the trials, but it can still provide insight. In Figure 1, we used the middle of three trials published in 1971 as the last trial in a hypothetical "original" metaanalysis. In that case, the predicted number of participants required to obtain significance (2506) is very close to the number when significance was actually obtained (2432). In fact, the marginal impact of each of these 1971 trials on the pooled estimate was substantial, which influences the predicted number of cases needed to demonstrate an effect. When the first of the three 1971 trials is instead used as the last trial in the original metaanalysis, the prediction (1232) substantially underestimates the number of participants required. On the other hand, when the last of the 1971 trials is used as the last trial in the original metaanalysis, the prediction (4230) is a substantial overestimate.
A diagnostic test
Suppose some time has elapsed since the completion of a metaanalysis (again, assumed to have shown a nonsignificant pooled result), and now additional evidence is available. Suppose the results of additional studies of the same clinical question are now available, involving a total of m participants. Is the metaanalysis out of date?
If the metaanalysis were in fact updated and the pooled result were statistically significant, the earlier metaanalysis could be deemed "out of date". Conversely, if the updated result were to remain nonsignificant, the earlier metaanalysis could be deemed "not out of date". Note that these designations should be understood as referring only to the narrow issue of statistical significance.
A simple decision is to "diagnose" the metaanalysis as being out of date if and only if m >n, or equivalently if and only if m/n > 1. For convenience we refer to m/n as the "new participant ratio", representing the ratio of the observed number of additional participants to the predicted number of additional participants to obtain statistical significance. Thus, the new participant ratio being greater than 1 provides a diagnostic test for whether a metaanalysis is out of date. Like any diagnostic test, falsepositive and falsenegative errors are possible, as illustrated in Table 1.
Table 1. Diagnostic test result versus true status of a metaanalysis
The sensitivity of the test is given by a/(a + c), while the specificity is given by d/(b + d). To obtain higher specificity, it is necessary to reduce the frequency of false positives. One way to do this is to tighten the criterion for declaring a metaanalysis to be out of date. For example, a metaanalysis could be diagnosed as out of date if and only if the new participant ratio is greater than q, for some q > 1. Conversely, to obtain higher sensitivity, a value q < 1 could be chosen. Any desired level of sensitivity (or specificity) could be obtained by choosing an appropriate value of q, with a corresponding tradeoff in specificity (or sensitivity). By varying q in this way, an ROC curve[9] is easily constructed. The area under the ROC curve[10] can be used to gauge the performance of the test.
A computer simulation
To assess the performance of the test, a series of simulations was conducted. The individual studies in the simulation were taken to be RCTs with binary outcomes. For simplicity, the log odds ratio was used to measure intervention effectiveness, and the MantelHaenszel estimator provided effect estimates. Three configurations were explored in the simulation study (Table 2), and are explained in detail below.
Table 2. Simulation configurations
Since many metaanalyses involve relatively few trials [11], the maximum number of trials per metaanalysis, denoted M, was taken to be either 10 or 20. Since the goal was to model metaanalyses with nonsignificant pooled estimates, the size of effects and the number of participants per arm were carefully chosen. Relatively strong effects were modelled (odds ratios of 1/2 and 2/3), so the expected number of participants per study arm, denoted E, could be relatively small.
For each configuration, the basic simulation procedure was as follows. In each step of the simulation, a number of trials was simulated. Events in the control arm of each trial were simulated using a binomial distribution with event rate sampled from a uniform distribution between 1% and 50%. Events in the treatment arm were simulated from an independent binomial distribution with event rate determined by the event rate in the control arm and the odds ratio. A metaanalysis of the trials was then performed. Our methods require a nonsignificant metaanalysis as a starting point, therefore if a simulated metaanalysis showed a significant pooled estimate, it was discarded and a new one was simulated. This was continued until a usable starting point was obtained. This process was repeated until a nonsignificant pooled estimate was obtained. The predicted number of additional participants required to obtain statistical significance, n, was then computed and recorded. Additional trials were then simulated and the total number of additional participants, m, was computed and recorded. The metaanalysis was then performed again, this time using the complete set of trials, and the statistical significance of the pooled estimate recorded. This was repeated 10000 times for each configuration. All tests of statistical significance were twosided, since this is customary in metaanalyses, and a significance level of α = 0.01 was used.
The number of trials and participants in the simulations were chosen as follows. The number of trials in the original metaanalysis was determined by randomly choosing with equal probability a number between 2 and M. The number of participants per study arm was determined by randomly sampling from a Poisson distribution with expected value E. The number of additional trials was determined by randomly choosing with equal probability a number between 1 and 1/2 M. (In our simulations, M was always even.) Again, the number of participants per study arm was determined by randomly sampling from a Poisson distribution with expected value E.
Results
By design, the zstatistics of the simulated metaanalyses were nonsignificant before updating. Figure 2 shows their distributions before and after updating.
Figure 2. Boxplots showing distribution of zstatistics for each simulation configuration before (A,B,C) and after (A*,B*,C*) updating. The midline of each box marks the median. The bottom and top of the box represent the 25th and 75th percentiles, respectively; thus the height of the box is the interquartile range (IQR). Values that lie more that 1.5 IQRs beyond the top or bottom of the box (outliers) are shown individually. Lines project from the top and bottom of the box to the most extreme values not more than 1.5 IQRs beyond the box.
Of the three simulation configurations, configuration C shows the lowest percentage of significant metaanalyses after updating (11%). This is because the odds ratio (2/3) is close to 1 and the number of trials in the original metaanalysis (10) is small.
One way to gauge the quality of predictions is to compare the new participant ratios of metaanalyses found to be out of date versus those not out of date. Figure 3 displays this comparison for simulation configuration C.
Figure 3. Boxplots comparing the new participant ratio for metaanalyses found to be out of date versus those not out of date in simulation configuration C. See caption of Figure 3 for interpretation of boxplots. Not shown are cases where the new participant ratio is infinite (7 cases out of 10000, 5 [71%] of which were in the outofdate group) or where the new participant ratio < 1/100 (1361 cases out of 10000, 6 [0.4%] of which were in the outofdate group).
The new participant ratios for the metaanalyses that are out of date are typically larger than those for the metaanalyses that are not out of date. This suggests that the metaanalyses with the largest new participant ratios may be the best candidates for updating. Indeed, of the 10 metaanalyses showing the largest new participant ratios in simulation configuration C, 8 were out of date, whereas overall 11% of the metaanalyses in that configuration were out of date.
If the goal is instead to decide whether a given nonsignificant metaanalysis is up to date, we can diagnose the metaanalysis as being out of date when the new participant ratio is greater than some threshold q, say q = 1. The results of this diagnostic test applied to simulation configuration C are shown in Table 3.
Table 3. Diagnostic test result with threshold q = 1 versus true status of metaanalyses in simulation configuration C.
The sensitivity in configuration C is thus 521/(521 + 539) = 49%, while the specificity is 8053/(8053 + 887) = 90%. By selecting different thresholds for the new participant ratio, the sensitivity and specificity can be varied, producing an ROC curve for each configuration (Fig. 4). The area under each ROC curve was computed as a gauge of the performance of the diagnostic test, by numerical integration using 10000 intervals of equal width. The areas were 0.81, 0.80, and 0.85 for configurations A, B, and C respectively. Hosmer and Lemeshow[12] suggest that for areas under the ROC curve between 0.7 and 0.8 the discrimination can be considered acceptable, while for areas between 0.8 and 0.9 the discrimination can be considered excellent.
Figure 4. ROC curves for the diagnostic test for each configuration. The dot on each curve indicates the sensitivity and 1specificity when the threshold for the new participant ratio is q = 1. For configuration A, when q = 1, the sensitivity is 55% and the specificity is 85%. For configuration B these values are 62% and 80%, and for configuration C they are 49% and 90%. Increasing q corresponds to moving counterclockwise along the curve. Note that a Bernoulli random decision with probability p that a review is out of date has sensitivity p and specificity 1  p, corresponding to an ROC curve consisting of a 45degree line. ROC curves above the 45degree line indicate performance superior to chance.
Discussion
The determination that a metaanalysis is not up to date or the decision to update a metaanalysis may be based on a variety of considerations. We believe that one of these should be the quantity of evidence that is newly available. The prediction and diagnostic rule developed here use a metric of the quantity of new information relative to the information previously synthesized. This allows for the fact that some fields evolve at an extremely rapid pace while others progress more gradually.
Our methods depend on very limited input, requiring only the number of participants included in the original metaanalysis, the zstatistic for the pooled estimate, and the number of additional participants in new studies. Furthermore, these quantities are easily obtained. The zstatistic is often reported or is easily computed from reported quantities such as pvalues, while the numbers of participants are often available in abstracts. It may also be possible to extract the numbers of participants automatically using pattern recognition software. An example of automatic identification and extraction of numerical values from text is the use of natural language queries to extract drug dosage information [13]. Another strength of our methods is that zstatistics are easily computed regardless of the measure of effectiveness used in the metaanalysis (e.g. odds ratio, standardized mean difference).
Rather than arbitrarily specifying a fixed frequency for updating metaanalyses, our methods provide empirical estimates of when metaanalyses need updating. Rosenthal[14] proposed that when metaanalysts obtain a statistically significant pooled estimate they also report "the tolerance for future null results" in the form of the number of nonsignificant studies that would be required to overturn their result. In this spirit, our prediction of the number of additional participants required to obtain statistical significance could be incorporated in the report of a metaanalysis with a nonsignificant pooled estimate as guidance for future studies or updates. Unlike approaches based on expert input, our methods are purely quantitative. Application of our methods may also help to avoid multiple testing issues associated with continuous updating of metaanalyses.
Simulations suggest that our methods are able to detect metaanalyses that would become significant if they were updated. In our three simulation configurations, our diagnostic test had sensitivity of 55%, 62%, and 49% with corresponding specificity of 85%, 80%, and 90% respectively. The test is easily modified to obtain higher sensitivity or specificity.
The new participant ratio can be used to identify metaanalyses that are the best candidates for updating. In one simulation, of the 10 metaanalyses showing the largest new participant ratios, 8 were out of date, whereas overall 11% of the metaanalyses were out of date. Given a collection of nonsignificant metaanalyses, a practical strategy might be to update the metaanalysis with the largest new participant ratio first, then move to the metaanalysis with the next largest new participant ratio, and so forth.
For health policy decision makers, such as the National Health Service, faced with having to make decisions with competition for diminishing budgets, our approach might help indicate which metaanalyses are more 'informative' in terms of what an updated metaanalysis would contribute.
Our methods show promise as a rough gauge of whether a metaanalysis is up to date. If a quick literature search uncovers several apparently relevant studies published subsequent to a metaanalysis, one might have reason to doubt the currency of the review. Calculation of the "new participant ratio" introduced here can make such judgments more precise and objective. Based on our results, a new participant ratio greater than 5 may be taken as strong evidence that a metaanalysis is out of date; conversely, a new participant ratio less than 1/5 suggests that the weight of evidence accumulated since the metaanalysis was published is unlikely to overturn its findings. Such judgments are, of course, conditional on the thoroughness of the literature search and relevance assessment of any additional studies. For example, a large new participant ratio could be misleading if, upon closer examination, several of the new studies were not in fact relevant to the clinical question.
Systematic review groups may choose to conduct a thorough literature search and relevance evaluation even before committing to update a review. In such cases, less stringent thresholds for the new participant ratio may be acceptable. For example, a new participant ratio greater than 2 could be taken as strong evidence that a metaanalysis needs to be updated; conversely, a new participant ratio less than 1/2 would suggest that the metaanalysis does not need to be updated.
An alternative approach to our method is possible using considerations of clinical significance. Equation (9) in the Appendix could be modified to use an effect judged to be clinically important rather than the effect observed in the original metaanalysis. This would lead to a different prediction of the number of additional participants required to obtain statistical significance. A metaanalyst might wish to perform an update when this number of participants had accrued.
It is important to note that our methods have some limitations. First, they are based on the assumption that the variance of the pooled estimates shrinks at a rate inversely proportional to the total number of participants in all studies. Since the variance of pooled estimates is in general determined not only by sample size, but also by factors such as baseline risk and statistical heterogeneity, this assumption may not hold exactly. Second, our methods are based on the assumption that there is no secular trend in the results. In particular cases this assumption may be violated. Third, particularly when the number of participants in the studies included in the original metaanalysis is small, our methods may give highly variable results. Fourth, while simulations suggest that our diagnostic test has good overall performance at determining whether an update is warranted for a given metaanalysis, it is not clear what an appropriate choice of the q parameter would be. If the true shape of the ROC curve were known then q could be selected to obtain a desired level of sensitivity or specificity. Finally, a realworld decision to update a metaanalysis would likely be guided not only by the quantitative methods proposed here, but also by clinical and scientific considerations.
It might be argued that our methods place too much emphasis on the narrow issue of statistical significance. Indeed, there has recently been a shift away from hypothesis testing towards estimation, with an accompanying change of emphasis from pvalues to confidence intervals[15]. Given large enough sample sizes, it is always possible to obtain statistical significance. Nevertheless, acceptance of new therapeutic interventions continues to depend at least in part on statistical significance, and we believe that this is not without merit. A logical next step following from our methods would be a framework for incorporating considerations of clinical significance into the decisionmaking process.
Conclusions
This work has focused on metaanalyses that fail to show a significant effect, but would do so if updated. The methods we have developed show promise for use by individual clinicians to help determine whether metaanalyses are up to date, and for use by systematic review teams to help decide whether to commit the considerable resources required to update a metaanalysis. Further investigation and evaluation of the methods is required before they can be recommended for general use.
Competing interests
None declared.
Authors' contributions
MS initially conceived of the study and NJB derived the prediction rule. NJB and MS secured funding for the project. NJB, MF, MS, and DM participated in the design of the study. NJB coordinated the project and wrote all manuscript drafts. DM reviewed simulation results, and participated in the drafting and revision of the manuscript. MS participated in the drafting and revision of the manuscript. MF wrote the simulation and data analysis programs and produced the figures. All authors read and approved the final manuscript.
Appendix
Recall that the standard error of a sample mean is inversely proportional to the square root of the sample size. Many estimators have the same property, and we will therefore suppose that the standard error SE_{N }of the pooled estimate is inversely proportional to the square root of the number of participants, N. Say for some σ > 0,
SE_{N }= σ/. (5)
While this has the same form as the standard error of a sample mean where σ represents the standard deviation, it should be noted that here σ is simply a constant of proportionality. From equations (1) and (5),
σ = /Z. (6)
Suppose additional studies become available yielding n additional participants. From equation (5), the standard error of the pooled estimate based on all N + n participants is
SE_{N+n }= σ/. (7)
Substituting equation (6) into equation (7) gives
Furthermore, suppose there is no secular trend in the results, i.e. the pooled estimate based on all N + n participants remains the same, but now it just reaches statistical significance, i.e.
Substituting equation (8) into equation (9) gives
Solving for n gives
n = N(/Z^{2 } 1). (11)
This provides a prediction of the number of additional participants required to warrant updating a metaanalysis. Note that the prediction depends only on N (the number of participants in the original metaanalysis), Z (the zstatistic from the original metaanalysis), and Z_{c }(the critical zvalue chosen for statistical significance). Crucially, note that the results of additional trials are not required.
The form in which Z enters equation (11) has important consequences. First, note that as Z → 0 from either the right or the left, n → ∞, and strictly speaking when Z = 0, n is not defined. With discrete data (e.g. data from randomized controlled trails with binary outcomes), a pooled zstatistic of zero can occur with nonzero probability. For convenience, we therefore define n = ∞ when Z = 0. Second, note that since Z enters equation (11) as Z^{2}, the sign of Z is irrelevant. In a metaanalysis of randomized controlled trials, for example, equation (11) gives the same prediction regardless of whether the results favour the intervention or control group. It is therefore not necessary to interpret the sign of the zstatistic.
Geometrical interpretation
A geometrical interpretation provides a more intuitive way to understand the prediction. Equation (10) is a linear equation in with slope Z/ and intercept 0. In other words, on a graph of the cumulative zstatistic versus the cumulative sample size shown on a squareroot scale, the total number of participants predicted is the intersection of a horizontal line at Z_{c }with a line that passes through the origin (0,0) and the point defined by the original metaanalysis (Fig. 5). From equation (6), the slope of the line can be expressed as /σ. The straightline extrapolation reflects the assumptions that both the pooled estimate, , and the constant of proportionality for the standard error, σ, remain unchanged as the number of participants increases.
Figure 5. Geometrical illustration of the prediction. The number of additional participants, n, is determined by extrapolating a line segment starting at the origin (0,0) and passing through the point (, Z) to the point where it intersects with a horizontal line at Z_{c}.
Acknowledgements
We gratefully acknowledge the helpful feedback provided by Paul Shekelle, Howard Schachter, and Robert Platt. We are also grateful to Khaled El Emam, who contributed to the conceptualization of the project. The CHEO Research Institute and the Natural Sciences and Engineering Research Council of Canada provided financial support.
References

Stead LF, Lancaster T, Silagy CA: Updating a systematic review – what difference did it make? Case study of nicotine replacement therapy.
BMC Med Res Methodol 2001, 1:10. PubMed Abstract  BioMed Central Full Text

McAuley L, Pham B, Tugwell P, Moher D: Does the inclusion of grey literature influence estimates of intervention effectiveness reported in metaanalyses?
Lancet 2000, 356:12281231. PubMed Abstract  Publisher Full Text

Sampson M, Barrowman NJ, Moher D, Klassen TP, Pham B, Platt R, St. John PD, Viola R, Raina P: Should systematic reviewers search Embase in addition to Medline?

Clarke M, Oxman AD (Eds): Cochrane Reviewers Handbook 4.1.6 [updated January 2003] In In The Cochrane Library. Oxford: Update Software; 2003.

Ioannidis JP, ContopoulosIoannidis DG, Lau J: Recursive cumulative metaanalysis: a diagnostic for the evolution of total randomized evidence from group and individual patient data.
J Clin Epidemiol 1999, 52:281291. PubMed Abstract  Publisher Full Text

Pogue JM, Yusuf S: Cumulating evidence from randomized trials: utilizing sequential monitoring boundaries for cumulative metaanalysis.
Control Clin Trials 1997, 18:580593. PubMed Abstract  Publisher Full Text

Shekelle PG, Ortiz E, Rhodes S, Morton SC, Eccles MP, Grimshaw JM, Woolf SH: Validity of the Agency for Healthcare Research and Quality clinical practice guidelines: how quickly do guidelines become outdated?
JAMA 2001, 286:14611467. PubMed Abstract  Publisher Full Text

Lau J, Antman EM, JimenezSilva J, Kupelnick B, Mosteller F, Chalmers TC: Cumulative metaanalysis of therapeutic trials for myocardial infarction.
N Engl J Med 1992, 327:248254. PubMed Abstract

Hanley JA: Receiver operating characteristic (ROC) methodology: the state of the art.
Crit Rev Diagn Imaging 1989, 29:307335. PubMed Abstract

Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve.
Radiology 1982, 143:2936. PubMed Abstract

Moher D, Cook DJ, Jadad AR, Tugwell P, Moher M, Jones A, Pham B, Klassen TP: Assessing the quality of reports of randomised trials: implications for the conduct of metaanalyses.
Health Technol Assess 1999, 3:i98. PubMed Abstract  Publisher Full Text

Hosmer DW, Lemeshow S: Applied Logistic Regression. 2nd edition. New York: Wiley; 2000.

Croft WB, Callan JP, Aronow DB: Effective access to distributed heterogeneous medical text databases.
Medinfo 1995, 8(Pt 2):1719. PubMed Abstract

Rosenthal R: The "file drawer problem" and tolerance for null results.
Psychological Bulletin 1979, 86:638641. Publisher Full Text

Gardner MJ, Altman DG: Confidence intervals rather than P values: estimation rather than hypothesis testing.
Br Med J (Clin Res Ed) 1986, 292:746750. PubMed Abstract
Prepublication history
The prepublication history for this paper can be accessed here: