Systematic quantitative overviews of the literature to determine the value of diagnostic tests for predicting acute appendicitis: study protocol

Abstract

Background

Suspected acute appendicitis is the most frequent reason for emergency operations in visceral surgery worldwide. In approximately twenty percent of all cases, however, the diagnosis is incorrect and patients undergo surgery without having acute appendicitis. Operations on bland appendices put patients at risk and entail a serious waste of resources. Several highly accurate tests have been introduced to diagnose acute appendicitis. The false positive rate, however, has not changed over the last twenty years. Given the variation that exists in both practice and research and the uncertainty regarding the quality of the underlying evidence, there is a clear need for comprehensive, systematic and quantitative overviews of the diagnostic value of the various tests purported to be predictive of acute appendicitis.

Methods

Literature will be identified by searching general bibliographic databases (MEDLINE and EMBASE) and specialist computer databases (DARE, Cochrane Database of Systematic Reviews, conference proceedings, MEDION, SCISEARCH, BIOSIS) without language restrictions. We will contact experts and the manufacturers of tests. Hand-searching will complete our searches. Identified articles will be selected according to populations, tests, outcomes and study design. Papers meeting the selection criteria will be appraised to rate their methodological quality. Analysis will include exploration of heterogeneity in results. We will conduct meta-analyses to generate summary estimates of test accuracy measures and summary ROC curves where appropriate. If meta-analysis is considered inappropriate, we will describe the identified evidence in the context of appraised quality.

Discussion

These reviews should lead to formulation of recommendations for current practice and future research.

Introduction

Suspected acute appendicitis is the most frequent reason for emergency operations in visceral surgery worldwide. In the UK, 37,289 patients had an emergency excision of the appendix in the year 2000 [1]. In approximately twenty percent of all cases, however, the diagnosis is incorrect and patients undergo surgery without having acute appendicitis at all [2–5]. Operations on bland appendices may lead to morbidity in 4.6 percent and to mortality in 0.14 percent of cases [6]. Despite reports of highly accurate diagnostic procedures for acute appendicitis, a large retrospective cohort study [7] concluded that the rate of misdiagnosis (the false positive rate) has not changed over the last twenty years. One potential explanation for that finding is that studies reporting on test accuracy overestimate the true potential for correct classification because of inappropriate methodology and biased reporting of results, since primary research on the evaluation of tests is generally poor in quality [8–10].

Online searches of the electronic databases revealed a number of broad reviews, commentaries and recommendations on tests for predicting acute appendicitis but there was a dearth of focused, rigorous diagnostic overviews of the available evidence. These publications showed that there are several prediction rules and tests or markers purported to be predictive of acute appendicitis. However, they offer only limited guidance for practice because traditional literature reviews evaluating tests for acute appendicitis have not applied the scientific strategies to assemble, appraise, and synthesize relevant evidence, which have been embodied in the criteria for high quality reviews.

Given the variation that exists in both practice and research, the uncertainty regarding the quality of the underlying evidence, and the importance of early prediction of acute appendicitis in view of the available effective treatments, there is a clear need for a comprehensive, systematic and quantitative overview of the diagnostic value of the various tests purported to be predictive of acute appendicitis.

At present there is a dearth of such reviews. In this commentary, we describe how we are using such a systematic approach to collate and critically appraise the available literature on the diagnosis of acute appendicitis.

Methods

Study identification

Non-comprehensive search strategies can lead to significant bias in the retrieval of relevant literature. This weakens the strength of inferences from systematic reviews and poses a particular problem in reviews of diagnostic tests [11, 12]. We will therefore identify literature via general bibliographic databases including MEDLINE and EMBASE, specialist computer databases such as DARE and MEDION (a database of diagnostic test reviews set up by Dutch and Belgian researchers), the Cochrane Database of Systematic Reviews, relevant specialist registers of the Cochrane Collaboration, conference proceedings and BIOSIS, without language restrictions. In addition, we will contact individual experts and others with an interest in this field to uncover grey literature, and we will contact the manufacturers of tests. Hand-searching of selected specialist journals, checking of reference lists and use of SCISEARCH to identify frequently cited articles will complete our searches. In cases of duplicate publication, the most recent and complete version will be selected. A comprehensive database of relevant articles will be constructed. A preliminary search has been carried out to estimate the size of the relevant literature: MEDLINE searches located 800 potentially relevant citations. Expanding the search to other databases, hand-searching, reference list checking and contact with authors might add as many citations again, so the total is likely to be about 1600. Letters will be sent to major centres and to the first author of each shortlisted paper published in the last five years, asking whether they know of any published or unpublished relevant studies not included on our list. The search strategy used to identify articles in MEDLINE is shown in appendix.doc.

Study selection

Studies will be selected for inclusion in the review in a two-stage process using selection criteria based on those shown in Table 1. First, a comprehensive database of the literature search will be constructed. The citations will be scrutinised by two reviewers, and full manuscripts will be obtained for all citations that are likely to meet the selection criteria. Two reviewers will then independently select the studies that meet predefined and explicit criteria regarding populations, tests, outcomes and study design. These criteria will be pilot tested using a sample of papers, and agreement between reviewers will be measured. When disagreements occur, the two reviewers will meet. Experience suggests that the cause of a disagreement is often a simple oversight on the part of one of the reviewers. When this is not the case, the issue will be resolved by consensus involving a third reviewer.

Table 1 Study Selection Criteria.

Study validation

Papers meeting the selection criteria will be appraised to rate their methodological quality. In addition to using ratings of study quality as possible explanations for differences in results, the extent to which primary research met methodological standards is important per se for assessing the strength of any conclusions that are reached. There is an ongoing debate over what constitutes the best quality assessment tool for diagnostic test studies. We will evaluate elements of study design that are likely to have a direct relationship to bias in a diagnostic test study [10, 13–15]. The items shown in Table 2 will be used for methodological quality assessment. Agreement for the quality assessments will be calculated, and disagreement resolved, in the same fashion as for the assessment of study selection. We will evaluate the agreement between the two reviewers using percentage agreement and weighted kappa statistics [16].
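
The two agreement statistics named above can be illustrated with a minimal Python sketch. The three-level quality scale, the linear disagreement weights, the function names and the example ratings are all assumptions made for illustration; the protocol does not specify a weighting scheme.

```python
from collections import Counter


def percentage_agreement(ratings_1, ratings_2):
    """Share of papers on which both reviewers gave the same rating."""
    matches = sum(a == b for a, b in zip(ratings_1, ratings_2))
    return matches / len(ratings_1)


def weighted_kappa(ratings_1, ratings_2, categories):
    """Cohen's kappa with linear disagreement weights.

    `categories` lists the rating levels in order (at least two levels).
    Assumes the reviewers do not both use a single category throughout,
    otherwise the expected disagreement is zero.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_1)
    observed = Counter((index[a], index[b]) for a, b in zip(ratings_1, ratings_2))
    row = Counter(index[a] for a in ratings_1)
    col = Counter(index[b] for b in ratings_2)

    def weight(i, j):
        # 0 for identical ratings, 1 for the widest possible disagreement.
        return abs(i - j) / (k - 1)

    obs = sum(weight(i, j) * observed[(i, j)] / n
              for i in range(k) for j in range(k))
    exp = sum(weight(i, j) * row[i] * col[j] / n ** 2
              for i in range(k) for j in range(k))
    return 1.0 - obs / exp


# Hypothetical quality ratings for four papers by two reviewers.
levels = ["poor", "fair", "good"]
reviewer_1 = ["good", "good", "fair", "poor"]
reviewer_2 = ["good", "fair", "fair", "poor"]
print(percentage_agreement(reviewer_1, reviewer_2))
print(weighted_kappa(reviewer_1, reviewer_2, levels))
```

Linear weights penalise a one-level disagreement half as much as a two-level one on this scale; quadratic weights would be an equally defensible choice.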

Table 2 Criteria for study validation.

Data collection

Extraction of each study's findings will be conducted in duplicate, using a pre-designed and piloted data extraction form, to minimise errors. Given the extent of insufficient reporting in the medical literature, we propose to obtain missing information from investigators whenever possible. It is otherwise impossible to distinguish between what was done but not reported and what was not done. A template of the data extraction form is shown in appendix.doc.

Analysis

By analysis we mean synthesis of results from individual studies (meta-analysis), exploration of variation in results from study to study (heterogeneity) and generation of the most useful combination of tests. We will conduct meta-analyses to generate summary estimates of sensitivities, specificities, predictive values, likelihood ratios (LRs) and receiver operating characteristic (ROC) curves where appropriate [13, 14, 17]. If meta-analysis is considered inappropriate, we will describe the identified evidence in the context of appraised quality. If a meta-analysis is considered appropriate, we will examine the correlation between true positive rates and false positive rates in individual studies. If the correlation is poor, we will use the LR as the main accuracy measure. If we find a correlation, we will generate a summary ROC curve [18] in addition to pooling LRs. Many authorities consider the summary ROC curve the preferred method of pooling test results from primary studies [13, 14, 17]. The summary ROC plot provides a way of summarising the performance of a test from the results of several studies over a range of test thresholds. However, our preference for LRs is based on published recommendations that LRs are more clinically meaningful as measures of diagnostic accuracy [15]. Our experience has been that the true positive rates and false positive rates in individual studies are poorly correlated, in which case it is not feasible to generate a summary ROC curve. Moreover, when the outcome of a test is binary (positive or negative), LRs are more clinically meaningful than ROC curves. One disadvantage of analysis using LRs is that it generates two measures for each test, one for a positive result and another for a negative result. The ratio of the positive LR to the negative LR will therefore be used to generate a single measure, the diagnostic odds ratio, which is more suitable for statistical analysis.
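
To make the relationship between these accuracy measures concrete, the following Python sketch derives the positive and negative LR and the diagnostic odds ratio from one study's two-by-two table. The function name and the counts are hypothetical and serve only as an illustration; studies with a zero cell would need a continuity correction, which is not shown here.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Accuracy measures from one study's 2x2 table.

    tp/fn: patients with acute appendicitis and a positive/negative test.
    fp/tn: patients without acute appendicitis and a positive/negative test.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    # The ratio of the two LRs equals (tp * tn) / (fp * fn).
    diagnostic_odds_ratio = lr_positive / lr_negative
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        "diagnostic_odds_ratio": diagnostic_odds_ratio,
    }


# Hypothetical counts, not taken from any study.
example = accuracy_measures(tp=90, fp=20, fn=10, tn=80)
print(example)
```
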

For the purpose of meta-analysis, we will weight the log LR from each study in inverse proportion to its variance in order to combine the LRs across studies. To demonstrate the practical application of the summary LRs generated, we will calculate posttest probabilities for acute appendicitis using Bayes' theorem. An estimate of the pretest probability will be obtained by calculating the prevalence of the outcome event in the population studied. The following sequence of equations will be used for calculating the posttest probability:

pretest probability = prevalence of acute appendicitis

pretest odds = pretest probability / (1 – pretest probability)

posttest odds = likelihood ratio × pretest odds

posttest probability = posttest odds / (1 + posttest odds)
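
The four equations above translate directly into code. The following Python sketch is one way to write them; the function name and the example values (a prevalence of 20% and an LR of 4.5) are illustrative assumptions only.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability and an LR into a posttest probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = likelihood_ratio * pretest_odds
    return posttest_odds / (1 + posttest_odds)


# Hypothetical inputs: prevalence of 20% and a positive LR of 4.5.
result = posttest_probability(0.20, 4.5)
print(round(result, 3))
```

An LR of 1 leaves the probability unchanged, which is a quick sanity check on any implementation of these equations.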

In order to deal with the uncertainty of the estimate, we will generate 95% confidence intervals around the point estimate. Approximate variance for the posttest odds will be obtained by adding the variances of the combined LRs and pretest odds, enabling the calculation of its 95% confidence intervals. The 95% confidence intervals for the posttest probabilities will then be generated by converting the limits of the posttest odds to their respective probabilities.
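
The inverse-variance weighting and the interval calculation described above might be sketched in Python as follows. Several points are assumptions rather than statements of the protocol: the pooling is a fixed-effect model, the variances are added on the log scale, the pretest odds are treated as independent of the pooled LR, and the usual large-sample variance formulas are used. All function names and counts are hypothetical.

```python
import math


def log_lr_positive(tp, fp, fn, tn):
    """log(LR+) and its approximate variance from one study's 2x2 counts."""
    log_lr = math.log((tp / (tp + fn)) / (fp / (fp + tn)))
    variance = 1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn)
    return log_lr, variance


def pool_inverse_variance(estimates):
    """Fixed-effect pooling of (estimate, variance) pairs."""
    weights = [1 / variance for _, variance in estimates]
    total = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / total
    return pooled, 1 / total


def _to_probability(log_odds):
    return 1 / (1 + math.exp(-log_odds))


def posttest_interval(pooled_log_lr, var_log_lr, events, total, z=1.96):
    """Posttest probability with an approximate 95% confidence interval.

    `events` out of `total` patients had the outcome (the prevalence).
    """
    log_pretest_odds = math.log(events / (total - events))
    var_pretest = 1 / events + 1 / (total - events)
    log_posttest_odds = pooled_log_lr + log_pretest_odds
    se = math.sqrt(var_log_lr + var_pretest)
    return (
        _to_probability(log_posttest_odds),
        _to_probability(log_posttest_odds - z * se),
        _to_probability(log_posttest_odds + z * se),
    )


# Two hypothetical studies with identical counts.
studies = [(90, 20, 10, 80), (90, 20, 10, 80)]
pooled, pooled_var = pool_inverse_variance([log_lr_positive(*s) for s in studies])
point, lower, upper = posttest_interval(pooled, pooled_var, events=20, total=100)
print(math.exp(pooled), point, lower, upper)
```

Because the limits are computed on the log-odds scale and then converted, the resulting interval always lies between 0 and 1 and is generally not symmetric around the point estimate.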

Heterogeneity of results between studies will be formally assessed using the Breslow-Day test, which compares across studies the ratio of the odds of having the outcome of interest when the test result is positive to the odds of having the same outcome when the test result is negative [19]. To explore causes of heterogeneity in the estimates of diagnostic accuracy of the tests for acute appendicitis, we will conduct a sensitivity analysis. This will be carried out by subgroup analyses to see whether variations in population, intervention, outcomes and study quality affect the estimate of diagnostic accuracy. Results of pooled analyses will be provided within cogent patient groups.
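
For readers unfamiliar with the Breslow-Day test, the following Python sketch shows the calculation of its statistic, treating each study as one stratum. It is an illustration under stated assumptions, not the analysis code of this protocol: the common odds ratio is taken to be the Mantel-Haenszel estimate, the Tarone correction is omitted, no table may contain a zero cell, and the p-value (from a chi-square distribution with the returned degrees of freedom) is left to a statistics package. Function names and counts are hypothetical.

```python
import math


def mantel_haenszel_or(tables):
    """Common odds ratio across 2x2 tables given as (tp, fp, fn, tn)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


def expected_cell(a, b, c, d, psi):
    """Expected count in cell a given the table margins and odds ratio psi."""
    n1, n2, m1 = a + b, c + d, a + c
    if abs(psi - 1) < 1e-12:
        return n1 * m1 / (n1 + n2)
    # Solve (1 - psi) * x^2 + (n2 - m1 + psi * (n1 + m1)) * x - psi * n1 * m1 = 0
    qa = 1 - psi
    qb = n2 - m1 + psi * (n1 + m1)
    qc = -psi * n1 * m1
    root = math.sqrt(qb * qb - 4 * qa * qc)
    candidates = [(-qb + root) / (2 * qa), (-qb - root) / (2 * qa)]
    low, high = max(0, m1 - n2), min(n1, m1)
    # Exactly one solution is a feasible cell count.
    return next(x for x in candidates if low < x < high)


def breslow_day(tables):
    """Breslow-Day statistic and its degrees of freedom."""
    psi = mantel_haenszel_or(tables)
    statistic = 0.0
    for a, b, c, d in tables:
        n1, n2, m1 = a + b, c + d, a + c
        e = expected_cell(a, b, c, d, psi)
        variance = 1 / (1 / e + 1 / (n1 - e) + 1 / (m1 - e) + 1 / (n2 - m1 + e))
        statistic += (a - e) ** 2 / variance
    return statistic, len(tables) - 1


# Hypothetical studies: the first pair shares one odds ratio, the second does not.
homogeneous = [(10, 5, 5, 10), (20, 10, 10, 20)]
heterogeneous = [(10, 5, 5, 10), (5, 10, 10, 5)]
print(breslow_day(homogeneous))
print(breslow_day(heterogeneous))
```

A statistic close to zero indicates that the study-specific odds ratios agree with the common estimate; larger values point towards heterogeneity.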

Discussion

In summary, systematic reviews of the diagnostic literature on predicting acute appendicitis allow us to assess the quality of the available evidence and to identify specific tests (including elements of history and physical examination as well as further tests) that have diagnostic value. These reviews should lead to the formulation of recommendations for current practice and future research. Just as an evidence-based culture in the delivery of health care has been supported by systematic reviews of the literature on therapeutic interventions, we can expect to see an extension of this approach into the area of care involving the use of diagnostic and screening tests.

References

  1. Hospital Episode Statistics. Department of Health. 2001, [http://www.doh.gov.uk/hes/standard_data/available_tables/total_operations/tb01099a.pdf]

  2. Reynolds SL: Missed appendicitis in a pediatric emergency department. Pediatr Emerg Care. 1993, 9: 1-3.

  3. Rothrock SG, Skeoch G, Rush JJ, Johnson NE: Clinical features of misdiagnosed appendicitis in children. Ann Emerg Med. 1991, 20: 45-50.

  4. Rothrock SG, Green SM, Dobson M, Colucciello SA, Simmons CM: Misdiagnosis of appendicitis in nonpregnant women of childbearing age. J Emerg Med. 1995, 13: 1-8. 10.1016/0736-4679(94)00104-9.

  5. McCallion J, Canning GP, Knight PV, McCallion JS: Acute appendicitis in the elderly: a 5-year retrospective study. Age Ageing. 1987, 16: 256-260.

  6. Velanovich V, Satava R: Balancing the normal appendectomy rate with the perforated appendicitis rate: implications for quality assurance. Am Surg. 1992, 58: 264-269.

  7. Flum DR, Morris A, Koepsell T, Dellinger EP: Has misdiagnosis of appendicitis decreased over time? A population-based analysis. JAMA. 2001, 286: 1748-1753. 10.1001/jama.286.14.1748.

  8. Sheps SB, Schechter MT: The assessment of diagnostic tests. A survey of current medical research. JAMA. 1984, 252: 2418-2422. 10.1001/jama.252.17.2418.

  9. Reid MC, Lachs MS, Feinstein AR: Use of methodological standards in diagnostic test research. Getting better but still not good. JAMA. 1995, 274: 645-651. 10.1001/jama.274.8.645.

  10. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van de Meulen JH: Empirical evidence of design-related bias in studies of diagnostic tests. JAMA. 1999, 282: 1061-1066. 10.1001/jama.282.11.1061.

  11. Irwig L, Macaskill P, Glasziou P, Fahey M: Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol. 1995, 48: 119-130. 10.1016/0895-4356(94)00099-C.

  12. Vamvakas EC: Meta-analyses of studies of the diagnostic accuracy of laboratory tests: a review of the concepts and methods. Arch Pathol Lab Med. 1998, 122: 675-686.

  13. Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests: Recommended Methods, last updated on 9 February 1998. 1996, [http://www.cochrane.org/cochrane/sadtddoc1.htm]

  14. Irwig L, Tosteson AN, Gatsonis C, Lau J, Colditz G, Chalmers TC: Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med. 1994, 120: 667-676.

  15. Jaeschke R, Guyatt G, Sackett DL: Users' guides to the medical literature. III. How to use an article about a diagnostic test. A. Are the results of the study valid? Evidence-Based Medicine Working Group. JAMA. 1994, 271: 389-391. 10.1001/jama.271.5.389.

  16. Cohen J: A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960, 20: 37-46.

  17. Midgette AS, Stukel TA, Littenberg B: A meta-analytic method for summarizing diagnostic test performances: receiver-operating-characteristic-summary point estimates. Med Decis Making. 1993, 13: 253-257.

  18. Moses LE, Shapiro D, Littenberg B: Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med. 1993, 12: 1293-1316.

  19. Breslow NE, Day NE: Statistical methods in cancer research. Volume I – The analysis of case-control studies. IARC Sci Publ. 1980, 5-338.

Acknowledgements

The authors would like to thank Gill Richie and Julie Glanville of the Centre for Reviews and Dissemination in York (UK) for searching the databases.

Author information

Corresponding author

Correspondence to Lucas M Bachmann.

Additional information

Competing interests

None declared.

Authors' Contributions

LMB and JS initiated the project and wrote the protocol. DBB, SAB, MGB and FMO screened the pilot searches. All authors commented on earlier drafts and approved the final manuscript.

About this article

Cite this article

Bachmann, L.M., Bischof, D.B., Bischofberger, S.A. et al. Systematic quantitative overviews of the literature to determine the value of diagnostic tests for predicting acute appendicitis: study protocol. BMC Surg 2, 2 (2002). https://doi.org/10.1186/1471-2482-2-2
