Open Access Highly Accessed Research article

Systems for grading the quality of evidence and the strength of recommendations I: Critical appraisal of existing approaches The GRADE Working Group

David Atkins1, Martin Eccles2, Signe Flottorp3, Gordon H Guyatt4, David Henry5, Suzanne Hill5, Alessandro Liberati6, Dianne O'Connell7, Andrew D Oxman3, Bob Phillips8, Holger Schünemann4,9, Tessa Tan-Torres Edejer10, Gunn E Vist3*, John W Williams11 and The GRADE Working Group3

Author affiliations

1 Center for Practice and Technology Assessment, Agency for Healthcare Research and Quality, 540 Gaither Rd. Rockville, MD 20852, USA

2 Centre for Health Services Research, University of Newcastle upon Tyne, 21 Claremont Place, Newcastle upon Tyne NE2 4AA, UK

3 Informed Choice Research Department, Norwegian Health Services Research Centre, Pb. 7004 St. Olavs Plass, 0130 Oslo, Norway

4 Departments of Clinical Epidemiology and Biostatistics and Medicine, McMaster University, 1200 Main Street West, Hamilton, Ontario L8N 3Z5, Canada

5 Department of Clinical Pharmacology, Faculty of Medicine and Health Sciences, University of Newcastle, Level 5, New Med 2 Building, Newcastle Mater Hospital, Waratah, NSW 2298, Australia

6 Department of Oncology and Hematology, Università di Modena e Reggio Emilia, Azienda Ospedaliera Policlinico, Via dal Pozzo 41, 41100 Modena, Italia and Centro per la Valutazione della Efficacia della Assistenza Sanitaria (CeVEAS), Modena, Italy

7 Cancer Epidemiology Research Unit, Cancer Research and Registers Division, The Cancer Council NSW, PO Box 572, Kings Cross NSW 1340, Australia

8 Centre for Evidence-based Medicine, University Department of Psychiatry, Warneford Hospital, Oxford OX3 7JX, UK

9 Departments of Medicine and Social & Preventive Medicine, University at Buffalo, State University of New York, ECMC-CC142, 462 Grinder St, Buffalo, NY 14215, USA

10 Global Programme on Evidence for Health Policy, World Health Organisation, CH-1211 Geneva 27, Switzerland

11 The Center for Health Services Research in Primary Care, HSR&D, Department of Veterans Affairs Medical Center and Duke University Medical Center, 508 Fulton St., Durham, NC 27705, USA

Citation and License

BMC Health Services Research 2004, 4:38  doi:10.1186/1472-6963-4-38

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1472-6963/4/38


Received: 23 January 2004
Accepted: 22 December 2004
Published: 22 December 2004

© 2004 Atkins et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

A number of approaches have been used to grade levels of evidence and the strength of recommendations. The use of many different approaches detracts from one of the main reasons for having explicit approaches: to concisely characterise and communicate this information so that it can easily be understood and thereby help people make well-informed decisions. Our objective was to critically appraise six prominent systems for grading levels of evidence and the strength of recommendations as a basis for agreeing on characteristics of a common, sensible approach to grading levels of evidence and the strength of recommendations.

Methods

Six prominent systems for grading levels of evidence and strength of recommendations were selected and someone familiar with each system prepared a description of each of these. Twelve assessors independently evaluated each system based on twelve criteria to assess the sensibility of the different approaches. Systems used by 51 organisations were compared with these six approaches.

Results

There was poor agreement about the sensibility of the six systems. Only one of the systems was suitable for all four types of questions we considered (effectiveness, harm, diagnosis and prognosis). None of the systems was considered usable for all of the target groups we considered (professionals, patients and policy makers). The raters found low reproducibility of judgements made using all six systems. Systems used by 51 organisations that sponsor clinical practice guidelines included a number of minor variations of the six systems that we critically appraised.

Conclusions

All of the currently used approaches to grading levels of evidence and the strength of recommendations have important shortcomings.

Keywords:
evidence-based health care; levels of evidence; practice guidelines; strength of recommendation; systematic reviews

Background

In 1979 the Canadian Task Force on the Periodic Health Examination published one of the first efforts to explicitly characterise the level of evidence underlying healthcare recommendations and the strength of recommendations [1]. Since then a number of alternative approaches have been proposed and used to classify clinical practice guidelines [2-28].

The original approach used by the Canadian Task Force was based on study design alone, with randomised controlled trials (RCTs) being classified as good (level I) evidence, cohort and case control studies being classified as fair (level II) evidence and expert opinion being classified as poor (level III) evidence. The strength of recommendation was based on the level of evidence with direct correspondence between the two; e.g. a strong recommendation (A) corresponded to there being good evidence. A strength of the original Canadian Task Force approach was that it was simple; the main weakness was that it was too simple. Because of its simplicity, it was easy to understand, apply and present. However, because it was so simple there were many implicit judgements, including judgements about the quality of RCTs, conflicting results of RCTs, and convincing results from non-experimental studies.

For example:

• Should a small, poorly designed RCT be considered level I evidence?

• Should RCTs with conflicting results still be considered level I evidence?

• Should observational studies always be considered level II evidence, regardless of how convincing they are?
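The original scheme described above amounts to a fixed lookup from study design to level of evidence. The following is a minimal sketch of that lookup, not the Task Force's own specification: the dictionary keys and function names are hypothetical, and only the correspondence stated in the text (good evidence supports a strong, grade A recommendation) is encoded.

```python
# Study design -> (level, quality label), as described in the text above.
# The key names are illustrative assumptions, not taken from the source.
DESIGN_TO_LEVEL = {
    "rct": ("I", "good"),
    "cohort": ("II", "fair"),
    "case_control": ("II", "fair"),
    "expert_opinion": ("III", "poor"),
}


def level_of_evidence(design):
    """Return (level, quality) for a study design; design is the only input."""
    return DESIGN_TO_LEVEL[design]


def supports_strong_recommendation(design):
    """A strong (A) recommendation corresponds to good (level I) evidence."""
    _level, quality = level_of_evidence(design)
    return quality == "good"
```

Because the design is the only input, a small, poorly designed RCT and a large, well-conducted one both return level I, and a highly convincing cohort study can never rise above level II. These are the implicit judgements that the questions above point to.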

The original approach by the Canadian Task Force also did not include explicit judgements about the strength of recommendations, such as how trade-offs between the expected benefits, harms and costs were weighed and taken account of in going from an assessment of how good the evidence is to determining the implications of the results for practice.

The GRADE Working Group is an informal collaboration of people with an interest in addressing shortcomings such as these in systems for grading evidence and recommendations. We describe here a critical appraisal of six prominent systems and the results of the critical appraisal.

Methods

We selected systems for grading the level of evidence and the strength of recommendations that we considered prominent and that included features not captured by other prominent systems. These were selected based on the experience and knowledge of the authors through informal discussion. A description of the most recent version (as of summer 2000) of each of these systems (Appendix 1 to 6) was prepared by one of the authors familiar with the system and used in this exercise. The following six systems were appraised: the American College of Chest Physicians (ACCP, [see Additional File 1]) [21], Australian National Health and Medical Research Council (ANHMRC, [see Additional File 2]) [17], Oxford Centre for Evidence-Based Medicine (OCEBM, [see Additional File 3]) [16], Scottish Intercollegiate Guidelines Network (SIGN, [see Additional File 4]) [18], US Preventive Services Task Force (USPSTF, [see Additional File 5]) [22], and US Task Force on Community Preventive Services (USTFCPS, [see Additional File 6]) [25].

Additional File 1. American College of Chest Physicians (ACCP), a brief description of the ACCP approach.


Additional File 2. Australian National Health and Medical Research Council (ANHMRC), a brief description of the ANHMRC approach.


Additional File 3. Oxford Centre for Evidence-based Medicine (OCEBM), a brief description of the OCEBM approach.


Additional File 4. Scottish Intercollegiate Guidelines Network (SIGN), a brief description of the SIGN approach.


Additional File 5. U.S. Preventive Services Task Force (USPSTF), a brief description of the USPSTF approach.


Additional File 6. U.S. Task Force on Community Preventive Services (USTFCPS), a brief description of the USTFCPS approach.


These descriptions of the systems were given to the twelve people who independently appraised the six systems. All of the authors except GEV appraised the six systems; three of the authors (DH, SH and DO'C) appraised as a group and reported as one (see Contributions). The 12 assessors all had experience with at least one system and most had helped to develop one of the six included systems. Twelve criteria described by Feinstein [29] provided the basis for assessing the sensibility of the six systems.

Criteria used to assess the sensibility of systems for grading evidence and recommendations

1. To what extent is the approach applicable to different types of questions? -effectiveness, harm, diagnosis and prognosis (No, Not sure, Yes)

2. To what extent can the system be used with different audiences? -patients, professionals and policy makers (Little extent, Some extent, Large extent)

3. How clear and simple is the system? (Not very clear, Somewhat clear, Very clear)

4. How often will information not usually available be necessary? (Often, Sometimes, Seldom)

5. To what extent are subjective decisions needed? (Often, Sometimes, Seldom)

6. Are dimensions included that are not within the construct (level of evidence or strength of recommendation)? (Yes, Partially, No)

7. Are there important dimensions that should have been included and are not? (No, Partially, Yes)

8. Is the way in which the included dimensions are aggregated clear and simple? (No, Partially, Yes)

9. Is the way in which the included dimensions are aggregated appropriate? (No, Partially, Yes)

10. Are the categories sufficient to discriminate between different levels of evidence and strengths of recommendations? (No, Partially, Yes)

11. How likely is the system to be successful in discriminating between high and low levels of evidence or strong and weak recommendations? (Not very likely, Somewhat likely, Highly likely)

12. Are assessments reproducible? (Probably not, Not sure, Probably)
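Each criterion above pairs a question with a three-point response scale, so the assessors' answers for one system can be tallied per option. The sketch below is illustrative only: the paper does not state how agreement was quantified, so the share of assessors choosing the most common option is an assumed, crude indicator, and the function names are hypothetical. Only the scales for criteria 3 and 12 are encoded.

```python
from collections import Counter

# Response scales copied from criteria 3 and 12 in the list above.
SCALES = {
    3: ("Not very clear", "Somewhat clear", "Very clear"),
    12: ("Probably not", "Not sure", "Probably"),
}


def tally(criterion, responses):
    """Count responses per option, in scale order; reject off-scale answers."""
    scale = SCALES[criterion]
    counts = Counter(responses)
    unknown = set(counts) - set(scale)
    if unknown:
        raise ValueError(f"not on the scale for criterion {criterion}: {sorted(unknown)}")
    return {option: counts.get(option, 0) for option in scale}


def modal_share(counts):
    """Share of assessors who chose the most common option (assumed indicator)."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0
```

With twelve hypothetical responses to criterion 12 (six "Not sure", four "Probably", two "Probably not"), `tally` returns the counts in scale order and `modal_share` returns 0.5, meaning only half of the assessors gave the same answer.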

No training was provided and we did not discuss the 12 criteria prior to applying them to the six systems.

Our independent appraisals of the six systems were summarised and discussed. The discussion focused on differences in the interpretation of the criteria, disagreement about the judgements that we made and sources of these disagreements, the strengths and weaknesses of the six systems, and inferences based on the appraisals and subsequent discussion.

To identify important systems that we might have overlooked, after appraising these six systems we also searched the US Agency for Healthcare Research and Quality (AHRQ) National Guidelines Clearing House for organisations that have graded two or more guidelines in the Clearing House using an explicit system [30]. These systems were compared with the six systems that we critically appraised.

Results

There was poor agreement among the 12 assessors who independently assessed the six systems. A summary of the assessments of the sensibility of the six approaches to rating levels of evidence and strength of recommendation is shown in Table 1.

Table 1. Summary of assessments of the sensibility of six approaches to rating levels of evidence and strength of recommendation

Discussion

The poor agreement among the assessors likely reflects several factors. Some of us had practical experience using one of the systems or used additional background information related to one or more grading systems, and we may have been biased in favour of the system with which we were most familiar. Each criterion was applied to grading both evidence and recommendations. Some systems were better for one of these constructs than the other and we may have handled these discrepancies differently. In addition, each criterion may have been assessed relative to different judgements about the evidence, such as an assessment of the overall quality of evidence for an important outcome (across studies) versus the quality of an individual study. Some of the criteria were not clear and were interpreted or applied inconsistently. For example, a system might be clear and not simple or vice versa. We likely differed in how stringently we applied the criteria. Finally, there was true disagreement.

There was agreement that the OCEBM system works well for all four types of questions. There was disagreement about the extent to which the other systems work well for questions other than effectiveness. It was noted that some systems are not intended to address other types of questions and it is not clear that it is important that a system should address all four types of questions that we considered (effectiveness, harm, diagnosis, prognosis), although criteria for assessing individual studies must take this into account [31,32].

Most of us did not find that any of the systems are likely to be suitable for use by patients. Almost all agreed that the ACCP system was suitable for professionals and most considered that the USPSTF system was suitable for professionals. There was not much agreement about the suitability of any of the other systems for professionals or about the suitability of any of the systems for policy makers, although most assessed the USTFCPS system to be suitable for policy makers.

There was no agreement that any of the systems are clear and simple, although USPSTF, ACCP and SIGN systems were generally assessed more favourably in this regard. It was generally agreed that the clearer a system was the less simple it was; e.g. the OCEBM system is clear but not simple for categorising the level of evidence. There was some confusion regarding whether we were assessing how clear and simple the system was to guideline developers (as some interpreted this criterion) or how clear and simple the outcome of applying the system was to guideline users (as others interpreted this criterion). Either way, the simpler a system is the less clear it is likely to be.

Most of us judged that for most of the systems necessary information would not be available at least sometimes. The OCEBM system came out somewhat better than the other systems and lack of availability of necessary information was considered to be less of a problem for the USTFCPS system. However, the OCEBM and USTFCPS systems were considered by most to be missing dimensions which may, in part, explain why missing information was considered to be less of a problem. This would be the case to the extent the missing dimensions were the ones for which information would often or sometimes not be available. The dimension for which we considered that information would most often be missing was trade-offs; i.e. knowledge of the preferences or utility values of those affected. Additional problems were identified in relationship to complex interventions and counselling, particularly with the USTFCPS and USPSTF systems. It was pointed out that the USTFCPS system addressed this problem by including availability of information about the intervention as part of its assessment of the quality of evidence.

Most of the systems were assessed to require subjective decisions at least to some extent. The OCEBM system again stood out as being assessed more favourably, although it may be related to omission of dimensions that require more subjective decisions. Judgement is clearly needed with any system. The aim should be to make judgements transparent and to try to protect against bias in the judgements that are made by being systematic and explicit.

Inclusion of dimensions that are not within the constructs being graded was not considered a problem for most of the systems by most of us. Several people considered that it might be a problem for the USTFCPS and USPSTF systems. On the other hand, all of the systems were evaluated to be missing at least one important dimension by at least one person. Missing dimensions were considered less of a problem for the ACCP and ANHMRC systems. There was no agreement that any of the systems had a clear and simple approach to aggregating the dimensions, although this was considered to be less of a problem for the ACCP, SIGN and USTFCPS systems.

There was also not agreement on the appropriateness of how the dimensions were aggregated. This was considered to be more of a problem for the ANHMRC and USTFCPS systems than the other four systems, all of which were considered to have taken an approach to aggregating the dimensions that was at least partially inappropriate by more than half of us.

Most of us considered that most of the systems had sufficient categories, with the exception of the ANHMRC system. Almost all of us agreed that the USPSTF system has sufficient categories. We agreed that it is possible to have too many categories as well as too few, the OCEBM system being an example of having too many categories.

There was not agreement that any of the systems are likely to discriminate successfully, although everyone thought that the ACCP, SIGN and USPSTF systems are somewhat to highly likely to discriminate. Lastly, we largely agreed that we were not sure how reproducible assessments are using any of the systems, although half of us considered that assessments using the ANHMRC system are unlikely to be reproducible and about 1/3 considered that assessments using the OCEBM and ACCP systems are likely to be reproducible.

We identified 22 additional organisations that have produced 10 or more practice guidelines using an explicit approach to grade the level of evidence or strength of recommendations. Another 29 have produced between two and nine guidelines using an explicit approach. These systems include a number of minor variations of the six systems that we appraised in detail.

There was generally poor agreement between the individual assessors about the scoring of the six approaches using the 12 criteria. However, there was general agreement that none of these six prominent approaches to grading the levels of evidence and strength of recommendations adequately addressed all of the important concepts and dimensions that we thought should be considered. Although we limited our appraisal to six systems all of the additional approaches to grading levels of evidence and strength of recommendations that we identified were, in essence, variations of the six approaches that we had critically appraised. Therefore we are confident that we did not miss any important grading systems available at the time when these assessments were undertaken.

Based on discussions following the critical appraisal of these six approaches, we agreed on some conclusions:

• Separate assessments should be presented for judgements about the quality of the evidence and judgements about the balance of benefits and harms.

• Evidence for harms should be assessed in the same way as evidence for benefits, although different evidence may be considered relevant for harms than for benefits; e.g. local evidence of complication rates may be considered more relevant than evidence of complication rates from trials for endarterectomy.

• Judgements about the quality of evidence should be based on a systematic review of the relevant research.

• Systematic reviews should not be included in a hierarchy of evidence (i.e. as a level or category of evidence). The availability of a well-done systematic review does not correspond to high quality evidence, since a well-done review might include anything from no studies to poor quality studies with inconsistent results to high quality studies with consistent results.

• Baseline risk should be taken into consideration in defining the population to whom a recommendation applies. Baseline risk should also be used transparently in making judgements about the balance of benefits and harms. When a recommendation varies in relationship to baseline risk, the evidence for determining baseline risk should be assessed appropriately and explicitly.

• Recommendations should not vary in relationship to baseline risk if there is not adequate evidence to guide reliable determinations of baseline risk.
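The role of baseline risk in the balance of benefits and harms can be illustrated with simple arithmetic. The sketch below rests on assumptions that are not in the source: the relative risk reduction is constant across baseline risks, the risk of harm does not depend on baseline risk, and benefit and harm are counted on the same scale as events per person treated. The function names and all numbers are hypothetical.

```python
def absolute_risk_reduction(baseline_risk, relative_risk_reduction):
    """Events avoided per person treated: baseline risk x relative risk reduction."""
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a probability between 0 and 1")
    if not 0.0 <= relative_risk_reduction <= 1.0:
        raise ValueError("relative_risk_reduction must be between 0 and 1")
    return baseline_risk * relative_risk_reduction


def net_benefit(baseline_risk, relative_risk_reduction, harm_risk):
    """Absolute benefit minus the absolute risk of harm (assumed additive)."""
    return absolute_risk_reduction(baseline_risk, relative_risk_reduction) - harm_risk


# Hypothetical values: 25% relative risk reduction, 1% risk of serious harm.
low_risk_group = net_benefit(0.02, 0.25, 0.01)   # about -0.005: harm outweighs benefit
high_risk_group = net_benefit(0.20, 0.25, 0.01)  # about +0.04: benefit outweighs harm
```

Under these assumptions the same intervention is net harmful at a 2% baseline risk and net beneficial at a 20% baseline risk. That is why a recommendation that varies with baseline risk is only as reliable as the evidence used to determine that risk.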

Conclusions

Based on discussions of the strengths and limitations of current approaches to grading levels of evidence and the strength of recommendations, we agreed to develop an approach that addresses the major limitations that we identified. The approach that the GRADE Working Group has developed is based on the discussions following the critical appraisal reported here and a pilot study of the GRADE approach [33]. Based on the pilot testing and the discussions following the pilot, the GRADE Working Group has further developed the GRADE system to its present format [34].

The GRADE Working Group has continued to grow as an informal collaboration that meets one or two times per year. The group maintains web pages (http://www.gradeworkinggroup.org) and a discussion list.

Competing interests

DA has competing interests with the US Preventive Services Task Force (USPSTF), PAB has a competing interest with the US Task Force on Community Preventive Services (USTFCPS), GHG and HS have competing interests with the American College of Chest Physicians (ACCP), DH, SH and DO'C have competing interests with the Australian National Health and Medical Research Council (ANHMRC), BP has competing interests with the Oxford Centre for Evidence-Based Medicine (OCEBM). Most of the other members of the GRADE Working Group have experience with the use of one or more systems of grading evidence and recommendations.

Contributions

DA, PAB, ME, SF, GHG, DH, SH, AL, DO'C, ADO, BP, HS, TTTE, GEV & JWW Jr as members of the GRADE Working Group have contributed to the preparation of this manuscript and the development of the ideas contained herein, participated in the critical assessment, and read and commented on drafts of this article. GHG and ADO have led the process. GEV has had primary responsibility for coordinating the process.

Acknowledgements

We wish to thank Peter A Briss for participating in the critical assessment and for providing constructive comments on the process. The institutions with which members of the Working Group are affiliated have provided intramural support. Opinions expressed in this paper do not necessarily represent those of the institutions with which the authors are affiliated.

References

  1. Canadian Task Force on the Periodic Health Examination: The periodic health examination.

    Can Med Assoc J 1979, 121:1193-254.

  2. Sackett DL: Rules of evidence and clinical recommendations on the use of antithrombotic agents.

    Chest 1986, 89(suppl 2):2S-3S.

  3. Sackett DL: Rules of evidence and clinical recommendations on the use of antithrombotic agents.

    Archives Int Med 1986, 146:464-465.

  4. Sackett DL: Rules of evidence and clinical recommendations on the use of antithrombotic agents.

    Chest 1989, 95:2S-4S.

  5. Cook DJ, Guyatt GH, Laupacis A, Sackett DL: Rules of evidence and clinical recommendations on the use of antithrombotic agents. Antithrombotic Therapy Consensus Conference.

    Chest 1992, 102(suppl 4):305S-311S.

  6. US Department of Health and Human Services, Public Health Service, Agency Health Care Policy and Research: Acute Pain Management: Operative or Medical Procedures and Trauma.

    Agency for Health Care Policy and Research Publications, Rockville, MD. (AHCPR Pub 92-0038) 1992.

  7. Gyorkos TW, Tannenbaum TN, Abrahamowicz M, Oxman AD, Scott EA, Millson ME, Rasooly I, Frank JW, Riben PD, Mathias RG: An approach to the development of practice guidelines for community health interventions.

    Can J Public Health 1994, 85(suppl 1):S8-S13.

  8. Hadorn DC, Baker D: Development of the AHCPR-sponsored heart failure guideline: methodologic and procedural issues.

    J Quality Improvement 1994, 20:539-54.

  9. Cook DJ, Guyatt GH, Laupacis A, Sackett DL, Goldberg RJ: Clinical recommendations using levels of evidence for antithrombotic agents.

    Chest 1995, 108(4 Suppl):227S-230S.

  10. Guyatt GH, Sackett DL, Sinclair JC, Hayward R, Cook DJ, Cook RJ, for the Evidence-Based Medicine Working Group: Users' guides to the medical literature. IX. A method for grading health care recommendations. Evidence-Based Medicine Working Group.

    JAMA 1995, 274:1800-4.

  11. Petrie J, Barnwell E, Grimshaw J: Criteria for appraisal for national use. Pilot Edition. [http://www.sign.ac.uk/methodology/index.html]

    Scottish Intercollegiate Guidelines Network 1995.

  12. US Preventive Services Task Force: Guide to Clinical Preventive Services. 2nd edition. Baltimore: Williams & Wilkins; 1996:xxxix-lv.

  13. Eccles M, Clapp Z, Grimshaw J, Adams PC, Higgins B, Purves I, Russell I: North of England evidence based guidelines development project: methods of guideline development.

    BMJ 1996, 312:760-2.

  14. Centro per la Valutazione della Efficacia della Assistenza Sanitaria (CeVEAS): Linee Guida per il trattamento del tumore della mammella nella provincia di Modena (Luglio 2000). [http://www.ceveas.it/ceveas/viewpage.do?idp=3]

    Accessed December 29, 2002.

  15. Guyatt GH, Cook DJ, Sackett DL, Eckman M, Pauker S: Grades of recommendation for antithrombotic agents. [http://www.chestjournal.org/content/vol119/1_suppl/]

    Chest 1998, 114(5 Suppl):441S-4S.

  16. Ball C, Sackett D, Phillips B, Straus S, Haynes B: Levels of evidence and grades of recommendations. Last revised 17 September 1998. [http://www.cebm.net/levels_of_evidence.asp]

    Centre for Evidence-Based Medicine.

  17. National Health and Medical Research Council: How to use the evidence: assessment and application of scientific evidence. [http://www.nhmrc.gov.au/publications/synopses/cp65syn.htm]

    Commonwealth of Australia 2000.

  18. Harbour R, Miller J: A new system for grading recommendations in evidence based guidelines.

    BMJ 2001, 323:334-6.

  19. Roman SH, Silberzweig SB, Siu AL: Grading the evidence for diabetes performance measures [see comments].

    Eff Clin Pract 2000, 3:85-91.

  20. Woloshin S: Arguing about grades.

    Eff Clin Pract 2000, 3:94-5.

  21. Guyatt GH, Schünemann H, Cook D, Pauker S, Sinclair J, Bucher H, Jaeschke R: Grades of recommendation for antithrombotic agents.

    Chest 2001, 119:3S-7S.

  22. Atkins D, Best D, Shapiro EN: The third U.S. Preventive Services Task Force: background, methods and first recommendations.

    Am J Preventive Medicine 2001, 20(3 (supplement 1)):1-108.

  23. Woolf SH, Atkins D: The evolving role of prevention in health care: Contributions of the U.S. Preventive Services Task Force.

    Am J Preventive Medicine 2001, 20(3 (supplement 1)):13-20.

  24. Harris RP, Helfand M, Woolf SH, Lohr KN, Mulrow CD, Teutsch SM, Atkins D, for the Methods Work Group of the Third U.S. Preventive Services Task Force: Current methods of the U.S. Preventive Services Task Force: A review of the process.

    Am J Preventive Medicine 2001, 20(3 (Supplement 1)):21-35.

  25. Briss PA, Zaza S, Pappaioanou M, Fielding J, Wright-De Aguero L, Truman BI, Hopkins DP, Mullen PD, Thompson RS, Woolf SH, Carande-Kulis VG, Anderson L, Hinman AR, McQueen DV, Teutsch SM, Harris JR: Developing an evidence-based Guide to Community Preventive Services – methods. The Task Force on Community Preventive Services.

    Am J Preventive Medicine 2000, 18:35-43.

  26. Zaza S, Wright-De Aguero LK, Briss PA, Truman BI, Hopkins DP, Hennessy MH, Sosin DM, Anderson L, Carande-Kulis VG, Teutsch SM, Pappaioanou M: Data collection instrument and procedure for systematic reviews in the Guide to Community Preventive Services. Task Force on Community Preventive Services.

    American Journal of Preventive Medicine 2000, 18:44-74.

  27. Greer N, Mosser G, Logan G, Halaas GW: A practical approach to evidence grading.

    Joint Commission J Qual Improv 2000, 26:700-12.

  28. West S, King V, Carey TS, Lohr KN, McKoy N, Sutton SF, Lux L: Systems to Rate the Strength of Scientific Evidence. Evidence Report/Technology Assessment No. 47 (Prepared by the Research Triangle Institute-University of North Carolina Evidence-based Practice Center under Contract No. 290-97-0011). In AHRQ Publication No. 02-E016. Rockville, MD: Agency for Healthcare Research and Quality; 2002:64-88.

  29. Feinstein AR: Clinimetrics. New Haven, CT: Yale University Press; 1987:141-66.

  30. National Guidelines Clearing House [http://www.guideline.gov/resources/guideline_index.aspx]

    Accessed April 19, 2001.

  31. Guyatt G, Drummond R, eds: Users' Guide to the Medical Literature. Chicago, IL: AMA Press; 2002:55-154.

  32. West S, King V, Carey TS, Lohr KN, McKoy N, Sutton SF, Lux L: Systems to Rate the Strength of Scientific Evidence. Evidence Report/Technology Assessment No. 47 (Prepared by the Research Triangle Institute-University of North Carolina Evidence-based Practice Center under Contract No. 290-97-0011). In AHRQ Publication No. 02-E016. Rockville, MD: Agency for Healthcare Research and Quality; 2002:51-63.

  33. Atkins D, Briss PA, Eccles M, Flottorp S, Guyatt GH, Harbour RT, Hill S, Jaeschke R, Liberati A, Magrini N, Mason J, O'Connell D, Oxman AD, Phillips B, Schunemann HJ, Edejer TT, Vist GE, Williams JW Jr, GRADE Working Group: Systems for grading the quality of evidence and the strength of recommendations II: Pilot study of a new system.

    BioMed Central, in press.

  34. Atkins D, Best D, Briss PA, Eccles M, Falck Ytter Y, Flottorp S, Guyatt GH, Harbour RT, Haugh MC, Henry D, Hill S, Jaeschke R, Leng G, Liberati A, Magrini N, Mason J, Middleton P, Mrukowicz J, O'Connell D, Oxman AD, Phillips B, Schunemann HJ, Edejer TT, Varonen H, Vist GE, Williams JW Jr, Zaza S, GRADE Working Group: Grading quality of evidence and strength of recommendations.

    BMJ 2004, 328(7454):1490.

Pre-publication history

The pre-publication history for this paper can be accessed here:

http://www.biomedcentral.com/1472-6963/4/38/prepub