Open Access Correspondence

Development of a diagnostic test set to assess agreement in breast pathology: practical application of the Guidelines for Reporting Reliability and Agreement Studies (GRRAS)

Natalia V Oster1*, Patricia A Carney2, Kimberly H Allison3, Donald L Weaver4, Lisa M Reisch1, Gary Longton5, Tracy Onega6, Margaret Pepe5, Berta M Geller7, Heidi D Nelson8, Tyler R Ross1, Anna NA Tosteson6 and Joann G Elmore1

Author Affiliations

1 Department of Medicine, University of Washington, Seattle, WA, USA

2 Department of Family Medicine, Oregon Health and Science University, Portland, OR, USA

3 Department of Pathology, Stanford University School of Medicine, Palo Alto, CA, USA

4 Department of Pathology, University of Vermont and Vermont Cancer Center, Burlington, VT, USA

5 Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA

6 Norris Cotton Cancer Center and The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth, Hanover, NH, USA

7 Office of Health Promotion Research, University of Vermont, Burlington, VT, USA

8 Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA


BMC Women's Health 2013, 13:3  doi:10.1186/1472-6874-13-3

Published: 5 February 2013



Diagnostic test sets are a valuable research tool that strengthens the validity and reliability of studies assessing agreement in breast pathology. To understand the strengths and weaknesses of any agreement and reliability study, however, its methods must be fully reported. In this paper we provide a step-by-step description of the methods used to create four complex test sets for a study of diagnostic agreement among pathologists interpreting breast biopsy specimens. We use the newly developed Guidelines for Reporting Reliability and Agreement Studies (GRRAS) as the basis for reporting these methods.


Breast tissue biopsies were selected from the National Cancer Institute-funded Breast Cancer Surveillance Consortium sites. We used random sampling stratified by the woman's age (40–49 vs. ≥50 years), parenchymal breast density (low vs. high) and the interpretation of the original pathologist. A 3-member panel of expert breast pathologists first independently interpreted each case using five primary diagnostic categories (non-proliferative changes, proliferative changes without atypia, atypical ductal hyperplasia, ductal carcinoma in situ, and invasive carcinoma). When the experts did not unanimously agree on a case diagnosis, a modified Delphi method was used to determine the reference standard consensus diagnosis. The final test cases were stratified and randomly assigned to one of four unique test sets.
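
As an illustration only, the final step (stratified random assignment of cases to four test sets) could be sketched as follows. This is not the authors' procedure: the field names (`age_group`, `density`, `diagnosis`), the choice of strata for assignment, the round-robin dealing, and the function name `assign_to_test_sets` are all assumptions made for the example.

```python
import random
from collections import defaultdict

def assign_to_test_sets(cases, n_sets=4, seed=0):
    """Deal cases into n_sets test sets so that each stratum is spread evenly.

    Hypothetical sketch: each case is a dict with 'age_group', 'density'
    and 'diagnosis' keys, which together define its stratum.
    """
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible

    # Group cases by stratum.
    strata = defaultdict(list)
    for case in cases:
        key = (case["age_group"], case["density"], case["diagnosis"])
        strata[key].append(case)

    # Shuffle within each stratum, then deal round-robin across the sets.
    # Carrying the offset between strata keeps overall set sizes within 1.
    test_sets = [[] for _ in range(n_sets)]
    offset = 0
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        for i, case in enumerate(members):
            test_sets[(offset + i) % n_sets].append(case)
        offset += len(members)
    return test_sets

# Synthetic demo data (not study data): 3 cases per stratum.
DIAGNOSES = [
    "non-proliferative changes",
    "proliferative changes without atypia",
    "atypical ductal hyperplasia",
    "ductal carcinoma in situ",
    "invasive carcinoma",
]
demo_cases = [
    {"id": f"{a}-{d}-{k}-{n}", "age_group": a, "density": d, "diagnosis": dx}
    for a in ("40-49", "50+")
    for d in ("low", "high")
    for k, dx in enumerate(DIAGNOSES)
    for n in range(3)
]
demo_sets = assign_to_test_sets(demo_cases)
print([len(s) for s in demo_sets])
```

Dealing shuffled cases in turn, rather than drawing a set at random for each case, is one simple way to keep every stratum represented about equally in each test set while leaving the placement of any individual case to chance.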


We found GRRAS recommendations to be very useful in reporting diagnostic test set development and recommend inclusion of two additional criteria: 1) characterizing the study population and 2) describing the methods for reference diagnosis, when applicable.

Keywords: Reporting guidelines; Reliability of results; Agreement studies; Breast; Pathology; Diagnostic techniques