Open Access Research article

Evaluation of mammographic density patterns: reproducibility and concordance among scales

Macarena Garrido-Estepa1, Francisco Ruiz-Perales2, Josefa Miranda2,3, Nieves Ascunce4,5, Isabel González-Román6, Carmen Sánchez-Contador7, Carmen Santamariña8, Pilar Moreo9, Carmen Vidal10, Mercé Peris10, María P Moreno9, Jose A Vázquez-Carrete8, Francisca Collado-García7, Francisco Casanova6, María Ederra4,5, Dolores Salas2,3, Marina Pollán1,5* and DDM-Spain1

Author Affiliations

1 National Centre for Epidemiology, Instituto de Salud Carlos III, Madrid, Spain

2 Valencia Breast Cancer Screening Programme, General Directorate Public Health, Valencia, Spain

3 Centro Superior de Investigación en Salud Pública (CSISP), Valencia, Spain

4 Navarra Breast Cancer Screening Programme, Public Health Institute, Pamplona, Spain

5 Consortium for Biomedical Research in Epidemiology & Public Health (CIBER en Epidemiología y Salud Pública - CIBERESP), Spain

6 Castilla-Leon Breast Cancer Screening Programme, D.G. Salud Pública ID e I, SACYL, Castilla y León, Spain

7 Balearic Islands Breast Cancer Screening Programme, Health Promotion for Women and Childhood, General Directorate Public Health and Participation, Regional Authority of Health and Consumer Affairs, Balearic Islands, Spain

8 Galicia Breast Cancer Screening Programme, Regional Authority of Health, Galicia Regional Government, Spain

9 Aragon Breast Cancer Screening Programme, Health Service of Aragon, Zaragoza, Spain

10 Cancer Prevention and Control Unit, Catalan Institute of Oncology (ICO), Barcelona, Spain


BMC Cancer 2010, 10:485  doi:10.1186/1471-2407-10-485

Published: 13 September 2010

Abstract

Background

Increased mammographic breast density is a moderate risk factor for breast cancer. Different scales have been proposed for classifying mammographic density. This study sought to assess intra-rater agreement for the most widely used scales (Wolfe, Tabár, BI-RADS and Boyd) and compare them in terms of classifying mammograms as high- or low-density.

Methods

The study covered 3572 mammograms drawn from women included in the DDM-Spain study, carried out in seven Spanish Autonomous Regions. Each mammogram was read by an expert radiologist and classified using the Wolfe, Tabár, BI-RADS and Boyd scales. In addition, 375 randomly selected mammograms were read a second time to estimate intra-rater agreement for each scale using the kappa statistic. Because the scales are ordinal, weighted kappa was computed. The entire set of 3572 mammograms was used to calculate agreement among the different scales in classifying high- and low-density patterns, with the kappa statistic computed pairwise. High density was defined as a percentage of dense tissue greater than 50% for the Boyd scale, the "heterogeneously dense" and "extremely dense" categories for BI-RADS, categories P2 and DY for Wolfe, and categories IV and V for Tabár.
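To make the intra-rater analysis concrete, the following is a minimal sketch of a weighted kappa between a first and second reading of the same mammograms on one ordinal scale. It is not the authors' code: the abstract does not say which weighting scheme was used, so linear and quadratic disagreement weights are both offered as assumptions, and the function name weighted_kappa is hypothetical.

```python
def weighted_kappa(first, second, categories, weighting="linear"):
    """Weighted Cohen's kappa between two readings on an ordinal scale.

    first, second: category labels from the first and second reading,
        one entry per mammogram, in the same order.
    categories: all labels of the scale, ordered from least to most dense.
    weighting: "linear" or "quadratic" disagreement weights (assumption:
        the abstract does not say which scheme the study used).
    """
    if len(first) != len(second) or not first:
        raise ValueError("readings must be non-empty and of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("an ordinal scale needs at least two categories")
    index = {label: i for i, label in enumerate(categories)}
    n = len(first)

    # Observed joint proportions: rows = first reading, columns = second.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions of each reading.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the two extreme categories.
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_dis = sum(weight(i, j) * observed[i][j] for i, j in cells)
    expected_dis = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both readings use one identical category")
    return 1 - observed_dis / expected_dis
```

For a Wolfe comparison, for example, the categories would be passed in increasing order of density as ["N1", "P1", "P2", "DY"]. Quadratic weights penalise distant disagreements more heavily than linear ones, so the two schemes can give noticeably different values on the same pair of readings.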

Results

There was good agreement between the first and second readings, with weighted kappa values of 0.84 for the Wolfe, 0.71 for the Tabár, 0.90 for the BI-RADS, and 0.92 for the Boyd scale. Furthermore, there was substantial agreement among the different scales in classifying high- versus low-density patterns. Agreement was almost perfect between the quantitative scales, Boyd and BI-RADS, and good between those based on the observed pattern, i.e., Tabár and Wolfe (kappa 0.81). Agreement was lower when a pattern-based scale (Wolfe or Tabár) was compared with a quantitative scale (BI-RADS or Boyd). Moreover, the Wolfe and Tabár scales placed more mammograms in the high-risk group, 46.61% and 37.32% respectively, whereas this percentage was lower for the quantitative scales (21.89% for BI-RADS and 21.86% for Boyd).
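The high/low comparison summarised in these results rests on collapsing each scale into two groups, using the definitions given in the Methods, and computing an unweighted kappa for every pair of scales. The sketch below shows one way to do this under stated assumptions: the label strings, the HIGH_DENSITY dictionary and the helper names are hypothetical, and Boyd readings are assumed to be stored as an estimated percentage of dense tissue rather than as a category.

```python
from itertools import combinations

# Hypothetical encoding of the high-density definitions given in the Methods.
# Label strings are illustrative; Boyd readings are assumed to be stored as an
# estimated percentage of dense tissue.
HIGH_DENSITY = {
    "Wolfe": lambda c: c in {"P2", "DY"},
    "Tabár": lambda c: c in {"IV", "V"},
    "BI-RADS": lambda c: c in {"heterogeneously dense", "extremely dense"},
    "Boyd": lambda pct: pct > 50,
}


def binary_kappa(x, y):
    """Unweighted Cohen's kappa for two equal-length lists of booleans."""
    n = len(x)
    p_observed = sum(a == b for a, b in zip(x, y)) / n
    px, py = sum(x) / n, sum(y) / n
    # Chance agreement: both scales call it high, plus both call it low.
    p_expected = px * py + (1 - px) * (1 - py)
    if p_expected == 1:
        raise ValueError("kappa is undefined when both scales assign a single class")
    return (p_observed - p_expected) / (1 - p_expected)


def compare_scales(readings):
    """readings: dict mapping scale name -> raw readings, one per mammogram.

    Returns the percentage of mammograms each scale places in the
    high-density group and the kappa for every pair of scales.
    """
    high = {s: [HIGH_DENSITY[s](r) for r in rs] for s, rs in readings.items()}
    share_high = {s: 100 * sum(h) / len(h) for s, h in high.items()}
    pairwise = {
        (a, b): binary_kappa(high[a], high[b])
        for a, b in combinations(sorted(high), 2)
    }
    return share_high, pairwise
```

Applied to the full set of readings, share_high gives the kind of percentages reported for the high-risk group, and pairwise holds one kappa per pair of scales (six pairs for four scales).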

Conclusions

Visual scales of mammographic density show high reproducibility when appropriate training is provided. Their ability to distinguish between high and low risk renders them useful for routine use by breast cancer screening programs. Quantitative scales are more specific than pattern-based scales in classifying populations in the high-risk group.