Open Access Research article

Evaluating observer agreement of scoring systems for foot integrity and footrot lesions in sheep

Alessandro Foddai1, Laura E Green2, Sam A Mason2 and Jasmeet Kaler3*

Author Affiliations

1 Quantitative Veterinary Epidemiology group, Wageningen Institute of Animal Sciences, Wageningen University, Wageningen, The Netherlands

2 School of Life Sciences, University of Warwick, Coventry, England, CV4 7AL, UK

3 The School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Sutton Bonington, Loughborough, Leicestershire, England, LE12 5RD, UK

BMC Veterinary Research 2012, 8:65 doi:10.1186/1746-6148-8-65

Published: 25 May 2012



A scoring scale with five ordinal categories is used for visual diagnosis of footrot in sheep and to study its epidemiology and control. More recently, researchers have used a 4-point ordinal scale to score foot integrity (wall and sole horn damage) in sheep. There is no information on observer agreement for either scale. Observer agreement for ordinal scores is usually estimated by single summary measures such as weighted kappa or Kendall’s coefficient of concordance, which provide no information on where the disagreement lies. Modeling techniques such as latent class models provide information on both observer bias and whether observers differ in the thresholds at which they change the score given. In this paper we use weighted kappa and located latent class modeling to explore observer agreement when scoring footrot lesions (using photographs and videos) and foot integrity (using post mortem specimens) in sheep. Three observers scored 80 photographs and videos for footrot and 80 feet for foot integrity.
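As an illustration of the single-measure approach mentioned above, the following is a minimal sketch of weighted kappa for two observers scoring the same items on an ordinal scale. It assumes linear disagreement weights and scores coded as integers starting at 0; the abstract does not state which weighting scheme the authors used, and the function name `weighted_kappa` and the example scores are hypothetical.

```python
def weighted_kappa(scores_a, scores_b, n_categories):
    """Linear-weighted kappa for two observers' ordinal scores.

    Assumptions (not taken from the paper): scores are integers in
    0..n_categories-1 and disagreement is weighted linearly by distance.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) == 0:
        raise ValueError("both observers must score the same non-empty set of items")
    n = len(scores_a)
    k = n_categories

    # Observed joint proportions: observed[i][j] is the share of items
    # scored i by observer A and j by observer B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(scores_a, scores_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions for each observer.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both observers use a single category")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical scores on a five-category (0-4) scale, for illustration only.
observer_1 = [0, 1, 2, 3, 4]
observer_2_near = [1, 1, 2, 3, 4]  # one disagreement, one category apart
observer_2_far = [4, 1, 2, 3, 4]   # one disagreement, four categories apart
kappa_near = weighted_kappa(observer_1, observer_2_near, 5)
kappa_far = weighted_kappa(observer_1, observer_2_far, 5)
```

Because the weights grow with the distance between scores, a disagreement of one category lowers kappa less than a disagreement of four categories. The result is still a single number and, as noted above, does not show where on the scale the observers diverge.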


Scores on both the footrot and foot integrity scales were more consistent within observers than between observers. Between observers, weighted kappa values for both scales ranged from moderate to substantial. There was disagreement between observers, with both observer bias and different thresholds between score values. The between-observer thresholds differed for scores 1 and 2 for footrot (using photographs and videos) and for all scores for integrity (both walls and soles). Within-observer agreement was higher, with weighted kappa values ranging from substantial to almost perfect. Within-observer thresholds were also more consistent than between-observer thresholds. Scoring using photographs was less variable than scoring using video clips or feet.
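The labels "moderate", "substantial" and "almost perfect" match the widely used Landis and Koch benchmarks for kappa. The abstract does not say which benchmark was applied, so the sketch below is an assumption about how such labels are typically assigned; the function name `agreement_label` is hypothetical.

```python
def agreement_label(kappa):
    """Map a kappa value to a Landis and Koch style descriptive label.

    Assumption: the conventional cut-offs 0.20, 0.40, 0.60, 0.80 and 1.00
    are used; the source abstract does not state its benchmark.
    """
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label
    raise ValueError("kappa cannot exceed 1")
```

Under these cut-offs, a between-observer range of "moderate to substantial" corresponds to kappa values between 0.41 and 0.80, and a within-observer range of "substantial to almost perfect" to values between 0.61 and 1.00.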


Latent class modeling is a useful method for exploring components of disagreement within and between observers, and this information could be used to improve reliability when developing a scoring system.