Research article (Open Access)

Observer ratings of neighborhoods: comparison of two methods

Elena M Andresen1*, Theodore K Malmstrom2, Mario Schootman3, Fredric D Wolinsky4, J Philip Miller5 and Douglas K Miller6

Author Affiliations

1 Institute on Development & Disability, Oregon Health & Science University, Portland, OR, USA

2 Department of Neurology & Psychiatry, School of Medicine, Saint Louis University, 1438 S. Grand, St. Louis, MO 63104, USA

3 Departments of Medicine and Pediatrics, Washington University School of Medicine, 4444 Forest Park Parkway, Box 8504, St. Louis, MO 63108, USA

4 Departments of Health Management and Policy, Internal Medicine, and Adult Nursing, the University of Iowa, N211 CPHB, 105 River St., Iowa City, IA 52242, USA

5 Division of Biostatistics, Washington University School of Medicine, 660 South Euclid Avenue, Campus Box 8067, St. Louis, MO 63110, USA

6 Regenstrief Institute, Inc., and Center for Aging Research, Indiana University School of Medicine, 410 West 10th Street, Suite 2000, Indianapolis, IN 46202-3012, USA

BMC Public Health 2013, 13:1024  doi:10.1186/1471-2458-13-1024

Published: 29 October 2013



Although neighborhood characteristics have important relationships with health outcomes, direct observation involves imperfect measurement. The African American Health (AAH) study included two observer neighborhood rating systems (5-item Krause and 18-item AAH Neighborhood Assessment Scale [NAS]), initially fielded at two different waves. Good measurement characteristics were previously shown for both, but there was more rater variability than desired. In 2010 both measures were re-fielded together, with enhanced training and field methods implemented to decrease rater variability while maintaining psychometric properties.


AAH included a poor inner-city area and more heterogeneous suburban areas. Four interviewers rated 483 blocks, with 120 randomly selected blocks rated by two interviewers. We conducted confirmatory factor analysis of the scales and tested the Krause scale (5-20 points), the AAH 18-item NAS (0-28 points), and the previous 7-item and new 5-item versions of the NAS (0-17 points and 0-11 points, respectively). Retest reliability for items (kappa) and scales (Intraclass Correlation Coefficient [ICC]) was calculated overall and among pre-specified subgroups. Linear regression assessed interviewer effects on total scale scores and concurrent validity against lung and lower body function. Mismeasurement effects on self-rated health were also assessed.
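As an illustration of the two reliability statistics named above, the following is a minimal Python sketch, not the study's actual analysis code. It assumes two ratings per block, computes Cohen's kappa for a single categorical item, and uses a one-way random-effects ICC for scale totals; the abstract does not state which ICC form was used, so that choice and the function names `cohen_kappa` and `icc_oneway` are assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters scoring the same blocks on one item.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both raters use a single identical category throughout.
    """
    n = len(rater1)
    # Proportion of blocks on which the two raters agree exactly.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / (n * n)
    return (observed - expected) / (1 - expected)


def icc_oneway(scores):
    """One-way random-effects ICC(1,1); scores[i] holds the k ratings of block i.

    Assumption: the study may have used a different ICC form.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    means = [sum(row) / k for row in scores]
    # Between-block and within-block mean squares from one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(scores, means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Here `cohen_kappa` would be applied item by item to the paired ratings, and `icc_oneway` to the paired total scale scores, for example separately for retests within and beyond 10 days.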


Scale scores were better in the suburbs than in the inner city. ICC was poor for the Krause scale (ICC=0.19), but improved if the retests occurred within 10 days (ICC=0.49). The 7- and 5-item NAS scales had better ICCs (0.56 and 0.62, respectively), which rose to 0.71 and 0.73 when retests occurred within 10 days. Rater variability for the Krause and the 5- and 7-item NAS scales was 1-3 points (compared with the supervising rater). Concurrent validity was modest, with residents living in worse neighborhood conditions having worse function. Unadjusted estimates were biased towards the null compared with measurement-error corrected estimates.
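The bias towards the null can be illustrated with the classical regression-dilution correction, in which an observed slope is divided by the reliability of the error-prone predictor. This is a sketch under the assumption of classical (random, non-differential) measurement error; the abstract does not specify the correction method the authors used, and the function name `disattenuate` and the example slope below are hypothetical.

```python
def disattenuate(beta_observed, reliability):
    """Classical correction for attenuation of a regression slope.

    Assumes classical measurement error in the predictor, so that the
    observed slope equals the true slope multiplied by the reliability.
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return beta_observed / reliability


# Hypothetical observed slope (not a study result), combined with the
# reported 5-item NAS ICC of 0.62 as the reliability estimate.
corrected = disattenuate(-0.31, 0.62)
```

Because the reliability is at most 1, the corrected slope is never smaller in magnitude than the observed one, which is consistent with unadjusted estimates lying closer to the null.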


Enhanced field protocols and rater training did not improve measurement quality. Specifically, retest reliability and interviewer variability remained problematic. Measurement error partially reduced, but did not eliminate, concurrent validity, suggesting that there are robust associations between neighborhood characteristics and health outcomes. We conclude that the 5-item AAH NAS has sufficient reliability and validity for further use. Additional research on the measurement properties of environmental rating methods is encouraged.