Open Access Highly Accessed Research article

Effects of aggregation of drug and diagnostic codes on the performance of the high-dimensional propensity score algorithm: an empirical example

Hoa V Le12*, Charles Poole1, M Alan Brookhart1, Victor J Schoenbach1, Kathleen J Beach2, J Bradley Layton1 and Til Stürmer1

Author Affiliations

1 Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, USA

2 GlaxoSmithKline, Research Triangle Park, Durham, North Carolina, USA

For all author emails, please log on.

BMC Medical Research Methodology 2013, 13:142  doi:10.1186/1471-2288-13-142

Published: 19 November 2013

Abstract

Background

The High-Dimensional Propensity Score (hd-PS) algorithm can select and adjust for baseline confounders of treatment-outcome associations in pharmacoepidemiologic studies that use healthcare claims data. How hd-PS performance is affected by aggregating medications or medical diagnoses has not been assessed.

Methods

We evaluated the effects of aggregating medications or diagnoses on hd-PS performance in an empirical example using resampled cohorts with small sample size, rare outcome incidence, or low exposure prevalence. In a cohort study comparing the risk of upper gastrointestinal complications in celecoxib or traditional NSAIDs (diclofenac, ibuprofen) initiators with rheumatoid arthritis and osteoarthritis, we (1) aggregated medications and International Classification of Diseases-9 (ICD-9) diagnoses into hierarchies of the Anatomical Therapeutic Chemical classification (ATC) and the Clinical Classification Software (CCS), respectively, and (2) sampled the full cohort using techniques validated by simulations to create 9,600 samples to compare 16 aggregation scenarios across 50% and 20% samples with varying outcome incidence and exposure prevalence. We applied hd-PS to estimate relative risks (RR) using 5 dimensions, predefined confounders, ≤ 500 hd-PS covariates, and propensity score deciles. For each scenario, we calculated: (1) the geometric mean RR; (2) the difference between the scenario mean ln(RR) and the ln(RR) from published randomized controlled trials (RCT); and (3) the proportional difference in the degree of estimated confounding between that scenario and the base scenario (no aggregation).

Results

Compared with the base scenario, aggregations of medications into ATC level 4 alone or in combination with aggregation of diagnoses into CCS level 1 improved the hd-PS confounding adjustment in most scenarios, reducing residual confounding compared with the RCT findings by up to 19%.

Conclusions

Aggregation of codes using hierarchical coding systems may improve the performance of the hd-PS to control for confounders. The balance of advantages and disadvantages of aggregation is likely to vary across research settings.

Keywords:
Aggregation; Anatomical therapeutic chemical classification; Clinical classification software; Confounding by indication; Infrequent exposure; Propensity score; Small sample; Rare outcome