Open Access Research article

Fitting parametric random effects models in very large data sets with application to VHA national data

Mulugeta Gebregziabher1,2*, Leonard Egede1,3, Gregory E Gilbert1, Kelly Hunt1,2, Paul J Nietert2 and Patrick Mauldin1,4

Author Affiliations

1 Center for Disease Prevention and Health Interventions for Diverse Populations, Ralph H. Johnson Veterans Affairs Medical Center, Charleston, SC, USA

2 Division of Biostatistics & Epidemiology, Medical University of South Carolina, 135 Cannon St, Charleston, SC, 29425, USA

3 Center for Health Disparities Research, Division of General Internal Medicine, Medical University of South Carolina, Charleston, SC, USA

4 Department of Clinical Pharmacy and Outcome Sciences, South Carolina College of Pharmacy, Charleston, SC, USA

BMC Medical Research Methodology 2012, 12:163 doi:10.1186/1471-2288-12-163

Published: 24 October 2012

Abstract

Background

With the current focus on personalized medicine, patient/subject level inference is often of key interest in translational research. As a result, random effects models (REM) are becoming popular for patient level inference. However, for very large data sets characterized by large sample size, it can be difficult to fit REM using commonly available statistical software such as SAS, because the models require inordinate amounts of computer time and more memory than is available, which prevents model convergence. For example, in a retrospective cohort study of over 800,000 Veterans with type 2 diabetes with longitudinal data over 5 years, fitting REM via generalized linear mixed modeling using currently available standard procedures in SAS (e.g. PROC GLIMMIX) was very difficult, and the same problems exist in Stata’s gllamm or R’s lme packages. Thus, this study proposes and assesses the performance of a meta regression approach and compares it with methods based on sampling of the full data.

Data

We use both simulated and real data from a national cohort of Veterans with type 2 diabetes (n=890,394) which was created by linking multiple patient and administrative files resulting in a cohort with longitudinal data collected over 5 years.

Methods and results

The outcome of interest was mean annual HbA1c measured over a 5-year period. Using this outcome, we compared parameter estimates from the proposed random effects meta regression (REMR) with estimates based on simple random sampling and VISN (Veterans Integrated Service Networks) based stratified sampling of the full data. Our results indicate that, when the VISN level estimates are homogeneous, REMR provides parameter estimates that are less likely to be biased and have tighter confidence intervals.
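The pooling step behind a meta regression of this kind can be illustrated with a minimal sketch. It assumes that the REM has already been fitted separately within each VISN, giving one coefficient estimate and its variance per VISN, and it combines them with the DerSimonian-Laird random effects estimator. The abstract does not state which pooling estimator the authors used, so that choice, the function name `pool_random_effects`, and the example numbers are illustrative assumptions only.

```python
import math


def pool_random_effects(estimates, variances):
    """Pool per-stratum coefficient estimates (hypothetical helper).

    Uses the DerSimonian-Laird moment estimator, an assumption made for
    illustration; the source does not specify the pooling estimator.
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two estimates with matching variances")

    # Inverse-variance (fixed effect) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum_w

    # Cochran's Q measures heterogeneity of the stratum-level estimates.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))

    # Between-stratum variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects weights add tau2 to each within-stratum variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)

    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "tau2": tau2,
    }


if __name__ == "__main__":
    # Made-up per-VISN estimates and variances, for illustration only.
    result = pool_random_effects([0.10, 0.14, 0.07], [0.0004, 0.0009, 0.0006])
    print(result)
```

When the stratum-level estimates are homogeneous, Q is small, the between-stratum variance is estimated as zero, and the pooled standard error reduces to the fixed effect value, which is consistent with the tighter intervals reported under homogeneity.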

Conclusion

When the aim is to fit REM to repeated measures data with a very large sample size, REMR is a good alternative. It leads to reasonable inference for both Gaussian and non-Gaussian responses if parameter estimates are homogeneous across VISNs.

Keywords:
Generalized linear mixed model; Homogeneity; Random effect meta regression; Longitudinal data; Very large dataset