Department of Biostatistics, School of Medicine, University of Kansas Medical Center, Kansas City, KS, USA

School of Nursing, University of Kansas Medical Center, Kansas City, KS, USA

Abstract

Background

The National Database of Nursing Quality Indicators^{® }(NDNQI^{®}) was established in 1998 to assist hospitals in monitoring indicators of nursing quality (e.g., falls and pressure ulcers). Hospitals participating in NDNQI transmit data from nursing units to an NDNQI data repository. Data are summarized and published in reports that allow participating facilities to compare the results for their units with those from other units across the nation. A disadvantage of this reporting scheme is that the sampling variability is not explicit. For example, suppose a small nursing unit has 2 of 10 patients (a rate of 20%) with pressure ulcers. Should the nursing unit immediately undertake a quality improvement plan because its rate differs from the national average (7%)?

Methods

In this paper, we propose approximating 95% credible intervals (CrIs) for unit-level data using statistical models that account for the variability in unit rates for report cards.

Results

Bayesian CrIs communicate the level of uncertainty of estimates more clearly to decision makers than other significance tests.

Conclusion

A benefit of this approach is that nursing units would be better able to distinguish problematic or beneficial trends from fluctuations likely due to chance.

Background

In 1998 the American Nurses Association (ANA) established the National Database of Nursing Quality Indicators (NDNQI) in order to provide hospitals national comparative data (report cards) that measure indicators of nursing quality at the unit level.

The current process does not provide a measure of uncertainty at the unit level and therefore does not support optimal decision making. Historically, NDNQI has provided significance information by noting whether a confidence interval covers the unit's value for an indicator. The original reports bold a nursing unit's value if it is significantly above or below the overall mean (technically, the mean of units of the same type in similar types of hospitals). This confidence interval reflects uncertainty in the average across units, not the uncertainty in an individual unit's estimate.

Recently, NDNQI held a user focus group to study how hospital staff use the NDNQI report cards. Paraphrasing one user, "whenever our unit is significantly above the overall mean, we immediately require the quality nurse to explain this deficiency in writing." Further, NDNQI's technical assistance staff have reported that hospital staff voiced a strong concern and felt a need to react when bolded indicators showed their unit had a statistically significant problem (personal communication, Susan Klaus, 6/25/08). This feedback indicates a possible overreaction to statistically significant differences from the mean.

There are challenges in generating intervals at the unit level. Specifically, unit data based on a small number of patients would be at risk of extreme period-to-period variation due to the occurrence of a rare event. Such variability would not reflect the overall level of the true quality of care provided on the unit. Further, measuring uncertainty can be very difficult if no events are observed.

The literature on model-based report cards indicates that Bayesian hierarchical models are optimal for addressing interpretation problems resulting from small numbers of observations. Indeed, Bayesian hierarchical models have been treated as the "gold standard" because they provide a sound basis for smoothing random variation and for estimating uncertainty in estimates – both of which could reduce over-reaction and possibly costly errant decision making.

NDNQI report cards are generated quarterly, incorporating approximately 163 measures. Before actually generating reports, data quality is investigated using statistical methods in order to detect outliers, missing data, and illogical data patterns. Nursing units with potential errors in their data are flagged and NDNQI staff calls the hospitals to correct errors. The data are processed using over 2,000 lines of SAS code, which takes 3–4 hours to run. Reports must be issued within 30–45 days of the close of data entry for a quarter.

The literature is filled with arguments advocating for Bayesian hierarchical models, including several recent applications to report cards.

Our primary goal was to develop a procedure approximating the fully Bayesian hierarchical method that is easy to implement and is transparent to the hospital report users. A fully Bayesian approach via MCMC is not feasible with NDNQI's deadline for report delivery given the iterative and monitoring requirements of MCMC and the fact that there are 163 indicators and over 10,000 units.

We propose a method for approximating the fully Bayesian approach using modeling frequently referred to as the empirical Bayes approach.

Methods

2.1 Data source

For comparison purposes, the benchmarks used in report cards are stratified by unit type and bed size (or some other hospital characteristic). The unit types include critical care, step down, medical, surgical, combined medical-surgical, and rehabilitation. The hospitals are stratified into five bed-size categories or three teaching-status categories. These variables provide the comparison groups for each of the nursing units; for example, all combined medical-surgical units from small teaching hospitals are compared to one another. We do not include these covariates in one large hierarchical model here but instead create separate models for each subgroup. As previously mentioned, of the 163 measures, we focused on the following three NDNQI indicators: fall rates, pressure ulcer (PrU) rates, and registered nurse job enjoyment (JE). We use data collected in a recent quarter for the combined medical-surgical units. We do not disclose the bed size for this paper's example because specific benchmarks are proprietary data. While we illustrate the methodology on this specific set of indicators, the method can be extended to other indicators that follow different distributions.

The fall rates are the number of falls per thousand patient days. PrU rates are the number of patients in a 24-hour period who have at least one pressure ulcer, as a proportion of all patients assessed. Job enjoyment data are from a survey of registered nurses (RNs). This indicator is the average of seven questions on a six-point Likert scale, ranging from (1) strongly disagree to (6) strongly agree. Example questions ask whether nurses are satisfied with their jobs and find real enjoyment in their work. We will further define these indicators in Section 2.2. The summary statistics for the three indicators (the mean, the variance s², and the number of units n) are listed in the table below.

Summary statistics (across medical-surgical units) for each indicator.

| Indicator | Mean | Variance (s²) | Units (n) |
| --- | --- | --- | --- |
| Fall Rates | 4.31 | 4.61 | 163 |
| Pressure Ulcer (PrU) Rates | 0.0553 | 0.0038 | 171 |
| Job Enjoyment (JE) | 3.49 | 0.4528 | 97 |

We note that many report card systems advocate risk adjustment. Advocates of risk adjustment believe that it allows for fair comparisons across units that may have different patient populations. Alternatives to risk adjustment include defining homogeneous comparison groups, as is done here by stratifying units by unit type and hospital characteristics.

2.2 Approach

In this section we define the general model for each of the indicators and present fully, approximate, and non-informative Bayesian approaches. We discuss model adequacy and model comparison of the gold standard (fully Bayesian approach) with the approximate approach. We make two points: (1) the primary goal is to provide an interval representing variation within the unit rather than across units; and (2) one can only do this (in general) by borrowing information across units.

2.2.1. General model

Let y_j be an indicator for the j-th unit and let θ_j denote the parameter that determines the sampling distribution, y_j|θ_j ~ f(y_j|θ_j). The Bayesian hierarchical model (BHM) assumes that θ_j is random with a distribution g(θ_j|θ_0), where θ_0 is a vector of hyper-parameters that need to be estimated. The posterior distribution is defined by applying Bayes' theorem. Using our notation, the posterior distribution is p(θ_j|y_j) = f(y_j|θ_j)g(θ_j|θ_0)/m(y_j), where m(y_j) = ∫f(y_j|θ_j)g(θ_j|θ_0)dθ_j. The posterior predictive distribution of the unit (which will be used for goodness of fit) is the sampling distribution integrated across the posterior distribution, specifically f(y^p_j|y_j) = ∫f(y^p_j|θ_j)p(θ_j|y_j)dθ_j.

Next we discuss this BHM in the context of our three example indicators: fall rates, pressure ulcers, and RN job enjoyment. For each indicator, we discuss the specific sampling distribution, prior distributions of the parameters, and their posterior distributions. Much of the detail that we discuss here can be found in

Poisson (Fall Rates)

For fall rates, we assume that y_j is the number of falls across the quarter of interest and t_j represents the number of patient days divided by 1,000. The fall rate indicator is then r_j = y_j/t_j, which represents the observed number of falls per 1,000 patient days. We assume that y_j follows a Poisson distribution and that θ_j is the average fall rate (per 1,000 patient days) for the j-th unit. Therefore, y_j|θ_j ~ Poisson(θ_j t_j). A conjugate prior for the Poisson distribution is the gamma distribution, so we assume that θ_j ~ Γ(α, β), with prior mean α/β and prior variance α/β². Supposing, for now, that α and β are known, the posterior distribution of θ_j|y_j is Γ(y_j + α, t_j + β). The posterior mean of θ_j|y_j is (y_j + α)/(t_j + β) = {t_j/(t_j + β)}r_j + {β/(t_j + β)}(α/β), a weighted average of the observed rate r_j and the prior fall rate α/β.
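As a concrete illustration of this gamma-Poisson update, the following Python sketch computes a unit's posterior mean in closed form and a 95% CrI by Monte Carlo (in place of an analytic gamma-quantile call). The unit's counts are hypothetical, and the prior Γ(4.02, 0.93) echoes the method-of-moments values reported later in the paper.

```python
import random

def fall_rate_posterior(y, t, alpha, beta, draws=200_000, seed=0):
    """Gamma-Poisson conjugate update for a unit's fall rate.

    y: observed falls; t: patient days / 1,000; (alpha, beta): gamma
    prior in (shape, rate) form, so the prior mean rate is alpha/beta.
    The posterior is Gamma(alpha + y, beta + t); quantiles are taken
    from Monte Carlo draws instead of an analytic inverse-gamma call.
    """
    rng = random.Random(seed)
    shape, rate = alpha + y, beta + t
    # random.gammavariate takes (shape, scale), so scale = 1/rate
    sample = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(draws))
    cri = (sample[int(0.025 * draws)], sample[int(0.975 * draws)])
    return shape / rate, cri  # closed-form posterior mean, 95% CrI

# Hypothetical unit: 8 falls over 2,348 patient days, prior Gamma(4.02, 0.93)
mean, cri = fall_rate_posterior(8, 2.348, 4.02, 0.93)
```

With these inputs the posterior is Γ(12.02, 3.278), so the posterior mean is about 3.7 falls per 1,000 patient days, pulled from the observed 3.41 toward the prior mean 4.33.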

Binomial (Pressure Ulcer Rates)

For hospital-acquired pressure ulcers (PrU), let y_j be the number of patients out of n_j who have a hospital-acquired pressure ulcer observed during the 24-hour data collection period. The indicator is then r_j = y_j/n_j, which represents the observed pressure ulcer rate. We assume that y_j follows a binomial distribution with n_j trials and that θ_j is the average pressure ulcer rate for unit j, so y_j|θ_j ~ Bin(n_j, θ_j). A conjugate prior for the binomial distribution is θ_j ~ Beta(a, b), with prior mean a/(a + b) and prior variance ab/{(a + b)²(a + b + 1)}. The posterior distribution of θ_j|y_j is Beta(a + y_j, b + n_j − y_j). The posterior mean of θ_j|y_j is (a + y_j)/(a + b + n_j) = {n_j/(a + b + n_j)}r_j + {(a + b)/(a + b + n_j)}{a/(a + b)}, a weighted average of the observed rate r_j and the prior PrU rate a/(a + b).
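The beta-binomial update can be sketched the same way. The unit below is hypothetical, and the prior Beta(0.70, 12.04) echoes the method-of-moments values reported later; this is an illustration, not the production report-card code.

```python
import random

def pru_posterior(y, n, a, b, draws=200_000, seed=0):
    """Beta-binomial conjugate update for a unit's pressure ulcer rate.

    The posterior is Beta(a + y, b + n - y); its mean is a weighted
    average of the observed rate y/n and the prior mean a/(a + b).
    """
    rng = random.Random(seed)
    a_post, b_post = a + y, b + n - y
    sample = sorted(rng.betavariate(a_post, b_post) for _ in range(draws))
    cri = (sample[int(0.025 * draws)], sample[int(0.975 * draws)])
    return a_post / (a_post + b_post), cri

# Hypothetical unit: 3 of 24 assessed patients with a PrU,
# prior Beta(0.70, 12.04) echoing the MOM estimates reported later
mean, cri = pru_posterior(3, 24, 0.70, 12.04)
```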

Normal (RN Job Enjoyment)

For RN job enjoyment (JE), let y_j be the observed average score and s_j the standard deviation for the n_j RNs in unit j. We assume that y_j is normally distributed; this is reasonable in practice despite the fact that y_j is bounded, because the average rarely reaches the ends of the boundary. We assume that θ_j is the average RN job enjoyment for unit j, so y_j|θ_j ~ N(θ_j, σ²/n_j), where we assume σ² = {Σ(n_j − 1)s_j²}/{Σ(n_j − 1)}, the pooled within-unit variance. The assumption of homogeneous σ² could be relaxed. We assume that θ_j ~ N(μ, σ²_θ). The posterior distribution is θ_j|y_j ~ N(m_j*, v_j*), where m_j* = [{n_j/σ²}/{n_j/σ² + 1/σ²_θ}]y_j + [{1/σ²_θ}/{n_j/σ² + 1/σ²_θ}]μ and v_j* = 1/{n_j/σ² + 1/σ²_θ}.
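Because the normal-normal update is closed form, no simulation is needed; the sketch below uses hypothetical unit data, with the prior N(3.49, 0.45) echoing the method-of-moments values reported later.

```python
import math

def je_posterior(ybar, n, sigma2, mu, tau2):
    """Normal-normal conjugate update for RN job enjoyment (closed form).

    ybar: unit mean JE score from n RNs; sigma2: pooled within-unit
    variance; (mu, tau2): prior mean and variance of the unit means.
    """
    precision = n / sigma2 + 1.0 / tau2   # posterior precision 1/v*
    m_star = (n / sigma2 * ybar + mu / tau2) / precision
    v_star = 1.0 / precision
    half = 1.96 * math.sqrt(v_star)       # normal 95% interval half-width
    return m_star, (m_star - half, m_star + half)

# Hypothetical unit: mean score 3.1 from 12 RNs, pooled sigma2 = 0.71,
# prior N(3.49, 0.45) echoing the MOM estimates reported later
m, cri = je_posterior(3.1, 12, 0.71, 3.49, 0.45)
```

Note how the posterior mean sits between the unit's own mean (3.1) and the prior mean (3.49), with the weight on the unit growing as n does.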

Measures of Uncertainty

Assuming the prior parameters are known, the uncertainty can be summarized by the posterior distribution of θ_j, using its 2.5th and 97.5th percentiles as a 95% credible interval (CrI).

2.2.2. Fully Bayesian approach

Bayesian profiling has been discussed in the literature for over ten years.

Hierarchical modeling has also been adopted in quality measurement work by the Agency for Healthcare Research and Quality (AHRQ).

2.2.3. Approximate Bayesian approach

Empirical Bayes methods for profiling are well established. The across-unit sample mean m = Σr_j/J and sample variance s² = Σ(r_j − m)²/(J − 1) are both summarized in the current report cards. Next, for the purposes of approximating the hyper-parameters, for each of the three outcomes above, temporarily set θ_j = r_j. Then use the method of moments (MOM) to equate m and s² to the prior mean and variance and solve for the hyper-parameters.

Because the prior distribution is specified by summary statistics, the MOM estimates for fall rates are β = m/s² and α = m²/s², telling us that the higher the mean or the lower the variance, the larger the "equivalent prior" number of patient days. If the variation across units (s²) is small, then the other units are providing more information.

The MOM estimates for the prior PrU distribution have a similar property; the solution is a = m{m(1 − m)/s² − 1} and b = (1 − m){m(1 − m)/s² − 1}. As s² decreases, the prior sample size a + b increases.

The interpretation for job enjoyment is straightforward with the usual normal theory: μ = m and σ²_θ = s². The lower s² is, the more informative the prior. Further, recall that the posterior variance of the unit is v_j* = 1/{n_j/σ² + 1/σ²_θ} = σ²/{n_j + σ²/σ²_θ}, arguing that σ²/σ²_θ is the prior sample size.
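The three MOM solutions can be written directly as code. The numeric inputs are the medical-surgical summary statistics from the data-source section; the function names are our own.

```python
def mom_gamma(m, s2):
    """Fall rates: Gamma(alpha, beta) prior with mean m, variance s2."""
    return m * m / s2, m / s2                # (alpha, beta)

def mom_beta(m, s2):
    """PrU rates: Beta(a, b) prior matched to mean m, variance s2."""
    k = m * (1.0 - m) / s2 - 1.0             # prior sample size a + b
    return m * k, (1.0 - m) * k              # (a, b)

def mom_normal(m, s2):
    """JE: the N(mu, tau2) prior is the across-unit mean and variance."""
    return m, s2

# Medical-surgical summary statistics from the data-source section
alpha, beta = mom_gamma(4.31, 4.61)
a, b = mom_beta(0.0553, 0.0038)
mu, tau2 = mom_normal(3.49, 0.4528)
```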

The approximate Bayesian approach produces results similar to, but slightly more conservative than, the fully Bayesian approach. This is because the fully Bayesian approach essentially estimates the variance of the hyper-parameters from the smoothed (shrunken) parameter estimates. The prior parameters under the approximate Bayesian approach do not use any shrinkage, resulting in larger variances for the prior distribution than under the fully Bayesian approach.

2.2.4. Non-informative Bayesian approach

A third approach uses what is sometimes called a non-informative, or flat, prior distribution. Essentially, one assumes that there is no information outside of the summary statistics observed for the particular unit types in question: no prior patients for pressure ulcers, no prior patient days, and an infinite prior variance for JE. This results in CrIs close to traditional confidence intervals. A drawback of the approach is that it is very difficult to calculate intervals when the data fall on the edge of the sample space (e.g., 0 falls, 0 PrUs, or all n_j patients with PrUs).

2.2.5. Model adequacy and relative fit

To test whether the models were correctly specified, we calculate a chi-square goodness-of-fit measure for each of the indicators using the fully Bayesian approach (Gelman et al, 2000). Specifically, we define χ² = Σ{y_j − E(y_j|θ_j)}²/Var(y_j|θ_j) for the observed data and χ²_p = Σ{y^p_j − E(y_j|θ_j)}²/Var(y_j|θ_j) for the posterior predictive data. The discrepancy between model parameters and the observed data is χ², and for the posterior predictive data it is χ²_p. The goodness-of-fit Bayesian p-value is thus Pr(χ² < χ²_p), and values between 0.01 and 0.99 are deemed to reflect a reasonable fit. There is no need to incorporate degrees of freedom in the calculation because the probability is calculated using the posterior distribution from MCMC.
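A minimal sketch of this posterior predictive check for the fall-rate model follows. For illustration it draws each unit's θ_j from its conjugate approximate posterior rather than from a joint MCMC fit, and the five units' counts are made up.

```python
import math, random

def poisson_draw(lam, rng):
    """Knuth's Poisson sampler; adequate for moderate fall counts."""
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def bayes_p_value(y, t, alpha, beta, reps=2000, seed=1):
    """Posterior predictive chi-square check for the fall-rate model.

    For each replicate, draw each unit's theta_j from its conjugate
    posterior, compute the chi-square discrepancy for the observed and
    for replicated counts, and report Pr(chi2 < chi2_p).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        chi2 = chi2_p = 0.0
        for yj, tj in zip(y, t):
            theta = rng.gammavariate(alpha + yj, 1.0 / (beta + tj))
            e = theta * tj                  # Poisson mean (= variance)
            yrep = poisson_draw(e, rng)
            chi2 += (yj - e) ** 2 / e
            chi2_p += (yrep - e) ** 2 / e
        hits += chi2 < chi2_p
    return hits / reps

# Made-up counts (falls, thousand patient days) for five units
p = bayes_p_value([8, 1, 4, 6, 3], [2.348, 1.481, 1.9, 2.2, 1.1], 4.02, 0.93)
```

Well-specified data should yield a moderate p-value; values near 0 or 1 flag misfit.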

To test how well the approximate Bayes approach emulated the fully Bayesian approach, we utilize the Deviance Information Criterion (DIC).

We look at the number of times we would decide a unit's indicator is "significantly" below (or above) the overall mean μ across all units. For the purposes of this paper we decide a unit is significantly below the overall mean if Pr(θ_j < μ|y_j, θ_0) > 0.95 (above if Pr(θ_j > μ|y_j, θ_0) > 0.95). We define a quality index for the j-th unit as Q_j = Pr(θ_j > μ|y_j, θ_0) for JE, and reverse the inequality, Q_j = Pr(θ_j < μ|y_j, θ_0), for PrU and fall rates. We calculated Q_j for the approximate Bayesian approach and for a non-informative approach.
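The quality index for fall rates can be approximated as the fraction of posterior draws falling below the overall mean. The unit data here mirror the Unit X example in Section 3.1 and the prior values are the MOM estimates, so the result should land near the 0.75 reported there (we have not reproduced the exact published computation).

```python
import random

def quality_index(y, t, alpha, beta, overall, draws=100_000, seed=0):
    """Pr(theta_j < overall mean | data) for fall rates, estimated as
    the fraction of posterior draws below the overall mean (lower fall
    rates are better, hence the '<')."""
    rng = random.Random(seed)
    scale = 1.0 / (beta + t)            # posterior is Gamma(alpha+y, beta+t)
    below = sum(rng.gammavariate(alpha + y, scale) < overall
                for _ in range(draws))
    return below / draws

# Unit X's data from Section 3.1 with the MOM prior Gamma(4.02, 0.93)
q = quality_index(8, 2.348, 4.02, 0.93, 4.30)
```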

2.2.6. Sensitivity to sample size

The limitations of the approximate approach relative to the fully Bayesian approach were explored by varying the number of nursing units in the analysis. Using randomly selected sample sizes of 5, 10, 25, 50, 75, and 95 units, we compared the approximate approach to the fully Bayesian approach by taking 10,000 draws from the posterior predictive distribution of a future nursing unit using the approximate parameters from the method of moments and comparing them against 10,000 draws from the fully Bayesian approach. This was repeated for fall rates, PrU rates, and JE. The goal was to see at what point we would be "forced" to use a fully Bayesian approach rather than an approximation.
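The Q-Q comparison itself is simple to sketch: sort the two sets of posterior predictive draws and pair matched quantiles. Here both samples come from the same gamma distribution purely as a stand-in, since reproducing the MCMC draws is beyond this sketch.

```python
import random

def qq_points(sample_a, sample_b, k=99):
    """Pair matched quantiles of two draw sets; points near the
    45-degree line mean the two posteriors agree."""
    a, b = sorted(sample_a), sorted(sample_b)
    step_a, step_b = len(a) // (k + 1), len(b) // (k + 1)
    return [(a[i * step_a], b[i * step_b]) for i in range(1, k + 1)]

rng = random.Random(0)
# Stand-ins for the fully Bayesian and approximate posterior predictive
# draws; here both are the same gamma purely for illustration
full = [rng.gammavariate(12.0, 1 / 3.3) for _ in range(10_000)]
approx = [rng.gammavariate(12.0, 1 / 3.3) for _ in range(10_000)]
pts = qq_points(full, approx)
```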

Results

The fully Bayesian approach was implemented using MCMC in WinBUGS. The hyper-parameters were given vague priors; for example, normal priors with very large variance for means, such as μ ~ N(0, 1000²), and diffuse gamma priors for precisions, such as 1/σ²_θ ~ Γ(0.001, 1000). Alternatively, weakly informative priors (not considered here) might be useful. The prior mean, for example, could be centered at the units' sample mean from the prior quarter, and the prior variance could be based on expectations about how far a unit is likely to differ from the average in an extreme case. Using the fully Bayesian approach, the models for all three indicators were adequate as measured by Bayesian p-values. Model adequacy is summarized by p-values that indicate whether the model is an accurate reflection of the data: if the p-value is moderate, we are inclined to believe the model is adequate; if it is extreme, we reject the adequacy of the model and would need to fit an alternative. These p-values were p = 0.5189 for fall rates, p = 0.4686 for PrUs, and p = 0.5184 for JE, indicating that the Poisson, binomial, and normal distributions were adequate models for the sampling distributions. We report model adequacy for the fully Bayesian approach only, since the other approaches are its approximations.

The relative fit of the three approaches for each indicator is summarized in the table below.

Summary of relative fit across models for each indicator.

| Indicator | Sampling Distribution | DIC, fully | DIC, approximate | DIC, non-informative |
| --- | --- | --- | --- | --- |
| Fall Rates | Poisson | 909.2 | 907.3 | 965.9 |
| PrU Rates | Binomial | 498.1 | 490.8 | 572.3 |
| JE | Normal | 61.2 | 61.0 | 87.4 |

To further describe the differences between the fully and approximate approaches, we plotted the prior distributions for each indicator (Figure 1). For fall rates, β = m/s² = 4.31/4.61 = 0.93 and α = m²/s² = 4.31²/4.61 = 4.02, telling us the information from other units provides around one thousand patient days (around 11 patients per 24-hour day) and just over 4 falls. For the prior PrU distribution the solution is a = m{m(1 − m)/s² − 1} = 0.70 and b = (1 − m){m(1 − m)/s² − 1} = 12.04, corresponding to 0.70 as the prior number of pressure ulcers and almost 13 as the prior number of patients. The interpretation for job enjoyment is straightforward with the usual normal theory, with σ²_θ = 0.45. The pooled within-unit variance is σ² = 0.71; thus the prior sample size estimate is σ²/σ²_θ = 0.71/0.45 = 1.58 (a prior of about 1.5 RNs). We can demonstrate the relative amount of information borrowed (on average) by taking the ratio of the prior patient days to the average patient days, the prior sample size to the average sample size, and the prior RNs to the average RNs. This corresponds to 0.93/2.31 = 0.40, 12.74/24.1 = 0.53, and 1.58/16.6 = 0.10, respectively. These results indicate that the information across units informs individual units more for fall rates and pressure ulcers than for JE.

Prior distributions for three indicators using the Full and Approximate Bayesian Models

**Prior distributions for three indicators using the Full and Approximate Bayesian Models.**

The following figures show the posterior distributions for four example units for each indicator.

Posterior distribution for four units' fall rates

**Posterior distribution for four units' fall rates.**

Posterior distribution for four units' PrU rates

**Posterior distribution for four units' PrU rates.**

Posterior distribution for four units' JE

**Posterior distribution for four units' JE.**

On a personal computer with a 3.20 GHz processor and 2.00 GB of RAM, the fully Bayesian approach took 27 seconds to sample 11,000 MCMC iterations. Assuming a similar number of units across 163 indicators, the method would take about 73 minutes, so the raw time savings is small. The real savings occurs because the approximate Bayes approach does not require monitoring the convergence of the MCMC, making it much easier to automate for report card generation. Additionally, the approximate approach is easier to explain to NDNQI users. A switch to the fully Bayesian approach would be necessary if the approximate approach inadequately reflected the full approach, which requires continued assessment of this relationship for future indicators.

Overall, the number of units identified as significant under each approach is compared in the figure below.

Comparison of methods for assessing significant units

**Comparison of methods for assessing significant units.**

These results are in stark contrast to what one gets using an interval approach across units (indicator is significant if above or below this interval). The 95% confidence interval for the overall mean for falls, PrUs, and JE was 3.97–4.63; 0.046–0.065; and 3.56–3.63 respectively corresponding to 80 below and 63 above for falls; 95 below and 56 above for PrUs; and 50 below and 41 above for JE.

3.1 Example Report Cards

The following displays represent how different reports would look for two different units for fall rates and PrU rates.

Display for Unit X (Fall Rates)

In the last quarter, unit X had 8 falls and 2,348 patient-days. This resulted in an observed fall rate of 3.41 falls per thousand patient days. A 95% credible interval for the unit was 2.15–5.96. The average across all units of this type was 4.30. The quality index for fall rates was thus 0.75. The quality index is the probability that a unit's fall rate is below the overall average, with a higher score being better. We consider units with a quality index above 0.95 to be significant. The fall rate on this unit was not significantly below the average fall rate.

Display for Unit Y (Fall Rates)

In the last quarter, unit Y had 1 fall and 1,481 patient-days. This resulted in an observed fall rate of 0.68 falls per thousand patient days. A 95% credible interval for the unit was 0.68–4.25. The average across all units of this type was 4.30. The quality index for fall rates was thus 0.98. The unit's fall rate was significantly below the average.

Display for Unit X (PrU Rates)

In the last quarter, unit X had 3 patients with PrUs out of 24 patients in the census. This resulted in an observed PrU rate of 0.13. A 95% credible interval for unit X was 0.03–0.22. The average across all units of this type was 0.06. The quality index for PrU rates was thus 0.19. The unit was not significantly below the average PrU rate.

Display for Unit Y (PrU Rates)

In the last quarter, unit Y had 0 patients with PrUs out of 17 patients in the census. This resulted in an observed PrU rate of 0.00. A 95% credible interval for the unit was 0.00–0.10. The average across all units of this type was 0.06. The quality index for PrU rates was thus 0.89. The unit was not significantly below the average PrU rate.
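The Unit Y display illustrates the zero-event case where traditional intervals break down. The sketch below shows how the approximate Bayesian numbers can arise using the MOM prior Beta(0.70, 12.04); the exact prior behind the published display is not stated, so treat the output as indicative rather than a reproduction.

```python
import random

def pru_report(y, n, a, b, overall, draws=200_000, seed=0):
    """Posterior CrI and quality index for a PrU display, including the
    zero-event case where traditional intervals break down."""
    rng = random.Random(seed)
    sample = sorted(rng.betavariate(a + y, b + n - y) for _ in range(draws))
    cri = (sample[int(0.025 * draws)], sample[int(0.975 * draws)])
    quality = sum(d < overall for d in sample) / draws
    return cri, quality

# Unit Y's display data: 0 PrUs among 17 patients,
# assumed MOM prior Beta(0.70, 12.04), overall mean rate 0.06
cri, q = pru_report(0, 17, 0.70, 12.04, 0.06)
```

Even with zero observed events, the posterior Beta(0.70, 29.04) yields a usable interval (upper bound near 0.10) and a quality index near 0.89.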

3.2 Fully versus Approximate Bayes for Various Sample Sizes

The figure below shows Q-Q plots comparing the fully Bayesian and approximate Bayesian posterior predictive draws for sample sizes ranging from 5 to 95 units.

Q-Q plots of 10,000 simulations comparing "full" Bayesian approach to the "approx" Bayesian approach for sample sizes varying from 5 to 95

**Q-Q plots of 10,000 simulations comparing "full" Bayesian approach to the "approx" Bayesian approach for sample sizes varying from 5 to 95.**

Discussion

The intent of this study was to explore a practical approximation for implementing a Bayesian approach for nursing outcome report cards. This method supplies report card users with more information than past reports gave; specifically, the probability of being below the overall mean and the 95% CrI. This represents a methodological and informational improvement. The examples demonstrated the utility of this approach for determining exemplary performance. As an alternative to the quality index, a deficiency index could similarly be derived. Use of a deficiency index could prove beneficial in reducing the chances of over-reacting through the incorporation of prior information into the index. Additionally, Bayesian hierarchical models handle multiplicity automatically; here, multiplicity refers to the inflation of the Type I error rate (the probability of falsely identifying units as "significantly" lower than the overall mean) that occurs when many units are tested at once. Note that Austin and Brunner (2007) recommend different probability levels, but for the purposes of this paper we use a 0.95 probability level.

There are several consequences and extensions to our study that are worth noting.

Point 1: sample size calculations

The example data suggest that the fully Bayesian approach can guide us in generating policies for gathering information. Some indicators provide more information than others. The PrU data, collected in one 24-hour period, have relatively smaller sample sizes than fall rates, which are collected over all days in a quarter. Our method suggests policy changes such as requesting units to conduct more than one prevalence study each quarter, but our experience suggests that this will not happen until most facilities in the U.S. have electronic medical records from which performance measures can be extracted. This policy could shorten CrIs and supply units with more precise information. Currently, some hospitals in NDNQI conduct as many as three prevalence studies per quarter, and they are implicitly rewarded with more stable indicators that have relatively narrow CrIs.

Point 2: temporal analysis

NDNQI reports provide data for each unit across eight quarters. We can extend our approach to make smoothed estimates across time. (A drawback of smoothing is that a meaningful change can be masked.) Let θ_jk be the parameter for the j-th unit in quarter k. We can estimate θ_j1 using the approximate Bayesian approach, then use its posterior as the prior for θ_j2, and continue on such that after the first step each previous posterior serves as the prior for the next time point. This type of model is an approximation to a Kalman filter or a state space model.
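For the gamma-Poisson fall-rate case, this posterior-as-prior scheme is a one-line update per quarter, sketched below with hypothetical quarterly counts. Note this static version does not discount older quarters the way a full Kalman filter or state space model would.

```python
def sequential_update(prior, quarters):
    """Posterior-as-prior updating for a unit's fall rate: each
    quarter's Gamma posterior becomes the next quarter's prior."""
    alpha, beta = prior
    history = []
    for falls, kdays in quarters:            # (falls, patient days / 1,000)
        alpha, beta = alpha + falls, beta + kdays
        history.append((alpha, beta, alpha / beta))  # running posterior mean
    return history

# Hypothetical unit over three quarters, starting from the MOM prior
hist = sequential_update((4.02, 0.93), [(8, 2.348), (5, 2.1), (9, 2.4)])
```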

Point 3: overall summary of quality

Suppose we want to combine the quality indicators from falls, pressure ulcers, and job enjoyment into one value we call an overall quality summary. Treating the indicators as independent, we could multiply the indices: for unit X the summary is 0.75 × 0.19 = 0.14, and for unit Y it is 0.98 × 0.89 = 0.87, indicating that unit Y has evidence of better overall quality than unit X. However, we may need to incorporate a dependence structure, as the various outcome indicators may be correlated because of the quality of nursing care on the unit.

Conclusion

This analysis has demonstrated that approximate Bayesian CrIs communicate the level of uncertainty of estimates more clearly to decision makers than traditional significance tests, because the large sample sizes in NDNQI reports can lead to very small standard errors. In this context, significant differences from the mean may not be clinically important, and the effect of random changes in the prevalence of adverse events may be exaggerated by traditional approaches.

How will users interpret the proposed method? Will they understand CrIs? Will they use the new information? The answers to these questions may not be straightforward, and we intend to address them with a small pilot study. The best indicator of success will be units initiating quality improvements based on accurate interpretation of report card information, rather than on chance fluctuation, after being presented with summaries from the approximate Bayesian approach, compared with units using report cards summarized by traditional approaches. The expectation is that this will occur because units will be less likely to react to chance and more likely to act upon more complete information about their quality of care. Our proposed method has a good statistical foundation and is practical to implement. We think it will be transparent to our users and can be implemented in a spreadsheet program like Excel. We show all the Excel 2003 functions needed to implement the approximate Bayesian approach in the appendix.

Appendix: Excel functions for approximate Bayesian approach

1. =AVERAGE(range)

2. =STDEV(range)

3. =GAMMAINV(probability, alpha, beta)

4. =BETAINV(probability, alpha, beta)

5. =NORMINV(probability, mean, standard_dev)

6. =GAMMADIST(x, alpha, beta, cumulative)

7. =BETADIST(x, alpha, beta)

8. =NORMDIST(x, mean, standard_dev, cumulative)
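For readers who prefer a scripting language to Excel, these functions map directly onto scipy.stats calls. This mapping is our suggestion (it assumes scipy is available); the numeric arguments are the illustrative Unit X fall-rate and Unit Y PrU posteriors from the report card examples, not official NDNQI values.

```python
# Mapping the appendix's Excel 2003 functions onto scipy.stats calls
from scipy import stats

# GAMMAINV(p, alpha, beta) -> stats.gamma.ppf(p, alpha, scale=beta)
lo = stats.gamma.ppf(0.025, 12.02, scale=1 / 3.278)
hi = stats.gamma.ppf(0.975, 12.02, scale=1 / 3.278)

# BETAINV(p, a, b) -> stats.beta.ppf(p, a, b)
b_hi = stats.beta.ppf(0.975, 0.70, 29.04)

# NORMINV(p, mean, sd) -> stats.norm.ppf(p, loc=mean, scale=sd)
n_lo = stats.norm.ppf(0.025, loc=3.49, scale=0.45 ** 0.5)

# GAMMADIST(x, alpha, beta, TRUE) -> stats.gamma.cdf(x, alpha, scale=beta);
# for fall rates the quality index is Pr(theta < overall mean)
quality = stats.gamma.cdf(4.30, 12.02, scale=1 / 3.278)
```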

Competing interests

Partial funding for all authors comes from a contract with the American Nurses Association (ANA), which funds the National Database of Nursing Quality Indicators (PI: Dunton).

Authors' contributions

BG conceived the study, performed statistical computing/modeling and drafted the manuscript. ND contributed the substantive interpretations. JM contributed statistical modeling ideas. All authors read and approved the final manuscript and contributed to the ideas of the study as well as to editing and re-writing.

Acknowledgements

All authors are funded from a grant, called National Database of Nursing Quality Indicators^{® }(NDNQI^{®}), from the American Nurses Association (ANA). The ANA had no role in the study design, collection, analysis, and interpretation of the data; in writing the manuscript; nor in the decision to submit the manuscript for publication.

Pre-publication history

The pre-publication history for this paper can be accessed here: