Imperial College, London and NIHR CLAHRC for NWL, Floor 4 Lift Bank D, Chelsea and Westminster Hospital, 369 Fulham Road, London, SW10 9NH, UK

Abstract

Background

The XmR chart is a powerful analytical tool in statistical process control (SPC) for detecting special causes of variation in a measure of quality. In this analysis a statistic called the average moving range is used to calculate the control limits.

Methods

We derive the maxima and minima of the average moving range for data without inherent ordering, and show how to calculate them for any data set. We permute a real-world data set and calculate control limits based on these extrema.

Results

In the real-world data set, permuting the order of the data produced an absolute difference of 109 percentage points in the width of the control limits.

Discussion

We prove quantitatively that XmR chart analysis is problematic for data without an inherent ordering and, using real-world data, demonstrate the problem this causes for calculating control limits. The resulting ambiguity renders the analysis unacceptable as a basis for making decisions about data without inherent order.

Conclusion

The XmR chart should only be used for data endowed with an inherent ordering, such as a time series. To detect special causes of variation in data without an inherent ordering, we suggest adopting one of the many well-established approaches to outlier analysis. Furthermore, we recommend that in all SPC analyses authors consistently report the type of control chart used, including the measure of variation used in calculating control limits.

Background

Statistical process control (SPC) is an approach to quality improvement that has seen increasing use in healthcare since the early 1990s.

The most natural application of SPC in healthcare is to time series data: the natural ordering of the data in time is central to the correct application of the analysis. However, the SPC literature contains conflicting opinions on the use of control charts for data that do not come endowed with a natural ordering. Some authors recommend the use of control charts for such data.

As all measured data exhibit variation, the idea behind a control chart is to provide concrete rules for assessing the likely nature of the observed variation. Broadly speaking, the observed variation is classified as either “common cause” variation or “special cause” variation. Common cause variation is the variation exhibited by a process in its usual state, whereas special cause variation is caused by an exceptional or external event. The rules are couched in terms of a number of horizontal lines, the process (or control) limits, marked on a line graph of the data. How these lines are calculated depends on which of the several available types of chart is used. The most commonly used are the individual values and moving range (XmR) charts; p-charts (used to monitor the proportion of faults in a sample); np-charts (an adaptation of the p-chart used to interpret performance in numbers of units rather than proportions); c-charts (to monitor count data, i.e. the number of faults per unit, or the total number of events occurring over a certain unit of time); and u-charts (for monitoring count data when the sample size is greater than one, i.e. the average number of faults per unit).

The XmR chart is one of the simplest charts to construct, and yet also one of the more robust in general practice, because the other charts rely on the data conforming to an assumed distribution: p- and np-charts rely on the binomial distribution, whilst c- and u-charts rely on the Poisson distribution. The XmR chart makes no such assumption and instead uses the data themselves to provide empirical limits through calculation of an average moving range. By contrast, the p- and np-charts, for example, assume the variation to be a function of the location and plot theoretical limits that will not hold if the binomial assumption is violated. Technical details of the different types of control chart and the relevant assumptions can be found widely in the literature.
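As a rough illustration of how the XmR chart derives empirical limits from the data themselves, the sketch below computes the centre line and the limits $\bar{x} \pm 2.66\,\overline{mR}$. The scaling constant 2.66 ($= 3/1.128$) is the conventional choice in the SPC literature and is an assumption here, since the text does not state it; the function names and the example series are hypothetical.

```python
from statistics import mean

# Conventional XmR scaling constant (assumed): 3 / d2, where d2 = 1.128
# is the bias-correction factor for moving ranges of two consecutive points.
XMR_CONSTANT = 2.66


def moving_ranges(values):
    """Absolute differences between consecutive values, in the order given."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def xmr_limits(values):
    """Return (lower limit, centre line, upper limit) for an individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    centre = mean(values)
    avg_mr = mean(moving_ranges(values))  # the average moving range
    return centre - XMR_CONSTANT * avg_mr, centre, centre + XMR_CONSTANT * avg_mr


# Hypothetical series, for illustration only.
print(xmr_limits([12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0]))
```

Note that `moving_ranges` depends on the order of `values`, whereas the centre line does not; this is the property examined in the rest of the paper.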

In healthcare, more so than in the manufacturing birthplace of SPC, we will seldom be able to justify stringent assumptions, such as those of the binomial model, satisfactorily. The simplest control chart, the XmR chart, also has the distinct advantage of having the least stringent assumptions attached to it. In fact, the only assumption required is that a rational sampling and sub-grouping regime is used.

For the XmR analysis of data with a natural ordering, it is important that global measures of dispersion, such as the overall standard deviation, are not used in the calculation of control limits.
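The following sketch, using a made-up series with a step change halfway through, illustrates why a global measure of dispersion is unsuitable for ordered data: the shift inflates the overall standard deviation, whereas the moving-range estimate (here $\overline{mR}/1.128$, the conventional bias correction, which we assume) reflects only point-to-point variation. The series and helper names are hypothetical.

```python
from statistics import mean, stdev

D2 = 1.128  # assumed bias-correction constant for two-point moving ranges


def sigma_from_moving_range(values):
    """Estimate dispersion from point-to-point variation only."""
    mrs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(mrs) / D2


def points_outside(values, centre, half_width):
    """Values lying further than half_width from the centre line."""
    return [x for x in values if abs(x - centre) > half_width]


# Hypothetical series with a step change after the fifth point.
series = [10.0, 11.0, 10.0, 11.0, 10.0, 20.0, 21.0, 20.0, 21.0, 20.0]
centre = mean(series)

local = points_outside(series, centre, 3 * sigma_from_moving_range(series))
global_ = points_outside(series, centre, 3 * stdev(series))
print("moving-range sigma:", round(sigma_from_moving_range(series), 2))  # about 1.77
print("global SD:", round(stdev(series), 2))  # about 5.3
print("flagged using moving range:", local)
print("flagged using global SD:", global_)
```

With the moving-range estimate the shift produces signals on both sides of the centre line; with the global standard deviation the limits are wide enough to hide it entirely.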

Whilst the XmR chart was originally formulated with time series data in mind, its use has been advocated for data in which there is logical comparability but no inherent ordering, provided the order in which the data are placed is not determined by the data themselves.

Methods

There are SPC charts for which permuting the order of the data does not affect the calculation of the control limits. These include the p-charts, np-charts, c-charts and u-charts, but as mentioned above, these charts rely on stringent assumptions that are unlikely to be met by real healthcare data. However, the control limits of the XmR chart are affected by permutations of the data, as is shown below.
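A minimal illustration, with hypothetical values: summaries that ignore order, such as the mean and standard deviation, are unchanged when the data are permuted, but the average moving range, and hence the XmR limits, is not.

```python
from statistics import mean, stdev


def average_moving_range(values):
    """Mean absolute difference between consecutive values, in the order given."""
    mrs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(mrs) / len(mrs)


# Hypothetical data, shown in two different orders.
ascending = [1.0, 2.0, 4.0, 7.0, 8.0, 9.0]
zigzag = [1.0, 9.0, 2.0, 8.0, 4.0, 7.0]  # a permutation of the same values

for name, order in [("ascending", ascending), ("zigzag", zigzag)]:
    print(name, mean(order), round(stdev(order), 3), average_moving_range(order))
# The mean and standard deviation agree for both orders;
# the average moving range is 1.6 for one and 5.6 for the other.
```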

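Suppose that the data we are interested in are $x_1, x_2, \ldots, x_n$. An XmR analysis proceeds by placing the data in some order, $x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(n)}$ for a permutation $\sigma$, and then applying the standard method, in which the average moving range

$$\overline{mR}_\sigma = \frac{1}{n-1}\sum_{i=1}^{n-1}\left|x_{\sigma(i+1)} - x_{\sigma(i)}\right|$$

is used to calculate the control limits.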

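There are at most $n!/2$ distinct values of the average moving range, since an ordering and its reverse produce the same moving ranges.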

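The minimum average moving range is attained when the data are sorted into ascending (or descending) order: the moving ranges then sum to the range of the data, and no ordering can give a smaller sum, because any ordering must travel from the smallest value to the largest. Hence the minimum is $(x_{(n)} - x_{(1)})/(n-1)$, where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ are the data sorted into ascending order.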

**Case I:**

It can be shown that, if

and if

Note that if the two central

**Case II:**

It can be shown that

Since _{j} in the above expressions by one.
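The sketch below computes both extrema of the average moving range and checks them against exhaustive enumeration for small $n$. The minimum uses the sorted ordering. For the maximum we use a standard zigzag-arrangement result: alternate between the lower and upper halves of the sorted data, with the central values at the ends of the sequence, and for odd $n$ try the median in either half. This is our own formulation, not a transcription of the Case I and Case II expressions, so its notation may differ from the authors'; all function names are hypothetical.

```python
from itertools import permutations


def average_moving_range(values):
    """Mean absolute difference between consecutive values, in the order given."""
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)


def min_average_moving_range(values):
    """Sorted order: the moving ranges telescope to max - min."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return (max(values) - min(values)) / (len(values) - 1)


def max_average_moving_range(values):
    """Zigzag arrangement between the lower and upper halves of the sorted data."""
    x = sorted(values)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two values")
    m = n // 2
    if n % 2 == 0:
        # Upper half added twice, lower half subtracted twice; the two
        # central values sit at the ends of the sequence and count only once.
        total = 2 * sum(x[m:]) - 2 * sum(x[:m]) - x[m] + x[m - 1]
    else:
        # The median x[m] can join either half; keep the better arrangement.
        median_low = 2 * sum(x[m + 1:]) - 2 * sum(x[:m + 1]) + x[m] + x[m - 1]
        median_high = 2 * sum(x[m:]) - 2 * sum(x[:m]) - x[m] - x[m + 1]
        total = max(median_low, median_high)
    return total / (n - 1)


def brute_force_extrema(values):
    """Exhaustive check over all orderings; only feasible for small n."""
    amrs = [average_moving_range(list(p)) for p in permutations(values)]
    return min(amrs), max(amrs)


example = [2, 9, 4, 7, 1, 8]  # hypothetical
print(brute_force_extrema(example))
print(min_average_moving_range(example), max_average_moving_range(example))
```

Both lines print the same pair (1.6 and 6.2). The closed forms need only a sort, so they remain usable for $n = 23$, where enumerating all orderings is impossible.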

The distribution of the average moving range between the two extrema will necessarily depend on the underlying data, and no closed-form expression for it is apparent.

Real world example

Taking a real-world example from a quality improvement initiative running as part of the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care for Northwest London (CLAHRC NWL), we investigated the consequences of using an XmR chart analysis on data that possess no underlying order but are logically comparable. In an improvement initiative aiming to improve ward compliance with hospital trust policy, 23 wards across 4 hospital sites in northwest London entered compliance data weekly into a centralised, multi-user web platform tailored to meet the project requirements.

We calculate the average overall compliance of each ward over a one-year period (01-04-2010 to 31-03-2011) to allow comparison across these multiple sites, and then investigate the effect of different orderings of these values on the average moving range using a resampling without replacement (shuffling) algorithm.
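The shuffling procedure can be sketched as follows: each iteration draws a random permutation of the ward-level values (resampling without replacement) and records the average moving range of that ordering. The function names, seed handling and example values are hypothetical; the study's own implementation is not described in the text, and the values below are not the study data.

```python
import random
from statistics import mean, stdev


def average_moving_range(values):
    """Mean absolute difference between consecutive values, in the order given."""
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)


def shuffled_average_moving_ranges(values, n_shuffles, seed=None):
    """Average moving range for n_shuffles random orderings of the data."""
    rng = random.Random(seed)  # seeded for reproducibility
    order = list(values)  # copy, so the caller's list is not reordered
    results = []
    for _ in range(n_shuffles):
        rng.shuffle(order)  # a random permutation: resampling without replacement
        results.append(average_moving_range(order))
    return results


# Hypothetical ward-level percentages, NOT the study data.
wards = [16.7, 18.2, 21.0, 24.5, 27.9, 30.1, 35.4, 41.0, 52.3, 75.0]
amrs = shuffled_average_moving_ranges(wards, n_shuffles=10_000, seed=1)
print(f"mean {mean(amrs):.2f}, sd {stdev(amrs):.2f}, "
      f"min {min(amrs):.2f}, max {max(amrs):.2f}")
```

Because each shuffle is uniformly random regardless of the previous state, reshuffling the same working list in place is equivalent to drawing fresh permutations. Random sampling will usually miss the true extrema, which is why the closed-form bounds are useful alongside it.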

Results and discussion

The summary statistics for the ward percentage compliance figures over the one-year period are displayed in the table below.

| Statistic | Value |
|---|---|
| Mean | 33.7% |
| Median | 27.9% |
| Standard deviation | 16.9% |
| Range | 58.3% |
| Minimum | 16.7% |
| Maximum | 75.0% |
| Count (wards) | 23 |

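In order to apply an XmR chart analysis to this data set, the data must be placed in a specific order. The 23 wards can be ordered in $23! \approx 2.59 \times 10^{22}$ different ways. The average moving range was calculated for 16,383 random orderings generated by the shuffling algorithm, with the following summary statistics: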

| Statistic | Value |
|---|---|
| Mean | 17.5% |
| Standard error | 0.0173% |
| Median | 17.7% |
| Standard deviation | 2.21% |
| Range | 16.1% |
| Minimum | 6.70% |
| Maximum | 22.8% |
| Count | 16383 |

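A histogram of the distribution of the average moving range over these orderings is shown below.

**A histogram of the distribution of the average moving range from a resampling without replacement exercise on a real-world data set of ward compliance with hospital trust policy.** This exemplifies the possible variation in the average moving range, and hence in the XmR control limits, depending on the order chosen.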

The formulae from the previous section show that the set of possible values of

Calculating the control limits using
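For a sense of scale, assume the conventional XmR limits $\bar{x} \pm 2.66\,\overline{mR}$ (the scaling constant is our assumption; it is not stated in the surviving text). The width of the limits is then $W = 5.32\,\overline{mR}$. Applying this to the mean compliance of 33.7%, to the smallest and largest average moving ranges found by resampling, and to the sorted-order minimum $(x_{(n)} - x_{(1)})/(n-1) = 58.3\%/22$ gives:

$$
\begin{aligned}
W &= 2 \times 2.66 \times \overline{mR} = 5.32\,\overline{mR},\\
\overline{mR} = 6.70\% &\;\Rightarrow\; 33.7\% \pm 17.8\% = [15.9\%,\ 51.5\%], \quad W \approx 35.6\%,\\
\overline{mR} = 22.8\% &\;\Rightarrow\; 33.7\% \pm 60.6\% = [-26.9\%,\ 94.3\%], \quad W \approx 121.3\%,\\
\overline{mR}_{\min} = 58.3\%/22 = 2.65\% &\;\Rightarrow\; 33.7\% \pm 7.0\% = [26.7\%,\ 40.7\%], \quad W \approx 14.1\%.
\end{aligned}
$$

Even the resampled orderings alone change the width of the limits by about 86 percentage points, and in the widest case the lower limit falls below zero, which is meaningless for a compliance percentage.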

The control limits obtained from the extremes of the average moving range are compared with Tukey's fences on an Average and SD chart of the dataset, shown below.

**An Average and SD chart of the dataset, with Tukey's fences superimposed, as well as control limits based on the minimum and maximum possible average moving range.**

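Since the XmR chart is not an appropriate way to analyse these data, how should one attempt to distinguish special causes from common causes in this case? Without a natural ordering, the problem becomes one of outlier detection, and an appropriate technique may be selected from the well-developed literature on this issue. One simple example is Tukey's method of fences, which flags as outliers any values lying outside the interval $[Q_1 - k(Q_3 - Q_1),\ Q_3 + k(Q_3 - Q_1)]$, where $k$ is a positive constant and $Q_1$ and $Q_3$ are the lower and upper quartiles of the data. Values of $k = 1.5$ and $k = 3$ are conventionally used, flagging "outliers" and "far out" values respectively.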
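Tukey's fences are simple to compute and, unlike XmR limits, do not depend on the order of the data. The sketch below uses Python's `statistics.quantiles` with `method="inclusive"` to obtain the quartiles. Quartile conventions vary between software packages, so this choice (like the example data) is an assumption, and values close to a fence may be classified differently elsewhere.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # quartile convention assumed
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(values, k=1.5):
    """Values lying outside Tukey's fences."""
    lower, upper = tukey_fences(values, k)
    return [x for x in values if x < lower or x > upper]


data = [10, 12, 11, 13, 12, 11, 40]  # hypothetical
print(tukey_fences(data))         # (8.75, 14.75)
print(tukey_outliers(data))       # [40]
print(tukey_outliers(data, k=3))  # [40]
```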

Conclusions

In conclusion, p-, np-, c- and u-charts are used with data without a natural ordering in exactly the same way as with data endowed with a natural ordering, such as time. This is not the case for the simplest, yet more distribution-robust, control chart: the XmR chart. The control limits on an XmR chart depend on the ordering of the data, and this dependency is such that the ambiguity in “expected variation” (as quantified by the range of possible widths of the control limits) is large when working with data that have no inherent natural order. We have presented a real data set for which this range is almost double the range of the actual data, which is clearly an unacceptable degree of ambiguity.

Thus, when one is faced with the problem of distinguishing special from routine variation in a univariate data set with no time order, the individuals and moving range (XmR) chart is not appropriate, and simply using a random order that is not based on the magnitude of the values, as advocated in primer texts, does not resolve the problem.

As such, for identification of potential special causes in a dataset we recommend that:

1) In time series data, when there is limited or no knowledge of the distribution of the data, the XmR chart is the appropriate method of analysis, using the average moving range to calculate the control limits.

2) In data without a natural order, an appropriate outlier detection method should be selected instead of the XmR chart; simple examples include a) Tukey’s method of “fences” and b) the 3 sigma rule (note that this corresponds to using an average and standard deviation chart).

3) Authors should explicitly state the method used, including how control limits were calculated.
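The 3 sigma rule in recommendation 2b) can be sketched in the same way: it flags values more than three sample standard deviations from the mean, which are the limits of an average and SD chart. The data and function names below are hypothetical.

```python
from statistics import mean, stdev


def three_sigma_limits(values):
    """Average and SD chart limits: mean +/- 3 sample standard deviations."""
    centre, s = mean(values), stdev(values)
    return centre - 3 * s, centre + 3 * s


def three_sigma_outliers(values):
    """Values lying outside the 3 sigma limits."""
    lower, upper = three_sigma_limits(values)
    return [x for x in values if x < lower or x > upper]


data = [10, 11, 12] * 6 + [11, 40]  # hypothetical: 20 values, one extreme
print(three_sigma_limits(data))    # roughly (-7.1, 32.0)
print(three_sigma_outliers(data))  # [40]
```

Like Tukey's fences, these limits are unaffected by the order of the data, which is the property the XmR chart lacks.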

Abbreviations

CLAHRC NWL, Collaboration for Leadership in Applied Health Research and Care for Northwest London; NIHR, National Institute for Health Research; SPC, statistical process control; XmR, individual values and moving range.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

All authors contributed to the design of the project. AP set up and implemented the resampling algorithms and wrote the first draft of the manuscript. TW devised the mathematical proofs determining the maxima of the average moving range. All authors contributed to later drafts and gave final approval to the manuscript.

Acknowledgements

Dr Vasa Curcin, for co-design and code-writing of the Web Reporting Tool. TW and AP are employed by Imperial College, London and work at NIHR CLAHRC for NWL, which funded the implementation project that provided the example data set.

Disclaimer

This article presents independent research commissioned by the National Institute for Health Research (NIHR) under the Collaborations for Leadership in Applied Health Research and Care (CLAHRC) programme for North West London. The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
